Creating a Dictionary

You can think of a dictionary as being one of:

a list of words
a sorted list of words
a set of words
a sorted set of words

Happily, List, Set, and SortedSet are all sub-interfaces of Collection and Collection has a method named contains that returns true if the collection contains a specified element. Implementing a dictionary is as simple as creating a Collection.

Creating a Dictionary

You can find open source lists of English words fairly easily on the Internet (here). I have concatenated several of the lists together to form a dictionary of common English words that you can download here. The dictionary file has one word per line.

create a collection for the dictionary
open the dictionary file
for each line of the file
   read the word
   add the word to the dictionary

import java.util.Collection;
import java.util.ArrayList;
import java.util.Scanner;
import java.io.*;

public class Dictionary
{
   public static void main(String[] args) throws IOException
   {
      PrintStream output = System.out;
      Scanner input = new Scanner(System.in);
      
      Collection<String> dictionary = new ArrayList<String>();
      
      final String DICT_NAME = "dictionary.txt";
      Scanner dictionaryInput = new Scanner(new File(DICT_NAME));
      
      for (; dictionaryInput.hasNextLine();)
      {
         String word = dictionaryInput.nextLine().trim();
         dictionary.add(word);
      }
      
      output.printf("Dictionary has %d words%n", dictionary.size());
   }
}

`ArrayList`

If you need a list, ArrayList should generally be your first choice.

Recall that lists support indexed access and insertion of elements. There are versions of get, add, and remove that let the client specify an integer index into the list to perform an operation.

Looking up Words in the Dictionary

Looking up a word in the dictionary is as easy as invoking the contains method from the Collection interface:

prompt the user for a word
while there is still input
   read the word
   if the word is in the dictionary
      output "in the dictionary"
   else
      output "not in the dictionary"
   prompt the user for a word

      output.print("Enter a word to look up: ");
      for (; input.hasNextLine();)
      {
         String word = input.nextLine().trim().toLowerCase();
         if (dictionary.contains(word))
         {
            output.printf("%s is in the dictionary%n%n", word.toUpperCase());
         }
         else
         {
            output.printf("%s is not in the dictionary%n%n", word.toUpperCase());
         }
         output.print("Enter a word to look up: ");
      }

Efficiency of `contains`

As a client, you should not care how a method is implemented. However, you might speculate that contains must search the entire collection to determine if an element is in the collection. In fact, the different kinds of collections implement contains with varying degrees of efficiency.

A modern pc is sufficiently fast that it will be difficult to compare computational efficiency by looking up a single word; instead, we will choose N words at random. This is easy to do if the dictionary is a List because:

List supports indexed access
we know how many words are in the dictionary

By randomly generating a number between 0 and the number of words in the dictionary minus 1 we can select a random word to look up.

`Random`

An instance of the Random class generates random numbers. Of interest to us is the method nextInt:

int nextInt(int n)

Returns a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive).

Based on the contract of nextInt, we can pick a random word from the dictionary like so:

      Random rng = new Random();
      int randomIndex = rng.nextInt(dictionary.size());
      String randomWord = dictionary.get(randomIndex);

Measuring Elapsed Time

You can use the System utility to measure elapsed time. Of interest to us is the method nanoTime:

static long nanoTime()

Returns the current value of the most precise available system timer, in nanoseconds..

To measure elapsed time, you invoke nanoTime twice: once to get the starting time and once to get the ending time. The difference between the two times is the elapsed time.

      long startTime = System.nanoTime();
      /*
         YOUR CODE YOU WANT TO TIME
      */
      long endTime = System.nanoTime();
      long elapsedTime = endTime - startTime;

Measuring Efficiency

We will ask the user for the number of words to look up. Because the words are chosen randomly, we will get a different measure of time elapsed everytime we run the program. Instead, we will compute an average time by asking the user for the number of times to run the test.

% java DictionaryArrayList
Number of words to look up? 8000
Number of tests to run? 1000
Average time : 1297536.78

%

      // Number of words and test
      output.print("Number of words to look up? ");
      String num = input.nextLine().trim();
      int numWords = Integer.parseInt(num);
      
      output.print("Number of tests to run? ");
      num = input.nextLine().trim();
      int numTests = Integer.parseInt(num);
      
      // Average time
      long sumTime = 0;
      
      // Perform numTests
      for (int test = 0; test < numTests; test++)
      {
         long startTime = System.nanoTime();
         
         // Look up numWords random words
         for (int i = 0; i < numWords; i++)
         {
            int index = rng.nextInt(WORDS.size());
            String word = WORDS.get(index);
            boolean isInDictionary = dictionary.contains(word);
         }
         long endTime = System.nanoTime();
         sumTime += endTime - startTime;
      }
      
      output.println("Average time : " + ((double) sumTime / numTests));

import java.util.*;
import java.io.*;

public class DictionaryArrayList
{
   public static void main(String[] args) throws IOException
   {
      PrintStream output = System.out;
      Scanner input = new Scanner(System.in);
      
      Collection<String> dictionary = new ArrayList<String>();
      
      final String DICT_NAME = "dictionary.txt";
      Scanner dictionaryInput = new Scanner(new File(DICT_NAME));
      
      for (; dictionaryInput.hasNextLine();)
      {
         String word = dictionaryInput.nextLine().trim();
         dictionary.add(word);
      }
      output.println(dictionary.size());
      
      // ArrayList version of the dictionary to pick random words from
      final ArrayList<String> WORDS = new ArrayList<String>(dictionary);
            
      // random number generator to randomly pick words
      Random rng = new Random();
      
      // Number of words and test
      output.print("Number of words to look up? ");
      String num = input.next().trim();
      int numWords = Integer.parseInt(num);
      
      output.print("Number of tests to run? ");
      num = input.next().trim();
      int numTests = Integer.parseInt(num);
      
      // Average time
      long sumTime = 0;
      
      // Perform numTests
      for (int test = 0; test < numTests; test++)
      {
         long startTime = System.nanoTime();
         
         // Look up numWords random words
         for (int i = 0; i < numWords; i++)
         {
            int index = rng.nextInt(WORDS.size());
            String word = WORDS.get(index);
            boolean isInDictionary = dictionary.contains(word);
         }
         long endTime = System.nanoTime();
         sumTime += endTime - startTime;
      }
      
      output.printf("Average time : %,.0f", ((sumTime + 0.0) / numTests));
   }
}

Results for `ArrayList`

Number of words searched for	Average time (ns)
100	32,176,990
200	62,919,539
400	125,036,188
800	258,447,301
1600	495,846,249

Other Types of Collections

By changing one line of code, we can test any collection that implements the Collection interface. How is this possible? Because the dictionary is declared as a Collection and anything that implements the Collection interface is substitutable for a Collection.

      Collection<String> dictionary = new ArrayList<String>();

      Collection<String> dictionary = new LinkedList<String>();

      Collection<String> dictionary = new TreeSet<String>();

      Collection<String> dictionary = new HashSet<String>();

One Program to Test them All

We can write a single program to test all of the collections we are interested in; the program asks the user for a choice of collection and we create a collection of appropriate type:

      Collection<String> dictionary = null;
      
      // Type of collection
      output.println("Choose a collection");
      output.println("1. ArrayList");
      output.println("2. LinkedList");
      output.println("3. TreeSet");
      output.println("4. HashSet");
      
      int collectionType = input.nextInt();
      if (collectionType == 1)
      {
         dictionary = new ArrayList<String>();
      }
      else if (collectionType == 2)
      {
         dictionary = new LinkedList<String>();
      }
      else if (collectionType == 3)
      {
         dictionary = new TreeSet<String>();
      }
      else
      {
         dictionary = new HashSet<String>();
      }

import java.util.*;
import java.io.*;

public class DictionaryAny
{
   public static void main(String[] args) throws IOException
   {
      PrintStream output = System.out;
      Scanner input = new Scanner(System.in);
      
      // Type of collection
      Collection<String> dictionary = null;
      output.println("Choose a collection");
      output.println("1. ArrayList");
      output.println("2. LinkedList");
      output.println("3. TreeSet");
      output.println("4. HashSet");
      int collectionType = input.nextInt();
      if (collectionType == 1)
      {
         output.println("array list");
         dictionary = new ArrayList<String>();
      }
      else if (collectionType == 2)
      {
         dictionary = new LinkedList<String>();
      }
      else if (collectionType == 3)
      {
         dictionary = new TreeSet<String>();
      }
      else
      {
         dictionary = new HashSet<String>();
      }
      
      final String DICT_NAME = "dictionary.txt";
      Scanner dictionaryInput = new Scanner(new File(DICT_NAME));
      
      for (; dictionaryInput.hasNextLine();)
      {
         String word = dictionaryInput.nextLine().trim();
         dictionary.add(word);
      }
      output.println(dictionary.size());
      
      // ArrayList version of the dictionary to pick random words from
      final ArrayList<String> WORDS = new ArrayList<String>(dictionary);
            
      // random number generator to randomly pick words
      Random rng = new Random();
      
      // Number of words and test
      output.print("Number of words to look up? ");
      String num = input.next().trim();
      int numWords = Integer.parseInt(num);
      
      output.print("Number of tests to run? ");
      num = input.next().trim();
      int numTests = Integer.parseInt(num);
      
      // Average time
      long sumTime = 0;
      
      for (int test = 0; test < numTests; test++)
      {
         long startTime = System.nanoTime();
         for (int i = 0; i < numWords; i++)
         {
            int index = rng.nextInt(WORDS.size());
            String word = WORDS.get(index);
            boolean isInDictionary = dictionary.contains(word);
         }
         long endTime = System.nanoTime();
         sumTime += endTime - startTime;
      }
      
      output.printf("Average time : %,.0f", ((double) sumTime / numTests));
   }
}

Results for Several Collections

Number of words searched for	Relative average time
Number of words searched for	ArrayList	LinkedList	TreeSet	HashSet
100	1,817	2,891	6	1
200	3,554	5,803	12	2
400	7,062	11,344	26	4
800	14,597	22,209	49	8
1,600	28,004	43,998	96	15

searching a Set is probably more efficient than searching an ordinary List
searching a HashSet is probably more efficient than searching a TreeSet

But what about other operations?

To Do For Next Lecture

Continue reading Chapter 10.

Write a program that creates a TreeSet of strings and a HashSet of strings. Insert the strings "A", "B", "C", ..., "Z" into both sets and then print both sets (you can use output.println). What do you notice about the order of the elements in both sets?

Write a program that creates an ArrayList of Integer and a LinkedList of Integer. Add the numbers 1, 2, 3, ..., 100000 to the front of both lists and time how long it takes for each type of list. Which is (much) faster? What happens when you add to the end of the lists?

Day 29 — Word Games

Word Games

Creating a Dictionary

Creating a Dictionary

`ArrayList`

Looking up Words in the Dictionary

Efficiency of `contains`

`Random`

Measuring Elapsed Time

Measuring Efficiency

Results for `ArrayList`

Other Types of Collections

One Program to Test them All

Results for Several Collections

To Do For Next Lecture

Day 29 — Word Games

Word Games

Creating a Dictionary

Creating a Dictionary

ArrayList

Looking up Words in the Dictionary

Efficiency of contains

Random

Measuring Elapsed Time

Measuring Efficiency

Results for ArrayList

Other Types of Collections

One Program to Test them All

Results for Several Collections

To Do For Next Lecture

`ArrayList`

Efficiency of `contains`

`Random`

Results for `ArrayList`