slanted W3C logo

Day 29 — Word Games

Word Games

From Day 01:

Word games seem to be very popular...

The fundamental component of a word game is a dictionary. You need a dictionary to validate words entered by the user.

Creating a Dictionary

You can think of a dictionary as being one of:

Happily, List, Set, and SortedSet are all sub-interfaces of Collection and Collection has a method named contains that returns true if the collection contains a specified element. Implementing a dictionary is as simple as creating a Collection.

Creating a Dictionary

You can find open source lists of English words fairly easily on the Internet (here). I have concatenated several of the lists together to form a dictionary of common English words that you can download here. The dictionary file has one word per line.

create a collection for the dictionary
open the dictionary file
for each line of the file
   read the word
   add the word to the dictionary
import java.util.Collection;
import java.util.ArrayList;
import java.util.Scanner;
import java.io.*;

public class Dictionary
{
   public static void main(String[] args) throws IOException
   {
      PrintStream output = System.out;
      Scanner input = new Scanner(System.in);
      
      Collection<String> dictionary = new ArrayList<String>();
      
      final String DICT_NAME = "dictionary.txt";
      Scanner dictionaryInput = new Scanner(new File(DICT_NAME));
      
      for (; dictionaryInput.hasNextLine();)
      {
         String word = dictionaryInput.nextLine().trim();
         dictionary.add(word);
      }
      
      output.printf("Dictionary has %d words%n", dictionary.size());
   }
}

ArrayList

If you need a list, ArrayList should generally be your first choice.

Recall that lists support indexed access and insertion of elements. There are versions of get, add, and remove that let the client specify an integer index into the list to perform an operation.

Looking up Words in the Dictionary

Looking up a word in the dictionary is as easy as invoking the contains method from the Collection interface:

prompt the user for a word
while there is still input
   read the word
   if the word is in the dictionary
      output "in the dictionary"
   else
      output "not in the dictionary"
   prompt the user for a word
      output.print("Enter a word to look up: ");
      for (; input.hasNextLine();)
      {
         String word = input.nextLine().trim().toLowerCase();
         if (dictionary.contains(word))
         {
            output.printf("%s is in the dictionary%n%n", word.toUpperCase());
         }
         else
         {
            output.printf("%s is not in the dictionary%n%n", word.toUpperCase());
         }
         output.print("Enter a word to look up: ");
      }

Efficiency of contains

As a client, you should not care how a method is implemented. However, you might speculate that contains must search the entire collection to determine if an element is in the collection. In fact, the different kinds of collections implement contains with varying degrees of efficiency.

A modern pc is sufficiently fast that it will be difficult to compare computational efficiency by looking up a single word; instead, we will choose N words at random. This is easy to do if the dictionary is a List because:

  1. List supports indexed access
  2. we know how many words are in the dictionary

By randomly generating a number between 0 and the number of words in the dictionary minus 1 we can select a random word to look up.

Random

An instance of the Random class generates random numbers. Of interest to us is the method nextInt:

int nextInt(int n)

Returns a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive).

Based on the contract of nextInt, we can pick a random word from the dictionary like so:

      Random rng = new Random();
      int randomIndex = rng.nextInt(dictionary.size());
      String randomWord = dictionary.get(randomIndex);

Measuring Elapsed Time

You can use the System utility to measure elapsed time. Of interest to us is the method nanoTime:

static long nanoTime()

Returns the current value of the most precise available system timer, in nanoseconds..

To measure elapsed time, you invoke nanoTime twice: once to get the starting time and once to get the ending time. The difference between the two times is the elapsed time.

      long startTime = System.nanoTime();
      /*
         YOUR CODE YOU WANT TO TIME
      */
      long endTime = System.nanoTime();
      long elapsedTime = endTime - startTime;

Measuring Efficiency

We will ask the user for the number of words to look up. Because the words are chosen randomly, we will get a different measure of time elapsed everytime we run the program. Instead, we will compute an average time by asking the user for the number of times to run the test.

% java DictionaryArrayList
Number of words to look up? 8000
Number of tests to run? 1000
Average time : 1297536.78

%
      // Number of words and test
      output.print("Number of words to look up? ");
      String num = input.nextLine().trim();
      int numWords = Integer.parseInt(num);
      
      output.print("Number of tests to run? ");
      num = input.nextLine().trim();
      int numTests = Integer.parseInt(num);
      
      // Average time
      long sumTime = 0;
      
      // Perform numTests
      for (int test = 0; test < numTests; test++)
      {
         long startTime = System.nanoTime();
         
         // Look up numWords random words
         for (int i = 0; i < numWords; i++)
         {
            int index = rng.nextInt(WORDS.size());
            String word = WORDS.get(index);
            boolean isInDictionary = dictionary.contains(word);
         }
         long endTime = System.nanoTime();
         sumTime += endTime - startTime;
      }
      
      output.println("Average time : " + ((double) sumTime / numTests));
import java.util.*;
import java.io.*;

public class DictionaryArrayList
{
   public static void main(String[] args) throws IOException
   {
      PrintStream output = System.out;
      Scanner input = new Scanner(System.in);
      
      Collection<String> dictionary = new ArrayList<String>();
      
      final String DICT_NAME = "dictionary.txt";
      Scanner dictionaryInput = new Scanner(new File(DICT_NAME));
      
      for (; dictionaryInput.hasNextLine();)
      {
         String word = dictionaryInput.nextLine().trim();
         dictionary.add(word);
      }
      output.println(dictionary.size());
      
      // ArrayList version of the dictionary to pick random words from
      final ArrayList<String> WORDS = new ArrayList<String>(dictionary);
            
      // random number generator to randomly pick words
      Random rng = new Random();
      
      // Number of words and test
      output.print("Number of words to look up? ");
      String num = input.next().trim();
      int numWords = Integer.parseInt(num);
      
      output.print("Number of tests to run? ");
      num = input.next().trim();
      int numTests = Integer.parseInt(num);
      
      // Average time
      long sumTime = 0;
      
      // Perform numTests
      for (int test = 0; test < numTests; test++)
      {
         long startTime = System.nanoTime();
         
         // Look up numWords random words
         for (int i = 0; i < numWords; i++)
         {
            int index = rng.nextInt(WORDS.size());
            String word = WORDS.get(index);
            boolean isInDictionary = dictionary.contains(word);
         }
         long endTime = System.nanoTime();
         sumTime += endTime - startTime;
      }
      
      output.printf("Average time : %,.0f", ((sumTime + 0.0) / numTests));
   }
}

Results for ArrayList

Number of words
searched for
Average time (ns)
100 32,176,990
200 62,919,539
400 125,036,188
800 258,447,301
1600 495,846,249

Other Types of Collections

By changing one line of code, we can test any collection that implements the Collection interface. How is this possible? Because the dictionary is declared as a Collection and anything that implements the Collection interface is substitutable for a Collection.

      Collection<String> dictionary = new ArrayList<String>();
      Collection<String> dictionary = new LinkedList<String>();
      Collection<String> dictionary = new TreeSet<String>();
      Collection<String> dictionary = new HashSet<String>();

One Program to Test them All

We can write a single program to test all of the collections we are interested in; the program asks the user for a choice of collection and we create a collection of appropriate type:

      Collection<String> dictionary = null;
      
      // Type of collection
      output.println("Choose a collection");
      output.println("1. ArrayList");
      output.println("2. LinkedList");
      output.println("3. TreeSet");
      output.println("4. HashSet");
      
      int collectionType = input.nextInt();
      if (collectionType == 1)
      {
         dictionary = new ArrayList<String>();
      }
      else if (collectionType == 2)
      {
         dictionary = new LinkedList<String>();
      }
      else if (collectionType == 3)
      {
         dictionary = new TreeSet<String>();
      }
      else
      {
         dictionary = new HashSet<String>();
      }
import java.util.*;
import java.io.*;

public class DictionaryAny
{
   public static void main(String[] args) throws IOException
   {
      PrintStream output = System.out;
      Scanner input = new Scanner(System.in);
      
      // Type of collection
      Collection<String> dictionary = null;
      output.println("Choose a collection");
      output.println("1. ArrayList");
      output.println("2. LinkedList");
      output.println("3. TreeSet");
      output.println("4. HashSet");
      int collectionType = input.nextInt();
      if (collectionType == 1)
      {
         output.println("array list");
         dictionary = new ArrayList<String>();
      }
      else if (collectionType == 2)
      {
         dictionary = new LinkedList<String>();
      }
      else if (collectionType == 3)
      {
         dictionary = new TreeSet<String>();
      }
      else
      {
         dictionary = new HashSet<String>();
      }
      
      final String DICT_NAME = "dictionary.txt";
      Scanner dictionaryInput = new Scanner(new File(DICT_NAME));
      
      for (; dictionaryInput.hasNextLine();)
      {
         String word = dictionaryInput.nextLine().trim();
         dictionary.add(word);
      }
      output.println(dictionary.size());
      
      // ArrayList version of the dictionary to pick random words from
      final ArrayList<String> WORDS = new ArrayList<String>(dictionary);
            
      // random number generator to randomly pick words
      Random rng = new Random();
      
      // Number of words and test
      output.print("Number of words to look up? ");
      String num = input.next().trim();
      int numWords = Integer.parseInt(num);
      
      output.print("Number of tests to run? ");
      num = input.next().trim();
      int numTests = Integer.parseInt(num);
      
      // Average time
      long sumTime = 0;
      
      for (int test = 0; test < numTests; test++)
      {
         long startTime = System.nanoTime();
         for (int i = 0; i < numWords; i++)
         {
            int index = rng.nextInt(WORDS.size());
            String word = WORDS.get(index);
            boolean isInDictionary = dictionary.contains(word);
         }
         long endTime = System.nanoTime();
         sumTime += endTime - startTime;
      }
      
      output.printf("Average time : %,.0f", ((double) sumTime / numTests));
   }
}

Results for Several Collections

Number of words
searched for
Relative average time
ArrayList LinkedList TreeSet HashSet
100 1,817 2,891 6 1
200 3,554 5,803 12 2
400 7,062 11,344 26 4
800 14,597 22,209 49 8
1,600 28,004 43,998 96 15

But what about other operations?

To Do For Next Lecture

Continue reading Chapter 10.

Write a program that creates a TreeSet of strings and a HashSet of strings. Insert the strings "A", "B", "C", ..., "Z" into both sets and then print both sets (you can use output.println). What do you notice about the order of the elements in both sets?

Write a program that creates an ArrayList of Integer and a LinkedList of Integer. Add the numbers 1, 2, 3, ..., 100000 to the front of both lists and time how long it takes for each type of list. Which is (much) faster? What happens when you add to the end of the lists?