From Day 01:
Word games seem to be very popular...
The fundamental component of a word game is a dictionary. You need a dictionary to validate words entered by the user.
You can think of a dictionary as being one of:
Happily, List
, Set
, and SortedSet
are all sub-interfaces of Collection
and Collection
has a method named contains
that returns true
if
the collection contains a specified element. Implementing
a dictionary is as simple as creating a Collection
.
You can find open source lists of English words fairly easily on the Internet (here). I have concatenated several of the lists together to form a dictionary of common English words that you can download here. The dictionary file has one word per line.
create a collection for the dictionary open the dictionary file for each line of the file read the word add the word to the dictionary
import java.util.Collection; import java.util.ArrayList; import java.util.Scanner; import java.io.*; public class Dictionary { public static void main(String[] args) throws IOException { PrintStream output = System.out; Scanner input = new Scanner(System.in); Collection<String> dictionary = new ArrayList<String>(); final String DICT_NAME = "dictionary.txt"; Scanner dictionaryInput = new Scanner(new File(DICT_NAME)); for (; dictionaryInput.hasNextLine();) { String word = dictionaryInput.nextLine().trim(); dictionary.add(word); } output.printf("Dictionary has %d words%n", dictionary.size()); } }
ArrayList
If you need a list,
ArrayList
should generally be your first choice.
Recall that lists support indexed access and insertion of elements.
There are versions of get
, add
, and remove
that let the client specify an integer index into the list to perform
an operation.
Looking up a word in the dictionary is as easy as invoking
the contains
method from the Collection
interface:
prompt the user for a word while there is still input read the word if the word is in the dictionary output "in the dictionary" else output "not in the dictionary" prompt the user for a word
output.print("Enter a word to look up: ");
for (; input.hasNextLine();)
{
String word = input.nextLine().trim().toLowerCase();
if (dictionary.contains(word))
{
output.printf("%s is in the dictionary%n%n", word.toUpperCase());
}
else
{
output.printf("%s is not in the dictionary%n%n", word.toUpperCase());
}
output.print("Enter a word to look up: ");
}
contains
As a client, you should not care how a method is implemented.
However, you might speculate that contains
must
search the entire collection to determine if an element is in
the collection. In fact, the different kinds of collections
implement contains
with varying degrees of
efficiency.
A modern pc is sufficiently fast that it will be difficult to
compare computational efficiency by looking up a single word;
instead, we will choose N
words at random. This
is easy to do if the dictionary is a List
because:
List
supports indexed accessBy randomly generating a number between 0
and the number of words in the dictionary minus 1
we can select a random word to look up.
Random
An instance of the Random
class generates random
numbers. Of interest to us is the method nextInt
:
int nextInt(int n)
Returns a pseudorandom, uniformly distributed
int
value between 0
(inclusive) and
the specified value (exclusive).
Based on the contract of nextInt
, we can pick
a random word from the dictionary like so:
Random rng = new Random(); int randomIndex = rng.nextInt(dictionary.size()); String randomWord = dictionary.get(randomIndex);
You can use the System
utility to measure elapsed
time. Of interest to us is the method nanoTime
:
static long nanoTime()
Returns the current value of the most precise available system
timer, in nanoseconds..
To measure elapsed time, you invoke nanoTime
twice: once to get the starting time and once to get the
ending time. The difference between the two times is the
elapsed time.
long startTime = System.nanoTime(); /* YOUR CODE YOU WANT TO TIME */ long endTime = System.nanoTime(); long elapsedTime = endTime - startTime;
We will ask the user for the number of words to look up. Because the words are chosen randomly, we will get a different measure of time elapsed everytime we run the program. Instead, we will compute an average time by asking the user for the number of times to run the test.
% java DictionaryArrayList Number of words to look up? 8000 Number of tests to run? 1000 Average time : 1297536.78 %
// Number of words and test output.print("Number of words to look up? "); String num = input.nextLine().trim(); int numWords = Integer.parseInt(num); output.print("Number of tests to run? "); num = input.nextLine().trim(); int numTests = Integer.parseInt(num); // Average time long sumTime = 0; // Perform numTests for (int test = 0; test < numTests; test++) { long startTime = System.nanoTime(); // Look up numWords random words for (int i = 0; i < numWords; i++) { int index = rng.nextInt(WORDS.size()); String word = WORDS.get(index); boolean isInDictionary = dictionary.contains(word); } long endTime = System.nanoTime(); sumTime += endTime - startTime; } output.println("Average time : " + ((double) sumTime / numTests));
import java.util.*; import java.io.*; public class DictionaryArrayList { public static void main(String[] args) throws IOException { PrintStream output = System.out; Scanner input = new Scanner(System.in); Collection<String> dictionary = new ArrayList<String>(); final String DICT_NAME = "dictionary.txt"; Scanner dictionaryInput = new Scanner(new File(DICT_NAME)); for (; dictionaryInput.hasNextLine();) { String word = dictionaryInput.nextLine().trim(); dictionary.add(word); } output.println(dictionary.size()); // ArrayList version of the dictionary to pick random words from final ArrayList<String> WORDS = new ArrayList<String>(dictionary); // random number generator to randomly pick words Random rng = new Random(); // Number of words and test output.print("Number of words to look up? "); String num = input.next().trim(); int numWords = Integer.parseInt(num); output.print("Number of tests to run? "); num = input.next().trim(); int numTests = Integer.parseInt(num); // Average time long sumTime = 0; // Perform numTests for (int test = 0; test < numTests; test++) { long startTime = System.nanoTime(); // Look up numWords random words for (int i = 0; i < numWords; i++) { int index = rng.nextInt(WORDS.size()); String word = WORDS.get(index); boolean isInDictionary = dictionary.contains(word); } long endTime = System.nanoTime(); sumTime += endTime - startTime; } output.printf("Average time : %,.0f", ((sumTime + 0.0) / numTests)); } }
ArrayList
Number of words searched for |
Average time (ns) |
---|---|
100 | 32,176,990 |
200 | 62,919,539 |
400 | 125,036,188 |
800 | 258,447,301 |
1600 | 495,846,249 |
By changing one line of code, we can test any collection
that implements the Collection
interface. How is
this possible? Because the dictionary
is declared
as a Collection
and anything that implements
the Collection
interface is substitutable for
a Collection
.
Collection<String> dictionary = new ArrayList<String>();
Collection<String> dictionary = new LinkedList<String>();
Collection<String> dictionary = new TreeSet<String>();
Collection<String> dictionary = new HashSet<String>();
We can write a single program to test all of the collections we are interested in; the program asks the user for a choice of collection and we create a collection of appropriate type:
Collection<String> dictionary = null; // Type of collection output.println("Choose a collection"); output.println("1. ArrayList"); output.println("2. LinkedList"); output.println("3. TreeSet"); output.println("4. HashSet"); int collectionType = input.nextInt(); if (collectionType == 1) { dictionary = new ArrayList<String>(); } else if (collectionType == 2) { dictionary = new LinkedList<String>(); } else if (collectionType == 3) { dictionary = new TreeSet<String>(); } else { dictionary = new HashSet<String>(); }
import java.util.*; import java.io.*; public class DictionaryAny { public static void main(String[] args) throws IOException { PrintStream output = System.out; Scanner input = new Scanner(System.in); // Type of collection Collection<String> dictionary = null; output.println("Choose a collection"); output.println("1. ArrayList"); output.println("2. LinkedList"); output.println("3. TreeSet"); output.println("4. HashSet"); int collectionType = input.nextInt(); if (collectionType == 1) { output.println("array list"); dictionary = new ArrayList<String>(); } else if (collectionType == 2) { dictionary = new LinkedList<String>(); } else if (collectionType == 3) { dictionary = new TreeSet<String>(); } else { dictionary = new HashSet<String>(); } final String DICT_NAME = "dictionary.txt"; Scanner dictionaryInput = new Scanner(new File(DICT_NAME)); for (; dictionaryInput.hasNextLine();) { String word = dictionaryInput.nextLine().trim(); dictionary.add(word); } output.println(dictionary.size()); // ArrayList version of the dictionary to pick random words from final ArrayList<String> WORDS = new ArrayList<String>(dictionary); // random number generator to randomly pick words Random rng = new Random(); // Number of words and test output.print("Number of words to look up? "); String num = input.next().trim(); int numWords = Integer.parseInt(num); output.print("Number of tests to run? "); num = input.next().trim(); int numTests = Integer.parseInt(num); // Average time long sumTime = 0; for (int test = 0; test < numTests; test++) { long startTime = System.nanoTime(); for (int i = 0; i < numWords; i++) { int index = rng.nextInt(WORDS.size()); String word = WORDS.get(index); boolean isInDictionary = dictionary.contains(word); } long endTime = System.nanoTime(); sumTime += endTime - startTime; } output.printf("Average time : %,.0f", ((double) sumTime / numTests)); } }
Number of words searched for |
Relative average time | |||
---|---|---|---|---|
ArrayList | LinkedList | TreeSet | HashSet | |
100 | 1,817 | 2,891 | 6 | 1 |
200 | 3,554 | 5,803 | 12 | 2 |
400 | 7,062 | 11,344 | 26 | 4 |
800 | 14,597 | 22,209 | 49 | 8 |
1,600 | 28,004 | 43,998 | 96 | 15 |
Set
is probably more efficient than
searching an ordinary List
HashSet
is probably more efficient than
searching a TreeSet
But what about other operations?
Continue reading Chapter 10.
Write a program that creates a TreeSet
of
strings and a HashSet
of strings. Insert the strings
"A", "B", "C", ..., "Z"
into both sets and then print
both sets (you can use output.println
). What do you
notice about the order of the elements in both sets?
Write a program that creates an ArrayList
of
Integer
and a LinkedList
of
Integer
. Add the numbers 1, 2, 3, ..., 100000
to the front of both lists and time how long it takes for each type
of list. Which is (much) faster? What happens when you add to the end
of the lists?