You can think of a dictionary as being one of:
ArrayList
: slow lookupLinkedList
: slowest lookupHashSet
: fastest lookupTreeSet
: fast lookupIt would seem to make sense that searching for an element in a sorted collection would be faster than searching in an unsorted collection.
The basic idea for searching a sorted collection is used by good players of the Price Is Right Clock Game:
There are many algorithms for sorting a collection. One of the simplest sorting algorithms is called selection sort:
// Create a sorted list s // from a collection c; c must implement remove create an empty collection named s while size of c > 0 find the smallest element in c add the smallest element to the end of s remove the smallest element from c
A modified version exists that does not require creating a second list to hold the sorted collection.
// Create a sorted list s
// from a collection c; c must implement remove
// assume c is a collection of String
create an empty list named s
while size of c > 0
find the smallest element in c
add the smallest element to the end of s
remove the smallest element from c
List<String> s = new ArrayList<String>();
create an empty list named s
while size of c > 0
find the smallest element in c
add the smallest element to the end of s
remove the smallest element from c
List<String> s = new ArrayList<String>();
for (; c.size() > 0; ) // or while (c.size() > 0)
{
}
create an empty list named s
while size of c > 0
find the smallest element in c
add the smallest element to the end of s
remove the smallest element from c
List<String> s = new ArrayList<String>();
for (; c.size() > 0; )
{
String min = Collections.min(c);
}
create an empty list named s
while size of c > 0
find the smallest element in c
add the smallest element to the end of s
remove the smallest element from c
List<String> s = new ArrayList<String>();
for (; c.size() > 0; )
{
String min = Collections.min(c);
s.add(min);
}
create an empty list named s
while size of c > 0
find the smallest element in c
add the smallest element to the end of s
remove the smallest element from c
List<String> s = new ArrayList<String>();
for (; c.size() > 0; )
{
String min = Collections.min(c);
s.add(min);
c.remove(min);
}
It is not difficult to implement selection sort, but there
is an easier way to sort: delegate to the java.util.Collections
utility:
static
<T extends Comparable<? super T>>
void sort(List<T> list)
Sorts the specified list into ascending order, according to the natural
ordering of its elements.
To sort our dictionary:
List<String> dictionary = new ArrayList<String>();
/*
read in the dictionary contents here
*/
Collections.sort(dictionary);
Notice that we don't need to create a second collection to hold the sorted dictionary.
Collections.sort
was written by
professional software engineers who have carefully designed and
tested the method. It has been used by millions of clients.Collections.sort
, it will be fixed in the next
Java releaseCollections.sort
is much faster
than our home grown selection sortIf you use contains
on a sorted list to look
up randomly selected words, you won't see any improvement in
performance (an average) because contains
makes no assumptions about the order of the elements in the
list.
Efficiently searching a sorted list requires a method that can exploit the sorted structure of the list. One such algorithm is binary search.
The algorithm is easy to describe, but tricky to implement:
1. get the middle element of the list 2. if the element is less than the sought for value discard the front half of the list else if the element is greater than the sought for value discard the back half of the list else done, found the sought for value 3. repeat steps 1 and 2 with the remainder of the list
A Java implementation for arrays of integers can be found here.
It can be shown that the worst-case complexity of binary search is O(log2n).
n | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 ⇐ 512 more work (compared to n = 2) |
---|---|---|---|---|---|---|---|---|---|---|
log2n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 ⇐ 5 times more time (compared to n = 2) |
Delegation is again your friend (from java.util.Collections
):
static <T> int
binarySearch(List<? extends Comparable<? super T>> list, T key)
Searches the specified list for the specified object using the binary search
algorithm. The list must be sorted into ascending order according to the
natural ordering of its elements (as by the sort(List)
method)
prior to making this call. If it is not sorted, the results are undefined.
Returns:
the index of the element if it is in the list; a negative value otherwise
(see API for details).
List<String> dictionary = new ArrayList<String>();
/*
read in the dictionary contents here
*/
Collections.sort(dictionary);
/*
choose a word to look up here
String word = ...
*/
int index = Collections.binarySearch(dictionary, word);
import java.util.*; import java.io.*; public class DictionaryBinarySearch { public static void main(String[] args) throws IOException { PrintStream output = System.out; Scanner input = new Scanner(System.in); List<String> dictionary = new ArrayList<String>(); final String DICT_NAME = "dictionary.txt"; Scanner dictionaryInput = new Scanner(new File(DICT_NAME)); for (; dictionaryInput.hasNextLine();) { String word = dictionaryInput.nextLine().trim(); dictionary.add(word); } // ArrayList version of the dictionary to pick random words from final ArrayList<String> WORDS = new ArrayList<String>(dictionary); // random number generator to randomly pick words Random rng = new Random(); // Number of words and test output.print("Number of words to look up? "); String num = input.next().trim(); int numWords = Integer.parseInt(num); output.print("Number of tests to run? "); num = input.next().trim(); int numTests = Integer.parseInt(num); // Average time Collections.sort(dictionary); long sumTime = 0; for (int test = 0; test < numTests; test++) { long startTime = System.nanoTime(); for (int i = 0; i < numWords; i++) { int index = rng.nextInt(WORDS.size()); String word = WORDS.get(index); int idx = Collections.binarySearch(dictionary, word); } long endTime = System.nanoTime(); sumTime += endTime - startTime; } output.printf("Average time : %,.0f", ((sumTime + 0.0) / numTests)); } }
Number of words searched for |
Relative average time | ||||
---|---|---|---|---|---|
ArrayList | LinkedList | ArrayList with binary search |
TreeSet | HashSet | |
100 | 1,817 | 2,891 | 7 | 6 | 1 |
200 | 3,554 | 5,803 | 13 | 12 | 2 |
400 | 7,062 | 11,344 | 24 | 26 | 4 |
800 | 14,597 | 22,209 | 46 | 49 | 8 |
1,600 | 28,004 | 43,998 | 94 | 96 | 15 |
ArrayList
has
similar search efficiency to a sorted setFinish reading Chapter 10.