slanted W3C logo

Day 30 — Sorting and Binary Search

Recap

You can think of a dictionary as being one of:

Searching a Sorted Collection

It would seem to make sense that searching for an element in a sorted collection would be faster than searching in an unsorted collection.

The basic idea for searching a sorted collection is used by good players of the Price Is Right Clock Game:

Sorting a Collection

There are many algorithms for sorting a collection. One of the simplest sorting algorithms is called selection sort:

// Create a sorted list s
//  from a collection c; c must implement remove

create an empty collection named s
while size of c > 0
  find the smallest element in c
  add the smallest element to the end of s
  remove the smallest element from c

A modified version exists that does not require creating a second list to hold the sorted collection.

Translating to Java 1

// Create a sorted list s
//  from a collection c; c must implement remove
//  assume c is a collection of String

create an empty list named s
while size of c > 0
  find the smallest element in c
  add the smallest element to the end of s
  remove the smallest element from c
      List<String> s = new ArrayList<String>();

Translating to Java 2

create an empty list named s
while size of c > 0
  find the smallest element in c
  add the smallest element to the end of s
  remove the smallest element from c
      List<String> s = new ArrayList<String>();
      for (; c.size() > 0; )   // or while (c.size() > 0)
      {
      
      }

Translating to Java 3

create an empty list named s
while size of c > 0
  find the smallest element in c
  add the smallest element to the end of s
  remove the smallest element from c
      List<String> s = new ArrayList<String>();
      for (; c.size() > 0; )
      {
         String min = Collections.min(c);
      }

Translating to Java 4

create an empty list named s
while size of c > 0
  find the smallest element in c
  add the smallest element to the end of s
  remove the smallest element from c
      List<String> s = new ArrayList<String>();
      for (; c.size() > 0; )
      {
         String min = Collections.min(c);
         s.add(min);
      }

Translating to Java 5

create an empty list named s
while size of c > 0
  find the smallest element in c
  add the smallest element to the end of s
  remove the smallest element from c
      List<String> s = new ArrayList<String>();
      for (; c.size() > 0; )
      {
         String min = Collections.min(c);
         s.add(min);
         c.remove(min);
      }

Sorting via Procedural Delegation

It is not difficult to implement selection sort, but there is an easier way to sort: delegate to the java.util.Collections utility:

static
<T extends Comparable<? super T>>
void sort(List<T> list)

Sorts the specified list into ascending order, according to the natural ordering of its elements.

To sort our dictionary:

      List<String> dictionary = new ArrayList<String>();
      /*
         read in the dictionary contents here
      */
      Collections.sort(dictionary);

Notice that we don't need to create a second collection to hold the sorted dictionary.

Advantages of Delegation

Efficiency of Sort


Searching a Sorted List

If you use contains on a sorted list to look up randomly selected words, you won't see any improvement in performance (an average) because contains makes no assumptions about the order of the elements in the list.

Efficiently searching a sorted list requires a method that can exploit the sorted structure of the list. One such algorithm is binary search.

Binary Search

The algorithm is easy to describe, but tricky to implement:

1. get the middle element of the list
2. if the element is less than the sought for value
      discard the front half of the list
   else if the element is greater than the sought for value
      discard the back half of the list
   else
      done, found the sought for value
3. repeat steps 1 and 2 with the remainder of the list

A Java implementation for arrays of integers can be found here.

It can be shown that the worst-case complexity of binary search is O(log2n).

n 2 4 8 16 32 64 128 256 512 1024 ⇐ 512 more work
(compared to n = 2)
log2n 1 2 3 4 5 6 7 8 9 10 ⇐ 5 times more time
(compared to n = 2)

Binary Search in Java

Delegation is again your friend (from java.util.Collections):

static <T> int
binarySearch(List<? extends Comparable<? super T>> list, T key)


Searches the specified list for the specified object using the binary search algorithm. The list must be sorted into ascending order according to the natural ordering of its elements (as by the sort(List) method) prior to making this call. If it is not sorted, the results are undefined.
Returns:
the index of the element if it is in the list; a negative value otherwise (see API for details).

      List<String> dictionary = new ArrayList<String>();
      /*
         read in the dictionary contents here
      */
      Collections.sort(dictionary);
      
      /*
         choose a word to look up here
         String word = ...
      */
      int index = Collections.binarySearch(dictionary, word);
import java.util.*;
import java.io.*;

public class DictionaryBinarySearch
{
   public static void main(String[] args) throws IOException
   {
      PrintStream output = System.out;
      Scanner input = new Scanner(System.in);
      
      List<String> dictionary = new ArrayList<String>();
      
      final String DICT_NAME = "dictionary.txt";
      Scanner dictionaryInput = new Scanner(new File(DICT_NAME));
      
      for (; dictionaryInput.hasNextLine();)
      {
         String word = dictionaryInput.nextLine().trim();
         dictionary.add(word);
      }
      
      // ArrayList version of the dictionary to pick random words from
      final ArrayList<String> WORDS = new ArrayList<String>(dictionary);
            
      // random number generator to randomly pick words
      Random rng = new Random();
      
      // Number of words and test
      output.print("Number of words to look up? ");
      String num = input.next().trim();
      int numWords = Integer.parseInt(num);
      
      output.print("Number of tests to run? ");
      num = input.next().trim();
      int numTests = Integer.parseInt(num);
      
      // Average time
      Collections.sort(dictionary);
      long sumTime = 0;
      
      for (int test = 0; test < numTests; test++)
      {
         long startTime = System.nanoTime();
         for (int i = 0; i < numWords; i++)
         {
            int index = rng.nextInt(WORDS.size());
            String word = WORDS.get(index);
            int idx = Collections.binarySearch(dictionary, word);
         }
         long endTime = System.nanoTime();
         sumTime += endTime - startTime;
      }
      
      output.printf("Average time : %,.0f", ((sumTime + 0.0) / numTests));
   }
}

Results for Several Collections

Number of words
searched for
Relative average time
ArrayList LinkedList ArrayList with
binary search
TreeSet HashSet
100 1,817 2,891 7 6 1
200 3,554 5,803 13 12 2
400 7,062 11,344 24 26 4
800 14,597 22,209 46 49 8
1,600 28,004 43,998 94 96 15

To Do For Next Lecture

Finish reading Chapter 10.