slanted W3C logo

Day 31 — Maps and Traversing Collections

Recap

List models a numbered sequence of elements where the client can control the ordering of the elements.

Set models a group of unique elements (no duplicates) where the client has no control over the ordering of the elements.

Map models a group of elements that are accessed by unique keys.

Maps

A map uses a key to insert, remove, and retrieve values. For example, suppose you wanted to count the number of times each word was used in a text document:

Everything starts somewhere, although many physicists disagree.
  But people have always been dimly aware of the problem with the start of things. They wonder aloud how the snowplough driver gets to work, or how the makers of dictionaries look up the spelling of the words. Yet there is the constant desire to find some point in the twisting, knotting, ravelling nets of space-time on which a metaphorical finger can be put to indicate that, here, is the point where it all began...

Opening paragraphs of Hogfather, Terry Pratchett

to              3
they            1
but             1
everything      1
aware           1
somewhere       1
where           1
people          1
space-time      1
been            1
of              5
although        1
began           1
wonder          1
on              1
snowplough      1
be              1
work            1
ravelling       1
how             2
or              1
spelling        1
here            1
disagree        1
finger          1
always          1
many            1
that            1
start           1
makers          1
some            1
dictionaries    1
put             1
can             1
have            1
dimly           1
indicate        1
nets            1
metaphorical    1
words           1
driver          1
all             1
find            1
twisting        1
look            1
aloud           1
with            1
is              2
knotting        1
it              1
constant        1
desire          1
a               1
physicists      1
gets            1
problem         1
the             9
in              1
up              1
point           2
which           1
starts          1
there           1
things          1
yet             1

Counting Words 1

To count the number of times a word appears in a document, you want to map a word (the key) to its count (the value). The appropriate collection is a map with String keys and Integer values:

import java.util.Map;
import java.util.HashMap;

public class WordCount
{
   public static void main(String[] args)
   {
      Map<String, Integer> wordCount =
            new HashMap<String, Integer>();
            
   }
}

Note 1 on WordCount Example

1. Collections of primitives are not allowed
The standard Java collections cannot hold primitive values as elements or as part of a key-value pair. The reason for this is that the collections were designed assuming the elements would all be substitutable for Object (ie. the elements need to to have methods like equals, toString, and hashCode).

Because of this restriction, the client must use a wrapper class (Integer for int, Double for double, etc.) for the primitive type when a collection of primitives is desired.

Counting Words 2

Let's assume that you have created a scanner for the text file and created a loop to read in each word.

Suppose you read in a word. If the word is a new word, you want to insert the word as a key with a value of 1 (the word has been counted once). If the word is not a new word, you want to add one to the count for that word.

How can you tell if a word is already a key in the map?

Counting Words 3

There are two ways to check if a map holds a key.

boolean containsKey(Object key)
Returns true if this map contains a mapping for the specified key.

V get(Object key)
Returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.

For the word counting problem, you have to call get if you need to update the word count, so you might as well use get:

         // is word already in the map?
         Integer count = wordCount.get(word);
         if (count == null)
         {
            // word is not in the map
         }
         else
         {
            // word is in the map
         }

Counting Words 4

Once you know if a word is a key in the map, you want to update the value associated with the key.

The method put inserts a key-value pair into the map.

V put(K key, V value)
Associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced.

         // is word already in the map?
         Integer count = wordCount.get(word);
         if (count == null)
         {
            wordCount.put(word, 1);
         }
         else
         {
            wordCount.put(word, count + 1);
         }

Note 2 on WordCount Example

2. You said collections cannot hold primitives but the example uses int values!

The collection MUST be declared using wrapper types, that is:

Correct Incorrect (compile-time error)
List<Double> List<double>
Set<Character> Set<char>
Map<Integer, Float> Map<int, float>

Java will automatically convert a primitive (like int) to its corresponding wrapper class type (like Integer) when needed; the automatic conversion is called autoboxing. The opposite conversion from wrapper class type to primitive also happens automatically (and is auto-unboxing).

Automatic boxing and unboxing is why you can seemingly insert and get primitives from a collection.

Note 3 on WordCount Example

3. put returns a value

Maps only allow a key to map to single value; this means that the keys in a map must be unique (ie. the keys form a set). If you use an existing key to insert a value, the old value is replaced by the new value, and put returns the old value.

put returns null if the key used is not currently in the map.

Retrieving all Keys

The keys in a map are unique so they form a set. A client can ask a map for its set of keys:

      // Map<String, Integer> wordCount ...
      
      Set<String> words = wordCount.keySet();
      
      output.println(words);

The code fragment above prints:

[to, they, but, everything, aware, somewhere, where, people, space-time, been, of, although, began, wonder, on, snowplough, be, work, ravelling, how, or, spelling, here, disagree, finger, always, many, that, start, makers, some, dictionaries, put, can, have, dimly, indicate, nets, metaphorical, words, driver, all, find, twisting, look, aloud, with, is, knotting, it, constant, desire, a, physicists, gets, problem, the, in, up, point, which, starts, there, things, yet]

Notice that HashMap does not order the keys. If you want sorted keys, use TreeMap.

for Loops and Collections

The easiest way to traverse a list or a set is to use the enhanced for loop (also called the for each loop). This type of loop only works for classes that implement the Collection interface (and arrays).

Recall that the keys for our word counting map are all strings. If we want to print each key, one key per line, we would want a loop like:

for each key in the map's keyset
  print the key followed by a newline

You can use a for-each loop:

      for (String key : wordCount.keySet())
      {
         output.println(key);
      }

for Loops and Collections

The syntax for for-each loops is:

      for (T elem : coll)
      {
         // do something with elem here
      }

For our example:

      for (String key : wordCount.keySet())
      {
         output.println(key);
      }

The Complete WordCount Program

The text file used in this example.

import java.util.*;
import java.io.*;

public class WordCount
{
   public static void main(String[] args) throws IOException
   {
      Map<String, Integer> wordCount =
            new HashMap<String, Integer>();
      
      Scanner fInput = new Scanner(new File("hogfather.txt"));
      
      // You can use a for loop or a while loop
      // for (; fInput.hasNext(); )
      while (fInput.hasNext())
      {
         String word = fInput.next().toLowerCase();
         
         // remove punctuation
         // (except for hyphens and apostrophes)
         String regex = "[^a-z-']";
         word = word.replaceAll(regex, "");
         
         // is word already in the map?
         Integer count = wordCount.get(word);
         if (count == null)
         {
            wordCount.put(word, 1);
         }
         else
         {
            wordCount.put(word, count + 1);
         }
      }
      
      // get the set of words in the map
      Set<String> words = wordCount.keySet();
      
      // for each word in words
      for (String word : words)
      {
         System.out.printf("%-15s", word);
         Integer count = wordCount.get(word);
         System.out.println(" " + count);
      }
   }
}

To Do For Next Lecture

Read 11.1 and 11.2.