slanted W3C logo

Day 22 — Search

Today we look at searching a collection for a particular element and computational complexity.

Search

Once you have a collection of things it becomes a very frequent occurrence that you want to find a particular element of the collection. For example, you might want to search for:

Search

In one type of search, you are only interested whether or not the element is in the collection (you don't need a reference to the element). This type of search returns a boolean value (true if the element is in the collection, and false otherwise).

A second type of search, you want the reference to the searched for element. This type of search returns a reference to an object if the element is in the collection, and null otherwise.

Search by Traversal

Both kinds of search can be solved using traversal. By visiting every element in the collection you can determine if a particular element is in the collection.

Suppose you want to search a Portfolio for an Investment in a particular Stock:

// Portfolio pf = ...
// Investment given = ...

boolean found = false;
for (Investment inv : pf)
{
   found = inv.equals(given);  // not quite right
}

The problem with the loop is that the value of found is determined by the last investment visited in the traversal.

Loop Invariants

A loop invariant is a formal boolean statement about the relationship between variables used in the loop. They can be used to formally prove that a loop is correct.

  1. A loop invariant must be true just before the loop is run (establishing the invariant).
  2. Inside the loop, the invariant might become false.
  3. At the end of the loop, the invariant must be true (maintaining the invariant).

A loop invariant is chosen so that when the loop termination condition is true and the invariant is true then the goal of the loop has been achieved.

Loop Invariants

boolean found = false;
for (Investment inv : pf)
{
   found = inv.equals(given);  // not quite right
}

What is the termination condition of the above loop?

has the last element been visited

What is the loop invariant?

found is true if inv equals given;
otherwise found is false

The invariant tells you immediately what the problem with the loop is: found is determined by the last investment visited in the traversal.

Loop Invariants

What do we want the loop invariant to be?

found is true if inv equals given;
otherwise found maintains its current value

The invariant leads to following correct loop:

boolean found = false;
for (Investment inv : pf)
{
   found = inv.equals(given) || found;
}

Search by Traversal II

When a reference to the searched for object is needed, you use the exact same approach except you also need a reference variable to hold the reference to the searched for object:

Investment lastFound = null;
for (Investment inv : pf)
{
   found = found || inv.equals(given);
   if (found)
   {
      lastFound = inv;
   }
}

If there are multiple Investments that are equal to given then this code yields a reference to the last such Investment.

Search by Traversal III

Suppose you want to find all of the Investments equal to given. You need to create a collection for all of the found Investments:

Portfolio allFound = new Portfolio("", pf.size());
for (Investment inv : pf)
{
   found = found || inv.equals(given);
   if (found)
   {
      allFound.add(inv);
   }
}

Search Complexity

Suppose you use traversal to search some collection for a particular element. Also, suppose that the collection holds N elements.

What is the smallest number of elements visited in the best case?

1, the desired element happens to be
the first element visited

What is the largest number of elements visited in the worst case?

N, the desired element happens to be
the last element visited,
or it is not in the collection

Computational Complexity

Computational complexity is a branch of theoretical computer science concerned with classifying computational problems based on their difficulty. One measure of difficulty is the worst-case complexity: how many basic operations are required to solve a problem of size N where N is a large number.

For search of a collection, N is the size of the collection. In the worst case, you have to visit all N elements in the collection. We say that the complexity of exhaustive search is O(N).

O(N) is pronounced "big-O of N" or "order of N".

Computational Complexity

Complexity does not tell you the actual running time of a program.

Complexity tells you how the running time depends on the size of the input N (for large values of N).

For exhaustive search, if you double N then the running time will also double.

Computational Complexity

More generally, the worst case complexity of a program will vary as some function f(N). In big-O notation we write O(f(N)).

f(N) O(f(N)) name
1 O(1) constant
log N O(log N)     logarithmic
N O(N) linear
N2 O(N2) quadratic
Nc, c > 1     O(Nc) polynomial
cN, c > 1     O(cN) exponential
N! O(N!) factorial

Faster Searches

If a collection stores its elements in some special way, then it is possible to perform search faster than O(N).

If a collection stores its elements in sorted order then you can use binary search to find an element. The worst case complexity of binary search is O(log N).


Computational Complexity

Suppose you wanted to output the investments in a portfolio in order from the smallest total current value to the highest total current value. A simple algorithm that can perform this sorting is called selection sort.

1. find the investment with the smallest current value
      output it and remove it from the portfolio
2. repeat step 1 until the portfolio is empty

What is the worst-case complexity of step 1? What is the worst-case complexity of the entire algorithm?