CS 70

Comparisons: Pros and Cons

All Methods

  • Empty buckets aren't necessarily “wasted space”—they make search faster.
    • Particularly unsuccessful search, which may need to perform no comparisons at all to determine the item isn't in the table.

Separate Chaining

  • Pro: Simple and easy to implement
  • Pro: Tied for best unsuccessful-lookup expected comparisons.
  • Pro: No number-theoretic demands on the number of buckets (powers of two are fine, so successive doubling is easy).
  • Pro: Straightforward removal.
    • We haven't talked about it, but just remove the key from its chain!
  • Con: Tied for worst successful lookup expected comparisons.
  • Con: Memory overhead of storing pointers for chains.
    • Can be significant if the items being stored are small, but not a big deal if items stored are large.

Double Hashing

  • Pro: Best expected lookup and insert performance under ideal conditions.
    • As a result, we can have higher load factors/smaller tables with good expected comparisons.
  • Pro: Addresses any clustering concerns (better even than separate chaining).
  • Pro: Very space efficient for small objects, if we store them directly in the table.
  • Meh: Needs two independent values for hash and increment.
    • But we can arrange things can actually come from the same hash function, so not a big deal.
  • Meh: In some variants, the number of buckets has to be prime (which makes successive doubling a bit more annoying).
    • But we can use a power-of-two size and an odd increment instead.
  • Con: Relatively more sensitive to choice of hash functions.
  • Con: Removal is complicated.
    • Can't just leave an empty bucket—that would mess up lookup!

Quadratic Probing

  • Pro: Combats primary clustering.
  • Pro: Expected comparisons nearly as good as double hashing.
  • Pro: Very space efficient for small objects, if we store them directly in the table.
  • Con: Vulnerable to secondary clustering (things that start in the same bucket cluster together, just like separate chaining)
  • Con: Number of buckets needs to be prime (which makes successive doubling a bit more annoying).
  • Con: Requires \( \lambda \lt 0.5 \) to guarantee that an empty bucket will be found.
    • (That means a lot of unused space in the table).
  • Con: Removal is complicated.

Linear Probing

  • Pro: The easiest to implement open-addressing method.
  • Pro: Sometimes CPU caching makes it efficient to access nearby elements in an array, making probing faster.
  • Pro: No number-theoretic demands on the number of buckets (powers of two are fine, so successive doubling is easy).
  • Pro: Very space efficient for small objects, if we store them directly in the table.
  • Pro: Although it works best when \( \lambda \lt 0.5 \), it still works when \( \lambda \lt 0.5 \)—unlike quadratic probing, which could lock up.
  • Meh: Removal is the least complicated of all the open-addressing methods, but still more complex than separate chaining.
    • You just “rehash” the rest of the cluster after the removed element—something that's impossible for techniques that break up clusters.
  • Con: Prone to primary clustering: Worst overall expected comparisons.
    • Requires low load factors for good performance.
      • But when \( \lambda \lt 0.5 \), primary clustering is not that big a deal.

(When logged in, completion status appears here.)