CS 70

Key Points

  • Open Addressing: In case of collision, put the key somewhere else.
    • Look for a new spot with a sequence of "probes" until an empty bucket is found.
    • For all methods we studied, same asymptotic complexity as separate chaining.
      • Expected \( \Theta(1) \) lookup.
      • Amortized expected \( \Theta(1) \) insert.
    • Different methods have larger or smaller constants for a value of \( \lambda \).
      • Larger \( \lambda \): more comparisons per operation, but less unused space.
  • Linear probing: Probe \(H, (H + 1), (H + 2), (H + 3), \ldots \).
    • Nice and simple.
    • Main concern: Primary clustering (lots of keys next to each other in the table), which
      • Causes long lookup times (many comparisons needed to find an empty bucket), and
      • Any key that hashes into the cluster will make the cluster bigger.
    • Worst overall expected comparisons.
      • Needs low \( \lambda \) to have good performance.
    • Despite its bad-looking numbers, modern CPU architectures mean it may still be very competitive with other schemes.
  • Quadratic probing: Probe \(H, (H + 1), (H + 4), (H + 9), (H + 16), \ldots \).
    • Prevents primary clustering.
    • Problem: Need to be careful about table size.
      • Probes can get stuck in an infinite loop that doesn't visit the whole table.
      • Solution: prime table size and \( \lambda < 0.5 \) guarantees an empty bucket will be visited, but
      • \( \lambda < 0.5 \) means lots of empty buckets (unused space).
    • Problem: Secondary clustering.
      • Keys hashed to the same bucket cluster along the probing sequence.
      • (Not as bad as primary clustering but still an issue.)
  • Double hashing: Probe \( H, (H + I ), (H + 2I ), (H + 3I), \ldots \).
    • Prevents both primary and secondary clustering.
      • Every key has its own probe sequence.
    • Best overall expected comparisons under ideal conditions.
      • Can be efficient with higher values of \( \lambda \).
    • Problem: Need to be careful that table size and increment are relatively prime.
    • Problem: Needs hashing to make both a bucket number and an increment.
      • Both values need to spread keys out and be statistically independent of one another.
      • But both can be generated by a single hash function.

(When logged in, completion status appears here.)