Comparisons: Pros and Cons
All Methods
- Empty buckets aren't necessarily “wasted space”—they make search faster.
- Particularly unsuccessful search, which may need to perform no comparisons at all to determine the item isn't in the table.
Separate Chaining
- Pro: Simple and easy to implement
- Pro: Tied for best unsuccessful-lookup expected comparisons.
- Pro: No number-theoretic demands on the number of buckets (powers of two are fine, so successive doubling is easy).
- Pro: Straightforward removal.
- We haven't talked about it, but just remove the key from its chain!
- Con: Tied for worst successful lookup expected comparisons.
- Con: Memory overhead of storing pointers for chains.
- Can be significant if the items being stored are small, but not a big deal if items stored are large.
Double Hashing
- Pro: Best expected lookup and insert performance under ideal conditions.
- As a result, we can have higher load factors/smaller tables with good expected comparisons.
- Pro: Addresses any clustering concerns (better even than separate chaining).
- Pro: Very space efficient for small objects, if we store them directly in the table.
- Meh: Needs two independent values for hash and increment.
- But we can arrange things can actually come from the same hash function, so not a big deal.
- Meh: In some variants, the number of buckets has to be prime (which makes successive doubling a bit more annoying).
- But we can use a power-of-two size and an odd increment instead.
- Con: Relatively more sensitive to choice of hash functions.
- Con: Removal is complicated.
- Can't just leave an empty bucket—that would mess up lookup!
Quadratic Probing
- Pro: Combats primary clustering.
- Pro: Expected comparisons nearly as good as double hashing.
- Pro: Very space efficient for small objects, if we store them directly in the table.
- Con: Vulnerable to secondary clustering (things that start in the same bucket cluster together, just like separate chaining)
- Con: Number of buckets needs to be prime (which makes successive doubling a bit more annoying).
- Con: Requires \( \lambda \lt 0.5 \) to guarantee that an empty bucket will be found.
- (That means a lot of unused space in the table).
- Con: Removal is complicated.
Linear Probing
- Pro: The easiest to implement open-addressing method.
- Pro: Sometimes CPU caching makes it efficient to access nearby elements in an array, making probing faster.
- Pro: No number-theoretic demands on the number of buckets (powers of two are fine, so successive doubling is easy).
- Pro: Very space efficient for small objects, if we store them directly in the table.
- Pro: Although it works best when \( \lambda \lt 0.5 \), it still works when \( \lambda \gt 0.5 \)—unlike quadratic probing, which could lock up.
- Meh: Removal is the least complicated of all the open-addressing methods, but still more complex than separate chaining.
- You just “rehash” the rest of the cluster after the removed element—something that's impossible for techniques that break up clusters.
- Con: Prone to primary clustering: Worst overall expected comparisons.
- Requires low load factors for good performance.
- But when \( \lambda \lt 0.5 \), primary clustering is not that big a deal.
- Requires low load factors for good performance.
(When logged in, completion status appears here.)