Key Points
- Open Addressing: In case of collision, put the key somewhere else.
- Look for a new spot with a sequence of "probes" until an empty bucket is found.
- For all methods we studied, same asymptotic complexity as separate chaining.
- Expected \( \Theta(1) \) lookup.
- Amortized expected \( \Theta(1) \) insert.
- Different methods have larger or smaller constants for a value of \( \lambda \).
- Larger \( \lambda \): more comparisons per operation, but less unused space.
- Linear probing: Probe \(H, (H + 1), (H + 2), (H + 3), \ldots \).
- Nice and simple.
- Main concern: Primary clustering (lots of keys next to each other in the table), which
- Causes long lookup times (many comparisons needed to find an empty bucket), and
- Any key that hashes into the cluster will make the cluster bigger.
- Worst overall expected comparisons.
- Needs low \( \lambda \) to have good performance.
- Despite its bad-looking numbers, modern CPU architectures mean it may still be very competitive with other schemes.
- Quadratic probing: Probe \(H, (H + 1), (H + 4), (H + 9), (H + 16), \ldots \).
- Prevents primary clustering.
- Problem: Need to be careful about table size.
- Probes can get stuck in an infinite loop that doesn't visit the whole table.
- Solution: prime table size and \( \lambda < 0.5 \) guarantees an empty bucket will be visited, but
- \( \lambda < 0.5 \) means lots of empty buckets (unused space).
- Problem: Secondary clustering.
- Keys hashed to the same bucket cluster along the probing sequence.
- (Not as bad as primary clustering but still an issue.)
- Double hashing: Probe \( H, (H + I ), (H + 2I ), (H + 3I), \ldots \).
- Prevents both primary and secondary clustering.
- Every key has its own probe sequence.
- Best overall expected comparisons under ideal conditions.
- Can be efficient with higher values of \( \lambda \).
- Problem: Need to be careful that table size and increment are relatively prime.
- Problem: Needs hashing to make both a bucket number and an increment.
- Both values need to spread keys out and be statistically independent of one another.
- But both can be generated by a single hash function.
- Prevents both primary and secondary clustering.
(When logged in, completion status appears here.)