Collision Resolution: Quadratic Probing
We saw that the main problem with linear probing is clustering.
It tends to create large regions of filled buckets that just keep getting larger and larger.
That can lead to large (nearly worst-case) lookup times.
Now we'll talk about some probing strategies that are less susceptible to clustering.
One of the problems we noticed about primary clustering is that inserting any key that hashes into the cluster makes the cluster bigger. Wouldn't it be great if we could somehow “jump out” of the cluster and put the key somewhere else entirely?
The main idea of quadratic hashing is to “jump” farther and farther away with our probing.
- Specifically, if our key hashes to bucket \( H \), then we probe \( H, H + 1,\) \( H + 4, H + 9, \) \( H + 16, H + 16,\) \( H + 25, \ldots{}. \)
- Generally, if we are on our \( i \)th probe, then we probe bucket \( H + ( i - 1 )^{2} \).
- If our probe goes out of bounds, wrap around to the beginning of the table.
Example
Imagine that we start with a table with 20 buckets. Our keys are integers and our hash function is really simple: key % 20
.
Draw the table on a piece of paper, then insert the following keys in this order:
The video below shows the process of inserting these keys into the table…
We can see that in this example our big cluster was broken up and the key 21 can be looked up with far fewer probes.
Table Size Problems
Starting with a table with 16 buckets and the simple hash function k % 16
, draw the table on a piece of paper, then insert the following keys in this order:
All of these keys hash to the same bucket (bucket 0). The problem is that our probes only visit a tiny part of the table: 0, 1, 4, 9, then wrap back around to 0, 1, 4, 9, and so on.
Even though we have room for the key 64, we never find a place to put it!
Oh no, that's no good at all!
But all hope is not lost!
We should probably ask the number theorists what they think.
Theorem: If the hash table size is prime, then the probe sequence visits at least half of the buckets before repeating.
Thanks! We're not going to prove that, but rest assured, we trust our number theorists and it is true.
Thus our solution to this problem is
- Give our table a prime number as its size.
- When resizing, pick a prime number at least twice as big as the current size.
- Often we just keep a look-up table of prime numbers for this purpose.
- Resize often enough to keep \( \lambda \lt 0.5 \).
- That way the probing sequence will always find an empty spot eventually.
Expected Complexity
Once again, the analysis is more complicated than we want to get into.
But here are the results!
Expected number of probes for successful lookup (key in the table):
\[ \mathrm{E}[ C_{ \mathrm{SL} } ] = 1 - \ln{ ( 1 - \lambda ) } - \frac{ \lambda }{2} \]
Expected number of probes for unsuccessful lookup (key not in the table):
\[ \mathrm{E}[ C_{ \mathrm{UL} } ] = \frac{1}{ 1 - \lambda } - \lambda - \ln{ ( 1 - \lambda ) } \]
Uh. Is that better or worse?
Yeah, it's hard to tell at first glance! You can kind of see it in the formulae, though.
For successful search, linear probing had a \( \frac{1}{ 1 − \lambda } \) term, but quadratic probing has a \( -\ln{ ( 1 - \lambda ) } = \ln{ \left( \frac{1}{ 1 - \lambda} \right) } \) term, which is smaller.
For unsuccessful search, linear probing had a \( \frac{1}{ 1 - \lambda } \) term, but quadratic probing has a \( -\ln{ ( 1 - \lambda ) } = \ln{ \left( \frac{1}{ 1 - \lambda } \right) } \) term, which is better.
We'll plot these later so you can see the improvement.
Clustering?
Quadratic probing does a pretty good job of reducing primary clustering by “jumping” from one region of the table to another.
But it still experiences what we call secondary clustering, which is not as bad.
See, the way you get a long lookup time with quadratic probing is having a bunch of keys on the same probing sequence.
Luckily, that can only happen to keys whose hash values map to the same initial bucket. But that does happen!
Hmm. So how could we prevent this?
Is there a way for keys that end up at the same bucket to have different probing sequences?
Ooh, good idea! Let's explore it!
(When logged in, completion status appears here.)