CS 70

Collision Resolution: Quadratic Probing

  • LHS Cow speaking

    We saw that the main problem with linear probing is clustering.

  • RHS Cow speaking

    It tends to create large regions of filled buckets that just keep getting larger and larger.

  • LHS Cow speaking

    That can lead to large (nearly worst-case) lookup times.

  • RHS Cow speaking

    Now we'll talk about some probing strategies that are less susceptible to clustering.

One of the problems we noticed about primary clustering is that inserting any key that hashes into the cluster makes the cluster bigger. Wouldn't it be great if we could somehow “jump out” of the cluster and put the key somewhere else entirely?

The main idea of quadratic probing is to “jump” farther and farther away with our probing.

  • Specifically, if our key hashes to bucket \( H \), then we probe \( H, H + 1,\) \( H + 4, H + 9, \) \( H + 16, H + 25, \ldots{}. \)
  • Generally, if we are on our \( i \)th probe, then we probe bucket \( H + ( i - 1 )^{2} \).
  • If our probe goes out of bounds, wrap around to the beginning of the table.
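This probing rule can be sketched in a few lines of Python (the function name is just for illustration; it's not from the course's actual code):

```python
def probe_sequence(home, table_size, max_probes):
    """Yield the buckets visited by quadratic probing from bucket `home`.

    On the i-th probe (1-indexed), we visit (home + (i - 1)**2) % table_size,
    wrapping around when the offset goes past the end of the table.
    """
    for i in range(1, max_probes + 1):
        yield (home + (i - 1) ** 2) % table_size

# For a key hashing to bucket 3 in a 20-bucket table, the first probes are
# 3, 4, 7, 12, 19, and then (3 + 25) % 20 == 8 after wrapping around.
print(list(probe_sequence(3, 20, 6)))  # [3, 4, 7, 12, 19, 8]
```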

Example

Imagine that we start with a table with 20 buckets. Our keys are integers and our hash function is really simple: key % 20.

Draw the table on a piece of paper, then insert the following keys in this order:

23, 60, 77, 34, 0, 39, 3, 9, 87, 25, 24, 1, 21

List the contents of the table; put a single space in between each key and use underscore (_) for empty buckets.

The video below shows the process of inserting these keys into the table…

We can see that in this example our big cluster was broken up and the key 21 can be looked up with far fewer probes.
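If you'd like to check your paper answer, here's a sketch that simulates the insertions (the helper name is illustrative, not the course's actual implementation):

```python
def quadratic_insert(table, key):
    """Insert `key` using quadratic probing; return the bucket used, or None."""
    m = len(table)
    home = key % m  # the simple hash function from the example
    for i in range(m):  # give up after m probes
        bucket = (home + i * i) % m
        if table[bucket] is None:
            table[bucket] = key
            return bucket
    return None  # no empty bucket found along the probe sequence

table = [None] * 20
for key in [23, 60, 77, 34, 0, 39, 3, 9, 87, 25, 24, 1, 21]:
    quadratic_insert(table, key)

print(' '.join('_' if b is None else str(b) for b in table))
# 60 0 1 23 3 25 _ 87 24 9 21 _ _ _ 34 _ _ 77 _ 39
```

Notice that 21 (home bucket 1) finds buckets 1, 2, and 5 occupied, then jumps to the empty bucket 10 on its fourth probe.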

Table Size Problems

Starting with a table with 16 buckets and the simple hash function k % 16, draw the table on a piece of paper, then insert the following keys in this order:

0, 16, 32, 48, 64

What's the problem?

All of these keys hash to the same bucket (bucket 0). The problem is that our probes only ever visit four buckets: the probe offsets \( (i - 1)^{2} \bmod 16 \) cycle through 0, 1, 4, 9, 0, 9, 4, 1, and then repeat forever.

Even though we have room for the key 64, we never find a place to put it!
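A quick self-contained sketch shows the failure (again, the helper name is just illustrative):

```python
def quadratic_insert(table, key):
    """Insert `key` with quadratic probing; return bucket index, or None on failure."""
    m = len(table)
    home = key % m
    for i in range(m):  # give up after m probes
        bucket = (home + i * i) % m
        if table[bucket] is None:
            table[bucket] = key
            return bucket
    return None

table = [None] * 16
results = [quadratic_insert(table, k) for k in [0, 16, 32, 48, 64]]
print(results)             # [0, 1, 4, 9, None] -- 64 never finds a home,
print(table.count(None))   # 12 -- even though 12 of 16 buckets are empty!
```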

  • Hedgehog speaking

    Oh no, that's no good at all!

  • LHS Cow speaking

    But all hope is not lost!

  • RHS Cow speaking

    We should probably ask the number theorists what they think.

  • Owl speaking

    Theorem: If the hash table size is prime, then the probe sequence visits at least half of the buckets before repeating.

  • LHS Cow speaking

    Thanks! We're not going to prove that, but rest assured, we trust our number theorists and it is true.

Thus our solution to this problem is

  • Give our table a prime number as its size.
    • When resizing, pick a prime number at least twice as big as the current size.
    • Often we just keep a look-up table of prime numbers for this purpose.
  • Resize often enough to keep \( \lambda \lt 0.5 \).
    • That way the probing sequence will always find an empty spot eventually.
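We won't prove the theorem either, but we can check it empirically for small sizes: for a prime table size \( m \), the first \( m \) probe offsets \( i^{2} \bmod m \) hit at least \( \lceil m/2 \rceil \) distinct buckets, whereas size 16 reaches only 4 buckets. A sketch:

```python
def distinct_probe_offsets(m):
    """Number of distinct buckets among the probe offsets i^2 mod m."""
    return len({(i * i) % m for i in range(m)})

# For prime table sizes, at least half of the buckets are reachable...
for m in [5, 7, 11, 13, 17, 19, 23]:
    assert distinct_probe_offsets(m) >= (m + 1) // 2

# ...but for table size 16, the probes reach only 4 of the 16 buckets.
print(distinct_probe_offsets(16))  # 4
```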

Expected Complexity

Once again, the analysis is more complicated than we want to get into.

But here are the results!

Expected number of probes for successful lookup (key in the table):

\[ \mathrm{E}[ C_{ \mathrm{SL} } ] = 1 - \ln{ ( 1 - \lambda ) } - \frac{ \lambda }{2} \]

Expected number of probes for unsuccessful lookup (key not in the table):

\[ \mathrm{E}[ C_{ \mathrm{UL} } ] = \frac{1}{ 1 - \lambda } - \lambda - \ln{ ( 1 - \lambda ) } \]

  • Duck speaking

    Uh. Is that better or worse?

  • LHS Cow speaking

    Yeah, it's hard to tell at first glance! You can kind of see it in the formulae, though.

  • RHS Cow speaking

    For successful search, linear probing had a \( \frac{1}{ 1 - \lambda } \) term, but quadratic probing has a \( -\ln{ ( 1 - \lambda ) } = \ln{ \left( \frac{1}{ 1 - \lambda} \right) } \) term, which is smaller.

  • LHS Cow speaking

    For unsuccessful search, linear probing had a \( \left( \frac{1}{ 1 - \lambda } \right)^{2} \) term, but quadratic probing has only a \( \frac{1}{ 1 - \lambda } \) term plus a smaller \( -\ln{ ( 1 - \lambda ) } \) term, which is better.

  • RHS Cow speaking

    We'll plot these later so you can see the improvement.
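Ahead of the plots, here's a sketch that plugs a few load factors into these formulas, alongside the standard linear-probing results \( \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right) \) and \( \frac{1}{2}\left(1 + \left(\frac{1}{1-\lambda}\right)^{2}\right) \) from before:

```python
from math import log

def quad_successful(lam):
    return 1 - log(1 - lam) - lam / 2

def quad_unsuccessful(lam):
    return 1 / (1 - lam) - lam - log(1 - lam)

def lin_successful(lam):
    return 0.5 * (1 + 1 / (1 - lam))

def lin_unsuccessful(lam):
    return 0.5 * (1 + 1 / (1 - lam) ** 2)

for lam in [0.25, 0.5, 0.75]:
    print(f"lambda={lam}: "
          f"successful {lin_successful(lam):.2f} -> {quad_successful(lam):.2f}, "
          f"unsuccessful {lin_unsuccessful(lam):.2f} -> {quad_unsuccessful(lam):.2f}")
```

At \( \lambda = 0.5 \), for example, the expected unsuccessful-lookup cost drops from 2.5 probes (linear) to about 2.19 (quadratic), and the gap widens as \( \lambda \) grows.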

Clustering?

Quadratic probing does a pretty good job of reducing primary clustering by “jumping” from one region of the table to another.

But it still experiences what we call secondary clustering, which is not as bad.

See, the way you get a long lookup time with quadratic probing is having a bunch of keys on the same probing sequence.

Luckily, that can only happen to keys whose hash values map to the same initial bucket. But that does happen!
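A sketch makes secondary clustering concrete: two keys with the same home bucket follow exactly the same probe sequence, so they collide on every single probe, while a key that starts even one bucket over follows a completely separate sequence.

```python
def probes(home, m, n):
    """First n buckets visited by quadratic probing from `home` in an m-bucket table."""
    return [(home + i * i) % m for i in range(n)]

m = 23  # a prime table size
# Keys 5 and 28 both hash to bucket 5 (k % 23), so they probe identically:
print(probes(5 % m, m, 5))   # [5, 6, 9, 14, 21]
print(probes(28 % m, m, 5))  # [5, 6, 9, 14, 21]
# A key whose home bucket is one over shares none of those probes:
print(probes(6, m, 5))       # [6, 7, 10, 15, 22]
```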

  • LHS Cow speaking

    Hmm. So how could we prevent this?

  • Cat speaking

    Is there a way for keys that end up at the same bucket to have different probing sequences?

  • LHS Cow speaking

    Ooh, good idea! Let's explore it!
