CS 70

Dynamic Table Size

  • LHS Cow speaking

    So what have we learned?

  • Hedgehog speaking

    We've learned that hash tables are terrible!

  • LHS Cow speaking

    Yes, we– Wait, what? Is that what we learned?

  • Hedgehog speaking

    I think so… Expected lookup time is \( \Theta(\lambda) \) either way.

  • LHS Cow speaking

    So…?

  • Hedgehog speaking

    So, \( \lambda = \frac{N}{M} \), right? And that means that \( \Theta(\lambda) = \Theta(N) \)! Terrible!

  • LHS Cow speaking

    Ah, I see. Good observation!

  • RHS Cow speaking

    But \( \Theta(\lambda) = \Theta(N) \) is only true if \( M \) stays constant…

  • LHS Cow speaking

    What if we're allowed to resize our table?

We've seen that the main contributor to the expected complexity of lookup is the load factor, \( \lambda \).

So if we want to keep lookup times low, we need to keep \( \lambda \) low!

Thus, as keys are inserted into the table (increasing \( N \)), we also need to add buckets to the table (increasing \( M \)) to keep the ratio \( \frac{N}{M} \) small.

How to Resize a Hash Table

  • Allocate a new array with the required number of buckets.
  • For each key in the original table, use the hash function to find its new bucket in the new table and INSERT it there.
    • The new bucket is probably not the same as the old one!
  • Delete the old array.

With separate chaining, we can do the resize in \( \Theta(n) \) time.
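As a purely illustrative sketch of those steps, here is one way the rehashing loop might look for a separate-chaining table whose buckets are singly linked lists. The names (`Bucket`, `Buckets`, `resized`, `hashValue`) and the use of `int` keys are assumptions for the example, not part of the notes.

```cpp
#include <cstddef>
#include <forward_list>
#include <functional>
#include <vector>

// One bucket is a chain of keys; the table is an array of buckets.
using Bucket  = std::forward_list<int>;
using Buckets = std::vector<Bucket>;

// Build a new bucket array with newNumBuckets buckets and rehash every key
// into it. Each key is visited exactly once, so with separate chaining the
// whole resize takes Theta(n) time.
Buckets resized(const Buckets& oldBuckets, std::size_t newNumBuckets) {
    Buckets newBuckets(newNumBuckets);
    std::hash<int> hashValue;  // stand-in for the table's hash function
    for (const Bucket& chain : oldBuckets) {
        for (int key : chain) {
            // The new bucket is probably not the same as the old one!
            std::size_t b = hashValue(key) % newNumBuckets;
            newBuckets[b].push_front(key);
        }
    }
    return newBuckets;  // the caller replaces (and thereby deletes) the old array
}
```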

If we keep resizing the table to keep \( \lambda \) below some constant value, then \( \Theta(\lambda) = \Theta(1) \)!

We've been down this road before with array-backed lists!

We can use the same capacity doubling strategy here.

Proposal

  • Set a maximum load factor \( \lambda_{\mathrm{max}} \).
  • Start with a table with one bucket.
  • Whenever \( \lambda \) reaches \( \lambda_{\mathrm{max}} \), double the size of the table.
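Putting the proposal into code, an INSERT might look like the following sketch. It reuses the hypothetical `Buckets` type and `resized` function from the earlier sketch, and the choice of \( \lambda_{\mathrm{max}} = 1 \) is just an example.

```cpp
#include <cstddef>
#include <functional>

// A separate-chaining table that doubles its bucket count whenever the
// load factor would pass MAX_LOAD_FACTOR (our lambda_max).
struct ChainedTable {
    Buckets buckets = Buckets(1);   // start with a table with one bucket
    std::size_t numKeys = 0;        // N
    static constexpr double MAX_LOAD_FACTOR = 1.0;   // lambda_max (an assumed value)

    void insert(int key) {
        // If this insert would push lambda = N/M past lambda_max, double M first.
        if (static_cast<double>(numKeys + 1) / buckets.size() > MAX_LOAD_FACTOR) {
            buckets = resized(buckets, 2 * buckets.size());  // occasional Theta(n) step
        }
        std::size_t b = std::hash<int>{}(key) % buckets.size();
        buckets[b].push_front(key);  // expected Theta(1) step
        ++numKeys;
    }
};
```

Under this policy \( \lambda \) never exceeds \( \lambda_{\mathrm{max}} \), so the expected chain length stays bounded by a constant.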

We'll have occasional expensive resize operations, but this plan will give INSERT good amortized complexity. Of course, we were already dealing with expected complexities.

So INSERT has amortized expected \( \Theta(1) \) complexity.

Here, \( \Theta(1) \) is an amortization of the total expected complexity. It's not a statement about the worst-case complexity of INSERT (whether an individual INSERT or a sequence of them).

Specifically, we have

  • The expected complexity of a single insert is \( \mathrm{O}(n) \).

    • Sometimes the expected complexity is \( \Theta(1) \) (no resize).
    • Sometimes the expected complexity is \( \Theta(n) \) (resize).
  • But the expected complexity of \( m\) inserts is \( m \times \Theta(1) = \Theta(m) \).

    • That's because the amortized expected complexity is \( \Theta(1) \).
  • The worst case for a single insert is still \( \Theta(n) \).

    • And the worst case for \( m \) inserts is still \( \Theta(m^2) \).
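To see where that amortized figure comes from, here is a sketch of the accounting for \( m \) inserts into a table that starts with one bucket and doubles whenever \( \lambda \) reaches \( \lambda_{\mathrm{max}} = 1 \) (other constants work the same way). The resizes rehash \( 1, 2, 4, \ldots \) keys, and that geometric series is what keeps the total small:

\[
\underbrace{m \cdot \Theta(1)}_{\text{ordinary inserts}}
\;+\;
\underbrace{\Theta\!\left(1 + 2 + 4 + \cdots + 2^{\lfloor \log_2 m \rfloor}\right)}_{\text{rehashing during resizes}}
\;=\; \Theta(m) + \Theta(2m) \;=\; \Theta(m).
\]

Dividing the total expected work \( \Theta(m) \) across the \( m \) operations gives the amortized expected \( \Theta(1) \) per insert.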

If you feel okay about the idea of amortized expected complexity, then you can skip the following video.

If it still feels strange or confusing, this video might help to unpack things some more!

Hash Table Complexities

After all this work, we have the final complexities for a dynamically sized hash table with separate chaining and an ideal hash function:

  • LOOKUP: expected \( \Theta(1) \) (worst-case \( \Theta(n) \)).
  • INSERT: amortized expected \( \Theta(1) \) (worst-case \( \Theta(n) \)).
  • Dog speaking

    Wooooooo! We did it! Constant time INSERT and LOOKUP! Take that, binary-search trees!

  • LHS Cow speaking

    No, no. No. That is not the right takeaway.

  • RHS Cow speaking

    Those complexities do not give us constant time INSERT and LOOKUP! The fine print matters!

  • LHS Cow speaking

    First, we're talking about expected complexities.

  • RHS Cow speaking

    If the assumptions we made hold (i.e., our hash function uniformly distributes keys across buckets and our keys are randomly distributed), the theoretical worst-case performance is vanishingly unlikely…

  • LHS Cow speaking

    But if those assumptions don't hold (i.e., we have a bad hash function or a carefully contrived set of keys that hash to the same bucket), the worst case could occur in practice.

  • RHS Cow speaking

    Second, when we have a hash table that uses dynamic resizing to bound the load factor, INSERT has amortized expected complexity.

  • LHS Cow speaking

    Sometimes an insert will require \( \Theta(n) \) time. Our amortization argument shows that these expensive operations occur rarely enough that, averaged over the whole sequence of operations, they stay within a constant time budget per operation.

  • Cat speaking

    So when we consider total time, it's as if each operation requires expected constant time. But total execution time might not be the only thing I care about.

  • LHS Cow speaking

    Exactly!

  • Goat speaking

    So if I wrote a game that kept growing a hash table, it might stutter each time the hash table resized?

  • Duck speaking

    And if I wrote my own hash function, and it wasn't very good, my hash-table performance might be terrible?

  • LHS Cow speaking

    Exactly. And these are the kinds of things it's good to be aware of.

  • RHS Cow speaking

    Hash tables are cool even with those caveats, and as a result they're incredibly widely used.

  • LHS Cow speaking

    We don't need to oversell them by glossing over the realities of their performance. Being aware of the pros and cons helps us make an informed choice. Try again, Dog!

  • Dog speaking

    Wooooooo! Hash tables make a different set of trade-offs than binary-search trees and might be a better choice depending on the needs of your problem!

  • LHS Cow speaking

    Much better.
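As a real-world point of reference (not something from the dialogue above), C++'s std::unordered_map works along these same lines: it tracks its load factor and rehashes to keep it at or below a settable maximum. A small sketch of inspecting those knobs:

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> counts;

    counts.max_load_factor(1.0f);  // cap on lambda; the table rehashes to stay at or below it

    for (int i = 0; i < 1000; ++i) {
        counts["key" + std::to_string(i)] = i;  // some of these inserts trigger a rehash (a resize)
    }

    std::cout << "keys (N):        " << counts.size()            << '\n'
              << "buckets (M):     " << counts.bucket_count()    << '\n'
              << "load factor:     " << counts.load_factor()     << '\n'
              << "max load factor: " << counts.max_load_factor() << '\n';
}
```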

Do hash tables guarantee constant time INSERT and LOOKUP?
