CS 70

Interpreting Hash Table Complexities

  • LHS Cow speaking

    The complexity guarantees of hash tables are kind of subtle!

  • RHS Cow speaking

    Understanding them draws on all of the complexity ideas we've studied so far.

  • LHS Cow speaking

    Let's practice!

Recall that for a dynamically resized separate-chaining hash table we have the following complexity guarantees:

  • Lookup:
    • Worst-case complexity \( \Theta(n) \)
      • Happens if every key is in the same chain.
    • Expected complexity \( \Theta(1) \)
      • Our probabilistic assumptions are about the hash function and distribution of keys.
  • Insert:
    • Worst-case complexity \( \Theta(n) \)
      • Happens if every key is in the same chain, or the table needs to resize.
    • Expected complexity \( \mathrm{O}(n) \)
      • We're only making probabilistic assumptions about the hash function and distribution of keys. (We aren't making probabilistic assumptions about whether the table needs to resize, and that's what makes it \( \mathrm{O}(n) \) and not \( \Theta(1) \).)
    • Amortized expected complexity \( \Theta(1) \)
  • RHS Cow speaking

    Consider writing this list down so you can have it next to you while you answer these questions!
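The guarantees above are easier to internalize with a concrete picture of the data structure. Here is a minimal, hypothetical sketch of a dynamically resized separate-chaining set (of `int`s, to keep it short). `ChainedSet` and all of its details are illustrative only, not the course's actual `HashColorSet` implementation:

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <vector>

// Hypothetical sketch of a dynamically resized separate-chaining set.
class ChainedSet {
public:
    bool lookup(int key) const {
        // Worst case Theta(n): every key might have landed in this one chain.
        for (int k : buckets_[bucketFor(key)]) {
            if (k == key) {
                return true;
            }
        }
        return false;
    }

    void insert(int key) {
        if (lookup(key)) {
            return;                       // may scan a long chain: O(n)
        }
        if (size_ + 1 > buckets_.size()) {
            rehash(2 * buckets_.size());  // resize: Theta(n), but infrequent
        }
        buckets_[bucketFor(key)].push_back(key);
        ++size_;
    }

    std::size_t size() const { return size_; }

private:
    std::size_t bucketFor(int key) const {
        return std::hash<int>{}(key) % buckets_.size();
    }

    // Rehashing touches every key, which is why a resizing insert costs
    // Theta(n) no matter how good the hash function is.
    void rehash(std::size_t newBucketCount) {
        std::vector<std::list<int>> oldBuckets = std::move(buckets_);
        buckets_ = std::vector<std::list<int>>(newBucketCount);
        for (const auto& chain : oldBuckets) {
            for (int k : chain) {
                buckets_[bucketFor(k)].push_back(k);
            }
        }
    }

    std::vector<std::list<int>> buckets_ = std::vector<std::list<int>>(8);
    std::size_t size_ = 0;
};
```

The two `Theta(n)` comments in `insert` mark exactly the two worst cases from the list above: a long chain scan and a resize.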

Exercise 1

Consider the following function:

void addItem(HashColorSet& s, RGBcolor pigment) {
    s.insert(pigment);
}

What is the full range of outcomes for the complexity of addItem()?

In the worst case, the insertion might take \( \Theta(n) \) time:

  • We might end up searching a really long chain to check if the new key is already in the table.
  • We might have to resize the table.

In the best case, it might take \( \Theta(1) \) time (if we are inserting into an empty chain and don't resize).

So, we can say that it takes no worse than linear time: \( \mathrm{O}(n) \).

But this is a bit of a pessimistic characterization…

Now, assuming ideal hash-function behavior (uniform distribution of keys), consider the expected complexity.

What is the expected complexity of addItem() overall?

If the table needs to resize, then the expected complexity of insert is \( \Theta(n) \), because, no matter what, it takes \( \Theta(n) \) time to transfer all the items to the new table.

If the table doesn't need to resize, then the expected complexity is \( \Theta(1) \).

So, we can say that the expected complexity is no more than linear: \( \mathrm{O}(n) \).

Exercise 2a

Consider the following code snippet:

HashColorSet s;
double MAXVAL = n - 1;
for (size_t i = 0; i < n; ++i) {
    s.insert(RGBcolor{i/MAXVAL, 0.0625, i/MAXVAL});
}

What is the most accurate and precise characterization of the overall complexity of this snippet?

In the worst case, each insert takes linear time (e.g., they all end up in the same bucket). So, in the worst case, inserting \( n \) items will take \( \Theta( 1 + 2 + \cdots{} + n ) = \Theta( n^2 ) \) time.
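In case it's helpful, that sum is the familiar arithmetic series:

\[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2} \in \Theta(n^{2}). \]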

In the best case, each insert takes amortized \( \Theta(1) \) time: each item is placed in an empty bucket and the table is resized only occasionally. All told, that's \( n \times \Theta(1) = \Theta(n) \) time.

So, we can say that the complexity of this snippet is no worse than quadratic: \( \mathrm{O}(n^{2}) \).

How about the expected complexity?

What is the most accurate and precise characterization of the expected complexity of this snippet?

Insert has \( \Theta(1) \) amortized expected complexity. Therefore, the expected complexity of \( n \) inserts is \( n \times \Theta(1) = \Theta(n) \).

Exercise 2b

Now say we continue the snippet with the following line:

bool hasNeonPink = s.lookup(RGBcolor{1.0, 0.0625, 1.0});

What is the most accurate and precise characterization of the overall complexity of this line of code?

In the worst case (e.g., looking in a very long chain), lookup takes \( \Theta(n) \) time.

In the best case (e.g., looking in a very short chain), lookup takes \( \Theta(1) \) time.

So we can say that this line of code takes no more than linear time: \( \mathrm{O}(n) \).

How about the expected complexity?

What is the most accurate and precise characterization of the expected complexity of this line of code?

Under our assumption of uniformly distributed keys, the expected complexity of lookup is \( \Theta(1) \).

Exercise 2c

Finally, say we continue with this snippet:

bool anyThere = false;
for (size_t i = 1; i < n; ++i) {
    if (s.lookup(RGBcolor{0.0, i/MAXVAL, 1.0})) {
              // ^--- None of these keys are in the table
        anyThere = true;
    }
}

What is the most accurate and precise characterization of the overall complexity of this snippet?

These are all unsuccessful lookups. In the worst case, all keys are in one chain and we have to search it over and over again.

In that case, each lookup takes \( \Theta(n) \) time, so the whole snippet takes \( n \times \Theta(n) = \Theta(n^{2}) \) time.

In the best case we are always looking in an empty chain (\( \Theta(1) \) time). In that case, the snippet takes \( n \times \Theta(1) = \Theta(n) \) time.

How about the expected complexity?

What is the most accurate and precise characterization of the expected complexity of this snippet?

Again, assuming an ideal hash function, each lookup has an expected complexity of \( \Theta(1) \).

Therefore, the expected complexity of \( n \) lookups is \( n \times \Theta(1) = \Theta(n) \).

Reminder

  • LHS Cow speaking

    Worst-case complexities got a lot of attention on this page.

  • RHS Cow speaking

    But remember that with a good hash function, worst-case lookup behavior is vanishingly rare.

  • LHS Cow speaking

    So, in practice, you can use hash tables without worrying too much that worst-case lookup will happen.

  • RHS Cow speaking

    We only talked about it so much on this page because we wanted you to practice reasoning about complexities!
