Interpreting Hash Table Complexities
The complexity guarantees of hash tables are kind of subtle!
Understanding them draws on all of the complexity ideas we've studied so far.
Let's practice!
Recall that for a dynamically resized separate-chaining hash table we have the following complexity guarantees:
- Lookup:
- Worst-case complexity \( \Theta(n) \)
- Happens if every key is in the same chain.
- Expected complexity \( \Theta(1) \)
- Our probabilistic assumptions are about the hash function and distribution of keys.
- Insert:
- Worst-case complexity \( \Theta(n) \)
- Happens if every key is in the same chain, or the table needs to resize.
- Expected complexity \( \mathrm{O}(n) \)
- We're only making probabilistic assumptions about the hash function and distribution of keys. (We aren't making probabilistic assumptions about whether the table needs to resize, and that's what makes it \( \mathrm{O}(n) \) and not \( \Theta(1) \).)
- Amortized expected complexity \( \Theta(1) \)
Consider writing this list down so you can have it next to you while you answer these questions!
Exercise 1
Consider the following function:
void addItem(HashColorSet& s, RGBcolor pigment) {
    s.insert(pigment);
}
In the worst case, the insertion might take \( \Theta(n) \) time:
- We might end up searching a really long chain to check if the new key is already in the table.
- We might have to resize the table.
In the best case, it might take \( \Theta(1) \) time (if we are inserting into an empty chain and don't resize).
So, we can say that it takes no worse than linear time: \( \mathrm{O}(n) \).
But this is a bit of a pessimistic characterization…
Now, assuming ideal hash-function behavior (uniform distribution of keys), consider the expected complexity.
If the table needs to resize, then the expected complexity of insert is \( \Theta(n) \), because, no matter what, it takes \( \Theta(n) \) time to transfer all the items to the new table.
If the table doesn't need to resize, then the expected complexity is \( \Theta(1) \).
So, we can say that the expected complexity is no more than linear: \( \mathrm{O}(n) \).
Exercise 2a
Consider the following code snippet:
HashColorSet s;
double MAXVAL = n - 1;
for (size_t i = 0; i < n; ++i) {
    s.insert(RGBcolor{i/MAXVAL, 0.0625, i/MAXVAL});
}
In the worst case, each insert takes linear time (e.g., they all end up in the same bucket). So, in the worst case, inserting \( n \) items will take \( \Theta( 1 + 2 + \cdots{} + n ) = \Theta( n^2 ) \) time.
In the best case, each insert takes amortized \( \Theta(1) \) time: each item is placed in an empty bucket, and the table is resized only occasionally. All told, that's \( n \times \Theta(1) = \Theta(n) \) time.
So, we can say that the complexity of this snippet is no worse than quadratic: \( \mathrm{O}(n^{2}) \).
How about the expected complexity?
Insert has \( \Theta(1) \) amortized expected complexity. Therefore, the expected complexity of \( n \) inserts is \( n \times \Theta(1) = \Theta(n) \).
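To see where the amortized \( \Theta(1) \) comes from, count the resize work under a doubling policy (a back-of-the-envelope count, assuming the table doubles each time it fills and, for simplicity, that \( n \) is a power of two):

\[ \underbrace{1 + 2 + 4 + \cdots + \frac{n}{2}}_{\text{items moved by all resizes}} = n - 1 = \Theta(n). \]

All of the resizes together do only \( \Theta(n) \) work, so spread across \( n \) inserts, that's \( \Theta(1) \) amortized resize work per insert, on top of the \( \Theta(1) \) expected chain work.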
Exercise 2b
Now say we continue the snippet with the following line:
bool hasNeonPink = s.lookup(RGBcolor{1.0, 0.0625, 1.0});
In the worst case (e.g., looking in a very long chain), lookup takes \( \Theta(n) \) time.
In the best case (e.g., looking in a very short chain), lookup takes \( \Theta(1) \) time.
So we can say that this line of code takes no more than linear time: \( \mathrm{O}(n) \).
How about the expected complexity?
Under our assumption of uniformly distributed keys, the expected complexity of lookup is \( \Theta(1) \).
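A quick way to see why: under uniform hashing, with \( n \) keys spread over \( m \) buckets, the expected length of any one chain is the load factor

\[ \frac{n}{m} = \alpha . \]

Dynamic resizing keeps \( \alpha \) below a fixed constant, so an expected lookup examines only \( \Theta(1) \) keys.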
Exercise 2c
Finally, say we continue with this snippet:
bool anyThere = false;
for (size_t i = 1; i < n; ++i) {
    if (s.lookup(RGBcolor{0.0, n/MAXVAL, 1.0})) {
        // ^--- None of these keys are in the table
        anyThere = true;
    }
}
These are all unsuccessful lookups. In the worst case, all keys are in one chain and we have to search it over and over again.
In that case, each lookup takes \( \Theta(n) \) time, so the whole snippet takes \( (n-1) \times \Theta(n) = \Theta( n^{2} ) \) time.
In the best case, we are always looking in an empty chain (\( \Theta(1) \) time). In that case, the snippet takes \( (n-1) \times \Theta(1) = \Theta(n) \) time.
How about the expected complexity?
Again, assuming an ideal hash function, each lookup has an expected complexity of \( \Theta(1) \).
Therefore, the expected complexity of the \( n - 1 \) lookups is \( (n-1) \times \Theta(1) = \Theta(n) \).
Reminder
Worst-case complexities got a lot of attention on this page.
But remember that with a good hash function, worst-case lookup behavior is vanishingly rare.
So, in practice, you can use hash tables without worrying too much that worst-case lookup will happen.
We only talked about it so much on this page because we wanted you to practice reasoning about complexities!