Complexity of 2–3–4 Trees
Since every 2–3–4 Tree is perfectly balanced, I submit that the worst-case cost of insert is \( \Theta(\log n) \)! Sherlock does it again!
I see what you mean, Sherlock. Surely the distance from the root to a leaf is small…
Precisely! The tree is perfectly balanced, so it has logarithmic height.
…but don't we have to do more work? Now nodes have multiple keys and we might have to move keys around in the tree!
Ahh, indeed, Watson. But how much more work?
Before, we had to compare with one value at every node to know which way to go down the tree. Now we might need to compare with three values! So lookup may take more effort.
That said, both 1 and 3 are constants! Even if we do three times as much work at every node, \( 3 \log{n} \in \Theta( \log{n} ) \). So 2–3–4 Trees may do more operations per node, but that doesn't change their performance in an asymptotic sense.
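To make that per-node work concrete, here's a minimal Python sketch (the function name and representation are ours, not part of the notes): choosing a direction at a node means comparing the target against the node's one-to-three keys, so the work per node is bounded by a constant.

```python
def step(node_keys, target):
    """Given one node's sorted keys (1 to 3 of them, for a 2-, 3-, or
    4-node) and a search target, return ('found', i) if target is key i,
    or ('child', i) naming which child subtree to descend into.
    At most three comparisons happen here: constant work per node."""
    for i, key in enumerate(node_keys):
        if target == key:
            return ('found', i)
        if target < key:
            return ('child', i)       # target is left of key i
    return ('child', len(node_keys))  # larger than every key: rightmost child
```

For example, `step([10, 20, 30], 25)` returns `('child', 2)`: the target falls between the second and third keys, so we descend into the third child.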
The same goes for the work of changing a 2-node to a 3-node or a 3-node to a 4-node. It might be a fair bit of work, but it's the same fair bit of work no matter how big the tree is. That makes it a constant!
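One way to see that, assuming the standard 2–3–4 restructuring step (splitting a full 4-node and promoting its middle key): the operation touches a fixed handful of keys and child pointers, no matter how many keys the whole tree holds. A minimal Python sketch, with a tuple-based node representation of our own invention:

```python
def split_four_node(keys, children):
    """Split a 4-node (three keys, four children) into two 2-nodes,
    promoting the middle key to the parent. Regardless of tree size,
    this moves exactly three keys and four pointers: constant work."""
    assert len(keys) == 3 and len(children) == 4
    left = (keys[:1], children[:2])    # one key, first two children
    right = (keys[2:], children[2:])   # one key, last two children
    middle = keys[1]                   # promoted into the parent
    return middle, left, right
```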
If you imagine a 2–3–4 tree that's completely full of 4-nodes, then each node holds three keys, so we'd have \( n / 3 \) nodes, arranged in a perfectly balanced tree.
Since each node has four children, each step down the tree reduces the size of the subtree to \( \frac{1}{4} \) of its previous size.
So, worst case, we'd have to look at \( \log_{4}{(n + 1)} \) of the nodes; in this case, \( \log_{4}{(64)} = 3 \) of them.
For each node we look at, we'd have to compare with as many as three values.
Altogether, that gives us \( 3 \times \log_{4}{(n + 1)} \) comparisons to make in a lookup.
Hey! You just assumed the worst case was when the tree was full of 4-nodes!
Yes, we did say that. If we do a similar argument for when it's full of 3-nodes, we get \( 2 \log_{3}{(n + 1)} \) comparisons, and for when it's full of 2-nodes, we get \( \log_{2}{(n + 1)} \) comparisons. These both end up being less than \( 3 \log_{4}{(n + 1)} \).
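A quick numeric check of those three formulas (the function name and the sample value of \( n \) are just for illustration):

```python
import math

def comps_full_of(k, n):
    """Comparisons for a lookup in a tree built entirely of nodes with
    k children (hence k - 1 keys each): the tree has log base k of
    (n + 1) levels, with k - 1 comparisons per level."""
    return (k - 1) * math.log(n + 1, k)

n = 10**6
print(comps_full_of(2, n))  # all 2-nodes: the fewest comparisons
print(comps_full_of(3, n))  # all 3-nodes: in between
print(comps_full_of(4, n))  # all 4-nodes: the most of the three
```

All three are constant multiples of \( \log{n} \); the all-4-node formula is just the largest of the three.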
What if it was a mix of nodes?
It is obvious even to the most casual observer that the worst case is when the tree is full of 4-nodes. (And if it isn't obvious, you aren't being casual enough!!)
Actually, Duck is right! The true worst case is when there is a single path in the tree (from root to leaf) that is all 4-nodes, and all the other nodes are 2-nodes. But that would be more annoying to analyze, and it only changes the formula by a factor of 4/3, so the asymptotic complexity is the same. If you take CS 140 (Algorithms), you'll cover these trees in more depth and do a more detailed analysis.
Our main purpose was to convince you that the worst case is logarithmic, and we've done that well enough for CS 70.
Meh. Weak sauce. You should have done the work, and I'd have enjoyed ignoring it.
Summary of 2–3–4 Tree Complexity
- The best-case cost of lookup in a 2–3–4 tree is \( \Theta(1) \).
- The worst-case cost of lookup in a 2–3–4 tree is \( \Theta( \log{n} ) \).
- Therefore, lookup in a 2–3–4 tree has \( \mathrm{O}( \log{n} ) \) complexity.
- The complexity of insert is also \( \mathrm{O}( \log{n} ) \).
- It involves:
  - doing a lookup (which might return early if it finds the key already in the tree), and
  - actually inserting the key at the leaf, which takes constant time.
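To make the logarithmic bounds concrete, here's a small sketch (names are ours, not from the notes) that pins down how many node levels a 2–3–4 tree with \( n \) keys can have, since a tree with \( L \) levels holds between \( 2^L - 1 \) keys (all 2-nodes) and \( 4^L - 1 \) keys (all 4-nodes):

```python
def level_bounds(n):
    """Return (lo, hi): the fewest and most node levels a 2-3-4 tree
    holding n keys can have. Every leaf sits at the same depth, so the
    level count is pinned between log base 4 and log base 2 of n + 1."""
    lo = 0
    while 4**lo - 1 < n:          # smallest L with 4**L - 1 >= n keys
        lo += 1
    hi = 0
    while 2**(hi + 1) - 1 <= n:   # largest L with 2**L - 1 <= n keys
        hi += 1
    return lo, hi
```

For \( n = 63 \) keys this gives between 3 levels (all 4-nodes, as in the example above) and 6 levels (all 2-nodes): both logarithmic in \( n \).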