CS 70

Complexity of 2–3–4 Trees

  • Sherlock speaking

    Since every 2–3–4 Tree is perfectly balanced, I submit that the worst-case cost of insert is \( \Theta(\log n) \)! Sherlock does it again!

  • Watson speaking

    I see what you mean, Sherlock. Surely the distance from the root to a leaf is small…

  • Sherlock speaking

    Precisely! The tree is perfectly balanced, so it has logarithmic height.

  • Watson speaking

    …but don't we have to do more work? Now nodes have multiple keys and we might have to move keys around in the tree!

  • Sherlock speaking

    Ahh, indeed, Watson. But how much more work?

Before, we had to compare with one value at every node to know which way to go down the tree. Now we might need to compare with three values! So lookup may take more effort.

That said, both 1 and 3 are constants! Even if we do three times as much work at every node, \( 3 \log{n} \in \Theta( \log{n} ) \). So 2–3–4 Trees may do more operations per node, but that doesn't change their performance in an asymptotic sense.
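To make that concrete, here's a minimal sketch (in Python, not any official implementation) of the per-node step of a lookup. However large the tree, a node holds at most three keys, so the work at each node is bounded by a constant:

```python
def choose_child(keys, target):
    """Decide where a lookup goes next at one node of a 2-3-4 tree.

    keys is the node's sorted key list (1, 2, or 3 keys). Returns None
    if target is one of this node's keys, or the index of the child
    subtree to descend into. At most 3 comparisons: constant work.
    """
    for i, key in enumerate(keys):
        if target == key:
            return None               # found it in this node
        if target < key:
            return i                  # descend into child i
    return len(keys)                  # larger than every key: rightmost child
```

For example, `choose_child([10, 20, 30], 25)` returns `2`, and `choose_child([10, 20, 30], 20)` returns `None`.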

The same goes for the work of changing a 2-node to a 3-node or a 3-node to a 4-node. It might be a fair bit of work, but it's the same fair bit of work no matter how big the tree is. That makes it a constant!
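A hedged sketch of that upgrade step (hypothetical helper, not the course's code): absorbing a key into a node's key list touches at most three keys, so its cost doesn't depend on the size of the tree.

```python
def absorb_key(keys, new_key):
    """Upgrade a node by absorbing a key: a 2-node (1 key) becomes a
    3-node (2 keys), or a 3-node becomes a 4-node (3 keys).

    The key list never exceeds 3 entries, so this is constant work
    no matter how many keys the whole tree holds.
    """
    keys.append(new_key)
    keys.sort()        # sorting at most 3 items: constant time
    return keys
```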

Best case: What's the smallest number of comparisons we might have to make for a lookup in a 2–3–4 tree?

Worst case: Let's suppose that the worst case for a 2–3–4 tree is when all the nodes end up being 4-nodes.

Would we ever have to make more than \( 3 \times \log_{4}{(n+1)} \) comparisons for a lookup in a 2–3–4 Tree?

If you imagine a 2–3–4 tree that's completely full of 4-nodes, then we'd have \( n / 3 \) nodes (each node holds three keys), arranged in a perfectly balanced tree in which every node has four children.

Since each node has four children, each step down the tree reduces the size of the subtree to \( \frac{1}{4} \) of its previous size.
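The arithmetic behind that claim can be checked directly: a tree of height \( h \) full of 4-nodes has \( 1 + 4 + \cdots + 4^{h-1} = (4^h - 1)/3 \) nodes, and three keys per node gives \( n = 4^h - 1 \) keys, so \( h = \log_4(n+1) \). A quick numeric check of this relationship:

```python
import math

# A tree of height h full of 4-nodes has (4**h - 1) / 3 nodes,
# each holding 3 keys, for n = 4**h - 1 keys in total.
for h in range(1, 8):
    nodes = (4**h - 1) // 3
    n = 3 * nodes
    assert n == 4**h - 1                      # keys in terms of height
    assert round(math.log(n + 1, 4)) == h     # height recovered from n
```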

So, worst case, we'd have to look at \( \log_{4}{(n + 1)} \) of the nodes; for example, with \( n = 63 \) keys, that's \( \log_{4}{(64)} = 3 \) of them.

For each node we look at, we'd have to compare with at most three values.

Altogether, that gives us at most \( 3 \times \log_{4}{( n + 1)} \) comparisons to make in a lookup.
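As a sanity check on that bound, here's a hedged sketch (hypothetical helper names, not the course's code) that builds a 2–3–4 tree full of 4-nodes for \( n = 63 \) keys and confirms that no lookup examines more than \( 3 \times \log_4(64) = 9 \) values:

```python
import math

def build_full(keys):
    """Build a tree of all 4-nodes from 4**h - 1 sorted keys.
    A node is (key_list, child_list); leaves have no children."""
    q = (len(keys) - 3) // 4          # keys in each of the 4 subtrees
    if q == 0:
        return (list(keys), [])
    node_keys = [keys[q], keys[2*q + 1], keys[3*q + 2]]
    children = [build_full(keys[:q]),
                build_full(keys[q + 1:2*q + 1]),
                build_full(keys[2*q + 2:3*q + 2]),
                build_full(keys[3*q + 3:])]
    return (node_keys, children)

def lookup(node, target, examined=0):
    """Return (found, number of key values examined along the way)."""
    keys, children = node
    for i, key in enumerate(keys):
        examined += 1
        if target == key:
            return True, examined
        if target < key:
            return (False, examined) if not children else \
                lookup(children[i], target, examined)
    return (False, examined) if not children else \
        lookup(children[-1], target, examined)

n = 63                                        # a full tree: 21 4-nodes
tree = build_full(list(range(1, n + 1)))
bound = 3 * round(math.log(n + 1, 4))         # 3 * log_4(64) = 9
worst = max(lookup(tree, x)[1] for x in range(1, n + 1))
assert worst == bound                         # worst lookup examines exactly 9 values
```

The worst case here is the rightmost key (63): three values examined at each of the three levels.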

  • Horse speaking

    Hay! You just assumed the worst case was when the tree was full of 4-nodes!

  • LHS Cow speaking

    Yes, we did say that. If we do a similar argument for when it's full of 3-nodes, we get \( 2 \log_{3}{(n + 1)} \) comparisons, and for when it's full of 2-nodes, we get \( \log_{2}{(n + 1)} \) comparisons. These both end up being less than \( 3 \log_{4}{(n + 1)} \).
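A quick numeric check (a sketch, assuming the per-node comparison counts above) confirms the cow's claim for a range of tree sizes:

```python
import math

for n in (7, 63, 1023, 10**6):
    all_4_nodes = 3 * math.log(n + 1, 4)   # 3 comparisons per node
    all_3_nodes = 2 * math.log(n + 1, 3)   # 2 comparisons per node
    all_2_nodes = 1 * math.log(n + 1, 2)   # 1 comparison per node
    # The all-4-node shape needs the most comparisons of the three.
    assert all_3_nodes < all_4_nodes
    assert all_2_nodes < all_4_nodes
```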

  • Duck speaking

    What if it was a mix of nodes?

  • Alien speaking

    It is obvious even to the most casual observer that the worst case is when the tree is full of 4-nodes. (And if it isn't obvious, you aren't being casual enough!!)

  • LHS Cow speaking

    Actually, Duck is right! The true worst case is when there is a single path in the tree (from root to leaf) that is all 4-nodes, and all the other nodes are 2-nodes. But that would be more annoying to analyze, and it only changes the formula by a factor of 4/3, so the asymptotic complexity is the same. If you take CS 140 (Algorithms), you'll cover these trees in more depth and do a more detailed analysis.

  • RHS Cow speaking

    Our main purpose was to convince you that the worst case is logarithmic, and we've done that well enough for CS 70.

  • Goat speaking

    Meh. Weak sauce. You should have done the work, and I'd have enjoyed ignoring it.

Summary of 2–3–4 Tree Complexity

  • The best-case cost of lookup in a 2–3–4 tree is \( \Theta(1) \).
  • The worst-case cost of lookup in a 2–3–4 tree is \( \Theta( \log{n} ) \).
  • Therefore, lookup in a 2–3–4 tree has \( \mathrm{O}( \log{n} ) \) complexity.
  • The complexity of insert is also \( \mathrm{O}( \log{n} ) \).
  • It involves
    1. Doing a lookup (which might return early if it finds the key already in the tree).
    2. Inserting the key at a leaf, which takes constant time.
