Contrasting BST Approaches
At this point, we've seen three different approaches for implementing “sufficiently balanced” binary-search trees:
- Red–black trees
- Splay trees
- Randomized binary-search trees
Why didn't you just teach us the best one?
Because, as usual, they all fall in different places in a space of trade-offs.
Let's compare them in a table.
Comparing Approaches
| | Randomized BST | Red–Black Trees | Splay Trees |
| --- | --- | --- | --- |
| Overall Complexity | Expected \(\Theta(\log n)\) | Worst-case \(\Theta(\log n)\) | Amortized \(\Theta(\log n)\) |
| Height/Balance | Expected \(\Theta(\log n)\). Tree will look untidy, but be “sufficiently balanced”. | Worst-case \(\Theta(\log n)\). Tree is very well balanced. | \(\textrm{O}(n)\). Can degenerate into a stick, but that can't be exploited, as operations tend to rebalance the tree. |
| Lookup Best Case | \(\textrm{O}(1)\) | \(\textrm{O}(1)\) | \(\textrm{O}(1)\) |
| Insert/Delete Best Case | \(\textrm{O}(1)\) | \(\Theta(\log n)\) | \(\textrm{O}(1)\) |
| Worst Case for an Operation | \(\Theta(n)\), but implausible | \(\Theta(\log n)\) | \(\Theta(n)\), and easily triggered |
| Worst Case for \(m\) Operations | \(\Theta(m^2)\), but implausible | \(\Theta(m \log m)\) | \(\Theta(m \log m)\) |
| In-Order Insertion of \(n\) Items | \(\Theta(n \log n)\) expected time, but with a better constant than usual! | \(\Theta(n \log n)\), as always | \(\Theta(n)\), with balancing deferred until later |
| Locality | No locality promises | No locality promises | Recently inserted/found keys are cheap to look up |
| Extra Space | Subtree size at each node (an integer) | Node color at each node (a boolean) | None! |
| Extra Time | Rotations during insert/delete | Rotations during insert/delete | Rotations during insert/delete and lookup |
| Implementation Difficulty | Pretty simple | Often considered complex | Medium |
Let's see if we can highlight some differences.
Randomized BSTs
I like that randomized BSTs are easy to implement.
They won't make theoreticians happy, though, since they'd worry about the worst case.
But that worst case only occurs if the random choices happen to produce a pathologically unbalanced tree, which is profoundly unlikely.
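To make “easy to implement” concrete, here's a minimal C++ sketch of one common formulation of randomized insertion: a new key becomes the root of the subtree it lands in with probability \(1/(\textrm{size}+1)\), using the stored subtree sizes from the table above. The `Node`, `split`, and `insert` names are purely illustrative, and deletion is omitted.

```cpp
#include <cstdlib>
#include <utility>

// Illustrative node type; `size` is the subtree-size bookkeeping
// mentioned in the "Extra Space" row of the table.
struct Node {
    int key;
    int size = 1;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

int size(Node* t) { return t ? t->size : 0; }
void update(Node* t) { if (t) t->size = 1 + size(t->left) + size(t->right); }

// Split t into (keys < key, keys >= key); used for insert-at-root.
std::pair<Node*, Node*> split(Node* t, int key) {
    if (!t) return {nullptr, nullptr};
    if (key <= t->key) {
        auto [l, r] = split(t->left, key);
        t->left = r;
        update(t);
        return {l, t};
    } else {
        auto [l, r] = split(t->right, key);
        t->right = l;
        update(t);
        return {t, r};
    }
}

// With probability 1/(size(t)+1), make the new key the root of this
// subtree (via split); otherwise recurse into the appropriate child.
// Those random choices are what keep the expected height logarithmic,
// no matter what order the keys arrive in.
Node* insert(Node* t, int key) {
    if (std::rand() % (size(t) + 1) == 0) {   // also handles t == nullptr
        Node* n = new Node(key);
        auto [l, r] = split(t, key);
        n->left = l;
        n->right = r;
        update(n);
        return n;
    }
    // Here t is guaranteed non-null, since an empty subtree always
    // takes the insert-at-root branch above.
    if (key < t->key) t->left = insert(t->left, key);
    else              t->right = insert(t->right, key);
    update(t);
    return t;
}
```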
Red–Black Trees
Red–black trees' main advantage is the worst-case guarantee.
Mm-hmm. If you really can't afford to have any operation take a long time, red–black trees are the way to go!
But the extreme consistency of red–black trees also means they're boring. In some scenarios, the other techniques can beat them.
And they're supposed to be complicated to implement.
There are certainly a ton of overcomplicated and confusing implementations of red–black trees on the internet.
But there are some implementations that are straightforward; they're just harder to find.
FWIW, most C++ standard libraries use red–black trees for `std::set` and `std::map`.
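So if you want those worst-case guarantees without writing a tree yourself, the ordered standard containers are the usual route. Here's a quick usage sketch; it relies only on the documented logarithmic complexity of insert and find, not on any red–black internals.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>

int main() {
    std::set<int> s;                        // typically a red-black tree under the hood
    for (int k : {5, 1, 9, 3}) s.insert(k); // logarithmic per insertion, worst case
    std::cout << std::boolalpha << (s.count(3) == 1) << '\n';   // true

    std::map<std::string, int> freq;        // ordered map, also logarithmic per operation
    ++freq["tree"];
    ++freq["tree"];
    std::cout << freq["tree"] << '\n';      // 2
}
```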
Splay Trees
Splay trees are good if you look up a few keys lots of times.
Or if you insert things in order (and only access the most recently inserted thing).
Yes! Also, splay trees don't do any extra bookkeeping, so they don't add any space overhead.
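To give a feel for where that locality comes from, here's a sketch of the splay step in C++, in the recursive style that shows up in many textbook implementations: every lookup (and insert) splays the accessed key to the root via zig, zig-zig, and zig-zag rotations. The `Node` type and function names are illustrative, and the insert/delete wrappers are omitted.

```cpp
// Illustrative node type with no bookkeeping fields at all:
// splay trees need neither sizes nor colors.
struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

Node* rotateRight(Node* t) {   // lift t->left above t
    Node* l = t->left;
    t->left = l->right;
    l->right = t;
    return l;
}

Node* rotateLeft(Node* t) {    // lift t->right above t
    Node* r = t->right;
    t->right = r->left;
    r->left = t;
    return r;
}

// Bring the node with `key` (or the last node reached while searching
// for it) to the root using zig, zig-zig, and zig-zag steps.
Node* splay(Node* t, int key) {
    if (!t || t->key == key) return t;
    if (key < t->key) {
        if (!t->left) return t;                    // key not present; stop here
        if (key < t->left->key) {                  // zig-zig (left-left)
            t->left->left = splay(t->left->left, key);
            t = rotateRight(t);
        } else if (key > t->left->key) {           // zig-zag (left-right)
            t->left->right = splay(t->left->right, key);
            if (t->left->right) t->left = rotateLeft(t->left);
        }
        return t->left ? rotateRight(t) : t;       // final zig
    } else {
        if (!t->right) return t;
        if (key > t->right->key) {                 // zig-zig (right-right)
            t->right->right = splay(t->right->right, key);
            t = rotateLeft(t);
        } else if (key < t->right->key) {          // zig-zag (right-left)
            t->right->left = splay(t->right->left, key);
            if (t->right->left) t->right = rotateRight(t->right);
        }
        return t->right ? rotateLeft(t) : t;       // final zag
    }
}

// Lookup: splay the key to the root, then check whether it arrived.
// The splaying is exactly what makes repeated lookups of the same
// (or recently inserted) keys cheap.
bool contains(Node*& root, int key) {
    root = splay(root, key);
    return root && root->key == key;
}
```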
Overall
So there are different kinds of search trees for different kinds of problems!
That's it exactly. We are always going to make some trade-offs. Which trade-offs are worthwhile depends on the problem!