CS 70

Contrasting BST Approaches

At this point, we've seen three different approaches for implementing “sufficiently balanced” binary-search trees:

  • Red–black trees
  • Splay trees
  • Randomized binary-search trees
  • Goat speaking

    Why didn't you just teach us the best one?

  • LHS Cow speaking

    Because, as usual, they all fall in different places in a space of trade-offs.

  • RHS Cow speaking

    Let's compare them in a table.

Comparing Approaches

|  | Randomized BSTs | Red–Black Trees | Splay Trees |
| --- | --- | --- | --- |
| Overall Complexity | Expected \(\Theta(\log n)\) | Worst-case \(\Theta(\log n)\) | Amortized \(\Theta(\log n)\) |
| Height/Balance | Expected \(\Theta(\log n)\). The tree will look untidy, but be “sufficiently balanced”. | Worst-case \(\Theta(\log n)\). The tree is very well balanced. | \(\textrm{O}(n)\). The tree can degenerate into a stick, but that can't be exploited because operations tend to rebalance it. |
| Lookup Best Case | \(\textrm{O}(1)\) | \(\textrm{O}(1)\) | \(\textrm{O}(1)\) |
| Insert/Delete Best Case | \(\textrm{O}(1)\) | \(\Theta(\log n)\) | \(\textrm{O}(1)\) |
| Worst Case for an Operation | \(\Theta(n)\), but implausible | \(\Theta(\log n)\) | \(\Theta(n)\), and easily triggered |
| Worst Case for \(m\) Operations | \(\Theta(m^2)\), but implausible | \(\Theta(m \log m)\) | \(\Theta(m \log m)\) |
| In-Order Insertion of \(n\) Items | \(\Theta(n \log n)\) expected time, but with a better constant than usual! | \(\Theta(n \log n)\), as always | \(\Theta(n)\), with balancing deferred until later |
| Locality | No locality promises | No locality promises | Recently inserted/found keys are cheap to look up |
| Extra space | Store subtree sizes (one integer per node) | Store node colors (one boolean per node) | None! |
| Extra time | Rotations during insert/delete | Rotations during insert/delete | Rotations during insert/delete and lookup |
| Implementation Difficulty | Pretty simple | Often considered complex | Medium |

Color-blind key: better, probably fine, not so good

Let's see if we can highlight some differences.

Randomized BSTs

  • Hedgehog speaking

    I like that randomized BSTs are easy to implement.

  • Duck speaking

    They won't make theoreticians happy—they'd worry about the worst case.

  • LHS Cow speaking

    But the worst case only occurs if the random choices happen to build a pathologically unbalanced tree, which is profoundly unlikely (see the sketch below).
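
To make both points concrete, here is a minimal sketch (not the course's reference code) of randomized insertion in the style of Martínez and Roura: a new key becomes the root of a subtree of size \(n\) with probability \(1/(n+1)\), which is exactly why each node stores its subtree size. The names `Node`, `split`, and `insert` are just for this illustration, and only insertion is shown.

```cpp
#include <cstddef>
#include <random>

struct Node {
    int key;
    std::size_t size = 1;   // subtree size: the "extra space" row in the table
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key{k} {}
};

std::size_t size(const Node* t) { return t ? t->size : 0; }
void update(Node* t) { t->size = 1 + size(t->left) + size(t->right); }

// Split t into a tree of keys < key and a tree of keys > key
// (assuming key is not already present).
void split(Node* t, int key, Node*& lo, Node*& hi) {
    if (t == nullptr) { lo = hi = nullptr; return; }
    if (t->key < key) { lo = t; split(t->right, key, lo->right, hi); update(lo); }
    else              { hi = t; split(t->left,  key, lo, hi->left);  update(hi); }
}

std::mt19937 rng{std::random_device{}()};

// Insert key into t. With probability 1/(size(t) + 1) the new node becomes
// the root of this subtree; otherwise we recurse down as in a plain BST.
Node* insert(Node* t, int key) {
    std::uniform_int_distribution<std::size_t> coin(0, size(t));
    if (coin(rng) == 0) {
        Node* root = new Node{key};
        split(t, key, root->left, root->right);
        update(root);
        return root;
    }
    if (key < t->key) { t->left  = insert(t->left,  key); }
    else              { t->right = insert(t->right, key); }
    update(t);
    return t;
}
```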

Red–Black Trees

  • Koala speaking

    Red–black trees' main advantage is the worst-case guarantee.

  • LHS Cow speaking

    Mm-hmm. If you really can't afford to have any operation take a long time, red–black trees are the way to go!

  • Goat speaking

    But the extreme consistency of red–black trees also means they're boring. In some scenarios, the other techniques can beat them.

  • Hedgehog speaking

    And they're supposed to be complicated to implement.

  • LHS Cow speaking

    There are certainly a ton of overcomplicated and confusing implementations of red–black trees on the internet.

  • RHS Cow speaking

    But there are some implementations that are straightforward; they're just harder to find.

  • Bjarne speaking

    FWIW, most C++ standard libraries use red–black trees for std::set and std::map.
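
Since Bjarne brings up the standard library: if all you need is an ordered set or map with guaranteed \(\Theta(\log n)\) operations, you usually don't write a red–black tree yourself. Here's a small usage example; which balanced tree the library uses is an implementation detail, but the major implementations do use red–black trees, and all the balancing is hidden inside the container.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>

int main() {
    // std::set and std::map are commonly backed by red-black trees, so
    // insert, lookup, and erase are worst-case O(log n) regardless of
    // insertion order.
    std::set<int> keys;
    for (int i = 0; i < 1000; ++i) {
        keys.insert(i);                // in-order inserts: no pathological slowdown
    }

    std::map<std::string, int> counts;
    ++counts["moo"];                   // insert-or-update, O(log n)

    std::cout << std::boolalpha
              << (keys.find(500) != keys.end()) << '\n'   // O(log n) lookup
              << counts["moo"] << '\n';
    return 0;
}
```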

Splay Trees

  • Koala speaking

    Splay trees are good if you look up a few keys lots of times.

  • Cat speaking

    Or if you insert things in order (and only access the most recently inserted thing).

  • LHS Cow speaking

    Yes! Also, splay trees don't do any extra bookkeeping, so they don't add any space overhead. (The sketch below shows the splay step that does all the work.)
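
All of the balancing work happens in the splay step itself: every lookup or insertion finishes by rotating the node it touched up to the root using zig, zig-zig, and zig-zag steps (and it's the zig-zig ordering, rotating the parent edge before the node's own edge, that makes the amortized analysis go through). Here is a minimal sketch of that step, assuming nodes carry parent pointers; real implementations vary (some splay top-down instead), and the names are just for illustration.

```cpp
// Node with a parent pointer; a full splay tree would also track the root.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Rotate x up one level, above its current parent (a right or left rotation,
// depending on which child x is).
void rotate(Node* x) {
    Node* p = x->parent;
    Node* g = p->parent;
    if (p->left == x) {                      // x is a left child: right rotation
        p->left = x->right;
        if (x->right != nullptr) { x->right->parent = p; }
        x->right = p;
    } else {                                 // x is a right child: left rotation
        p->right = x->left;
        if (x->left != nullptr) { x->left->parent = p; }
        x->left = p;
    }
    p->parent = x;
    x->parent = g;
    if (g != nullptr) { (g->left == p ? g->left : g->right) = x; }
}

// Splay x to the root using zig, zig-zig, and zig-zag steps.
// Lookup and insert both finish by calling splay on the node they touched.
void splay(Node* x) {
    while (x->parent != nullptr) {
        Node* p = x->parent;
        Node* g = p->parent;
        if (g == nullptr) {
            rotate(x);                               // zig
        } else if ((g->left == p) == (p->left == x)) {
            rotate(p);                               // zig-zig: rotate the
            rotate(x);                               // parent first, then x
        } else {
            rotate(x);                               // zig-zag: rotate x
            rotate(x);                               // twice
        }
    }
}
```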

Overall

  • Koala speaking

    So there are different kinds of search trees for different kinds of problems!

  • LHS Cow speaking

    That's it exactly. We are always going to make some trade-offs. Which trade-offs are worthwhile depends on the problem!
