CS 70

Splay-Tree Complexity

  • Duck speaking

    So, what does all this say about splay trees?

  • LHS Cow speaking

    Great question! Now that we know what amortized complexity is, let's revisit splay trees!

Remember what we saw on Wikipedia: splay trees have amortized \( \mathrm{O}( \log{n} ) \) complexity for both insert and lookup.

We also know that an amortized bound promises that the total time of all the insert/lookup operations performed up until now is bounded by the sum of the amortized times of the individual operations.

Example 1

Imagine that we start with an empty splay tree and then insert \( m \) keys.

What is the total asymptotic complexity of all of the insert operations?

To find out, we just need to add up the amortized bounds! When we are considering total execution time, the amortized analysis allows us to pretend that the complexity of each insert is \( \mathrm{O}( \log{n} )\) for \( n = 0, 1, 2, 3, \ldots, ( m - 1 ) \). (For logarithm sticklers out there, we are going to consider the complexity of inserting when \( n = 0 \) and when \( n = 1 \) to be \( \mathrm{O}(1) \).)

So that gives us \( \mathrm{O}( 1 + 1 + \log{2} + \log{3} + \log{4} + \cdots{} + \log{ ( m - 1 ) } ) \). We can simplify this as follows:

  • Every term in the sum is bounded above by \( \log{ ( m - 1 ) } \).
  • So the complexity is \( \mathrm{O}( \log{ ( m - 1 ) } + \log{ ( m - 1 ) } + \cdots{} + \log{ ( m - 1 ) } ) = \mathrm{O}( m \log{ ( m-1 ) } ) \).
  • In the limit, the \( -1 \) won't matter.

So we conclude that the total complexity of \( m \) inserts, starting from an empty tree, is \( \mathrm{O}( m \log{m} ) \).
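If you like, you can sanity-check this conclusion numerically. Here's a small sketch (just for illustration; the function name is ours, not part of the analysis) that adds up the amortized per-insert bounds and compares the total to \( m \log_2{m} \):

```python
import math

def total_amortized_insert_bound(m):
    """Add up the amortized per-insert bounds for inserts n = 0 .. m - 1,
    treating the n = 0 and n = 1 inserts as constant cost (1 each)."""
    return 2 + sum(math.log2(n) for n in range(2, m))

m = 100_000
total = total_amortized_insert_bound(m)
print(total <= m * math.log2(m))   # True: the sum never exceeds m log m
print(total / (m * math.log2(m)))  # ratio approaches 1 as m grows
```

The ratio approaches 1 as \( m \) grows, which is why dropping the \( -1 \) (and the two constant-cost initial inserts) doesn't change the asymptotic answer.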

  • Koala speaking

    Bewdy bottler, mate! Even though the tree can get imbalanced, it all evens out in the end!

  • LHS Cow speaking

    That's right! We know that some of these individual inserts might take linear time.

  • RHS Cow speaking

    But when you consider all the inserts as a whole, it takes \( \mathrm{O}( m \log{m} ) \) time, same as a red–black tree.

  • LHS Cow speaking

    Better even, because a red–black tree takes \( \Theta( m \log{m} ) \) time! It's possible for splay tree inserts to take less time because the tree might be imbalanced.

Example 2

Okay, so now we have a tree with \( m \) keys in it.

Say we now perform \( m \) lookups.

What is the most accurate and precise characterization of the total complexity of performing all \( m \) lookups?

Here the amortized bound allows us to pretend that each lookup takes \( \mathrm{O}( \log{n} ) \), where \( n = m \) the entire time (because lookup doesn't change the size of the tree).

So the total complexity is \( \mathrm{O}( \log{m} + \log{m} + \log{m} + \cdots{} + \log{m}) = \mathrm{O}( m \log{m} ) \).

  • RHS Cow speaking

    Once again, the actual times of individual lookups will probably vary.

  • LHS Cow speaking

    But it all evens out to \( \mathrm{O}(m\log{m}) \).

  • RHS Cow speaking

Amortized bounds make it easy to calculate total complexities in cases where just adding up worst-case bounds would be too pessimistic.

  • Horse speaking

Hay! Amortized time is about total execution time, but you began with a tree that already had \( m \) keys in it. What if those keys were arranged as a stick?

  • LHS Cow speaking

    Well, if you want to be a bit more formal about it, we can consider that we got that tree with \( m \) keys by inserting them one at a time, and so actually there were \( 2 m \) operations overall (the inserts and the lookups). That way everything still works out.

  • RHS Cow speaking

    But if you imagine that we just conjured up a tree with \( m \) keys in it all arranged as a stick, then we couldn't use the amortized bound because we wouldn't be properly “starting at the beginning” and measuring total time.

Comparisons

  • Dog speaking

    So? Which tree is best?

  • LHS Cow speaking

    There isn't a best tree.

  • RHS Cow speaking

    They each have strengths and weaknesses!

|  | Simple BST | Splay Tree | Red–Black Tree |
| --- | --- | --- | --- |
| Best case insert in arbitrary tree of size \( n \) | Child of root: \( \Theta(1) \) | Child of root: \( \Theta(1) \) | Balanced: \( \Theta( \log{n} ) \) |
| Worst case insert in arbitrary tree of size \( n \) | Bottom of stick: \( \Theta(n) \) | Bottom of stick: \( \Theta(n) \) | Balanced: \( \Theta( \log{n} ) \) |
| Best case lookup in arbitrary tree of size \( n \) | Key in root: \( \Theta(1) \) | Key in root: \( \Theta(1) \) | Key in root: \( \Theta(1) \) |
| Worst case lookup in arbitrary tree of size \( n \) | Bottom of stick: \( \Theta(n) \) | Bottom of stick: \( \Theta(n) \) | Balanced: \( \Theta( \log{n} ) \) |
| Best case of \( m \) inserts into empty tree | Balanced: \( \Theta( m \log{m} ) \) | Sorted order: \( \Theta(m) \) | Balanced: \( \Theta( m \log{m} ) \) |
| Worst case of \( m \) inserts into empty tree | Sorted order: \( \Theta(m^2) \) | Adversarial: \( \Theta( m \log{m} ) \) | Balanced: \( \Theta( m \log{m} ) \) |
| Best case of \( m \) lookups in tree of size \( m \) | In root: \( \Theta(m) \) | Repeated lookup: \( \Theta(m) \) | In root: \( \Theta(m) \) |
| Worst case of \( m \) lookups in tree of size \( m \) | Bottom of stick: \( \Theta(m^2) \) | Adversarial: \( \Theta( m \log{m} ) \) | In leaves: \( \Theta( m \log{m} ) \) |
  • Cat speaking

    So, it seems like simple BSTs are just unreliable.

  • LHS Cow speaking

    Yes, I think that's fair. Basically nobody would use a BST with no balancing mechanism.

  • Cat speaking

    And it seems like red–black trees are like old faithful.

  • LHS Cow speaking

    Yes, you can basically count on \( \log \)-time every time.

  • Cat speaking

    And splay trees are sort of up and down. They have better best cases and worse worst cases.

  • LHS Cow speaking

    Yes, that's true! Splay trees will be prone to occasional complexity spikes, where individual operations take a long time.

  • RHS Cow speaking

    But in the total complexity it all evens out to linearithmic worst case, just like red–black trees.

In addition to complexity there are other comparisons to make:

  • Memory overhead: Red–black trees need to store extra information in each node (color). Splay trees don't.
  • Time overhead: Red–black trees only rotate some nodes on the path and only for insertion. Splay trees rotate all the way to root during both insertion and lookup.
  • Complicatedness: Red–black trees have a reputation for being complicated to implement. Splay trees are considerably simpler.
  • Locality: Red–black trees don't take advantage of temporal locality. Splay trees do best when there is temporal locality, because recently accessed keys stay near the root.
  • LHS Cow speaking

    So at the end of the day, we have to consider the trade-offs of the data structures we choose.

  • RHS Cow speaking

    There are lots of data structures, because there are lots of different problems!

  • Rabbit speaking

    FWIW, one reason red–black trees have such a bad reputation for being complicated is that some books on algorithms (sadly) present them in a confusing and overcomplicated way. It actually is possible to code them elegantly and simply (e.g., Sedgewick's Left-Leaning Red–Black Trees [PDF]), but few people ever see that example, so generations of students are taught to fear them.
