Splay-Tree Complexity
So, what does all this say about splay trees?
Great question! Now that we know what amortized complexity is, let's revisit splay trees!
Remember what we saw on Wikipedia: splay trees have amortized \( \mathrm{O}( \log{n} ) \) complexity for both insert and lookup.
We also know that an amortized bound promises that the total time of all the insert/lookup operations performed up until now is bounded by the sum of the amortized times of the individual operations.
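In symbols: writing \( t_i \) for the actual cost of operation \( i \) and \( a_i \) for its amortized bound (names introduced here just to restate the promise), the guarantee for any sequence of \( k \) operations starting from an empty structure is

\[ \sum_{i=1}^{k} t_i \;=\; \mathrm{O}\!\left( \sum_{i=1}^{k} a_i \right). \]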
Example 1
Imagine that we start with an empty splay tree and then insert \( m \) keys.
What is the total asymptotic complexity of all of the insert operations?
To find out, we just need to add up the amortized bounds! When we are considering total execution time, the amortized analysis allows us to pretend that the complexity of each insert is \( \mathrm{O}( \log{n} )\) for \( n = 0, 1, 2, 3, \ldots, ( m - 1 ) \). (For logarithm sticklers out there, we are going to consider the complexity of inserting when \( n = 0 \) and when \( n = 1 \) to be \( \mathrm{O}(1) \).)
So that gives us \( \mathrm{O}( 1 + 1 + \log{2} + \log{3} + \log{4} + \cdots{} + \log{ ( m - 1 ) } ) \). We can simplify this as follows:
- Every term in the sum is bounded above by \( \log{ ( m - 1 ) } \).
- So the complexity is \( \mathrm{O}( \log{ ( m - 1 ) } + \log{ ( m - 1 ) } + \cdots{} + \log{ ( m - 1 ) } ) = \mathrm{O}( m \log{ ( m-1 ) } ) \).
- Asymptotically, the \( -1 \) won't matter: \( \log{ ( m - 1 ) } = \mathrm{O}( \log{m} ) \).
So we conclude that the total complexity of \( m \) inserts, starting from an empty tree, is \( \mathrm{O}( m \log{m} ) \).
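For the sticklers again, the whole calculation is one chain of inequalities:

\[ 1 + 1 + \sum_{n=2}^{m-1} \log{n} \;\le\; 2 + ( m - 2 ) \log{ ( m - 1 ) } \;\le\; 2 + m \log{m} \;=\; \mathrm{O}( m \log{m} ). \]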
Bewdy bottler, mate! Even though the tree can get imbalanced, it all evens out in the end!
That's right! We know that some of these individual inserts might take linear time.
But when you consider all the inserts as a whole, it takes \( \mathrm{O}( m \log{m} ) \) time, same as a red–black tree.
Better even, because a red–black tree takes \( \Theta( m \log{m} ) \) time! A sequence of splay-tree inserts can take less time, because an imbalanced tree can leave each new key's insertion point right near the root; inserting the keys in sorted order, for example, takes only \( \Theta(m) \) total (see the table below).
Example 2
Okay, so now we have a tree with \( m \) keys in it.
Say we now perform \( m \) lookups.
Here the amortized bound allows us to pretend that each lookup takes \( \mathrm{O}( \log{n} ) \), where \( n = m \) the entire time (because lookup doesn't change the size of the tree).
So the total complexity is \( \mathrm{O}( \log{m} + \log{m} + \log{m} + \cdots{} + \log{m}) = \mathrm{O}( m \log{m} ) \).
Once again, the actual times of individual lookups will probably vary.
But it all evens out to \( \mathrm{O}(m\log{m}) \).
The amortized bound makes it easy to calculate total complexities when it's not as straightforward as adding up worst-case bounds.
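As a quick numeric sanity check (a toy sketch in Python, not a timing experiment; the function name is made up for this illustration), we can add up the pretend per-operation costs from Example 1 and compare them against the \( m \log{m} \) total from Example 2:

```python
import math

def pretend_insert_cost(n):
    """Amortized cost we 'pretend' an insert takes at tree size n:
    O(1) for n = 0 and n = 1, and log(n) afterwards."""
    return 1.0 if n < 2 else math.log2(n)

m = 1_000_000
insert_total = sum(pretend_insert_cost(n) for n in range(m))  # Example 1
lookup_total = m * math.log2(m)                               # Example 2

# Both totals grow at the same Theta(m log m) rate; the ratio is
# roughly 0.93 for this m and tends to 1 as m grows.
print(insert_total / lookup_total)
```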
Hey! Amortized time is about total execution time, but you began Example 2 with a tree that already had \( m \) keys in it. What if that tree was a stick?
Well, if you want to be a bit more formal about it, we can say that we got that tree with \( m \) keys by inserting them one at a time, so there were actually \( 2 m \) operations overall (the \( m \) inserts plus the \( m \) lookups). That way everything still works out.
But if you imagine that we just conjured up a tree with \( m \) keys in it all arranged as a stick, then we couldn't use the amortized bound because we wouldn't be properly “starting at the beginning” and measuring total time.
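To make the stick concrete, here is a minimal sketch of a plain BST with no balancing (illustrative Python, not our splay or red–black code), showing that \( m \) sorted-order inserts really do produce a tree of depth \( m \):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Plain BST insert: no rotations, no rebalancing."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root

def depth(root):
    if root is None:
        return 0
    return 1 + max(depth(root.left), depth(root.right))

root = None
m = 100
for k in range(m):   # sorted-order inserts: 0, 1, 2, ...
    root = bst_insert(root, k)
print(depth(root))   # prints 100: every node is a right child, a stick
```

On a tree like this, a single lookup of the bottom key really does cost \( \Theta(m) \), which is exactly why the amortized argument insists on counting from the very first operation.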
Comparisons
So? Which tree is best?
There isn't a best tree.
They each have strengths and weaknesses!
| | Simple BST | Splay Tree | Red–Black Tree |
|---|---|---|---|
| Best case insert in arbitrary tree of size \( n \) | Child of root: \( \Theta(1) \) | Child of root: \( \Theta(1) \) | Balanced: \( \Theta( \log{n} ) \) |
| Worst case insert in arbitrary tree of size \( n \) | Bottom of stick: \( \Theta(n) \) | Bottom of stick: \( \Theta(n) \) | Balanced: \( \Theta( \log{n} ) \) |
| Best case lookup in arbitrary tree of size \( n \) | Key in root: \( \Theta(1) \) | Key in root: \( \Theta(1) \) | Key in root: \( \Theta(1) \) |
| Worst case lookup in arbitrary tree of size \( n \) | Bottom of stick: \( \Theta(n) \) | Bottom of stick: \( \Theta(n) \) | Balanced: \( \Theta( \log{n} ) \) |
| Best case of \( m \) inserts into empty tree | Balanced: \( \Theta( m \log{m} ) \) | Sorted order: \( \Theta(m) \) | Balanced: \( \Theta( m \log{m} ) \) |
| Worst case of \( m \) inserts into empty tree | Sorted order: \( \Theta(m^2) \) | Adversarial: \( \Theta( m \log{m} ) \) | Balanced: \( \Theta( m \log{m} ) \) |
| Best case of \( m \) lookups in tree of size \( m \) | In root: \( \Theta(m) \) | Repeated lookup: \( \Theta(m) \) | In root: \( \Theta(m) \) |
| Worst case of \( m \) lookups in tree of size \( m \) | Bottom of stick: \( \Theta(m^2) \) | Adversarial: \( \Theta( m \log{m} ) \) | In leaves: \( \Theta( m \log{m} ) \) |
So, it seems like simple BSTs are just unreliable.
Yes, I think that's fair. Basically nobody would use a BST with no balancing mechanism.
And it seems like red–black trees are like old faithful.
Yes, you can basically count on \( \log \)-time every time.
And splay trees are sort of up and down. They have better best cases and worse worst cases.
Yes, that's true! Splay trees will be prone to occasional complexity spikes, where individual operations take a long time.
But in the total complexity it all evens out to linearithmic worst case, just like red–black trees.
In addition to complexity there are other comparisons to make:
- Memory overhead: Red–black trees need to store extra information in each node (the color). Splay trees don't (see the sketch after this list).
- Time overhead: Red–black trees only rotate some nodes on the path and only for insertion. Splay trees rotate all the way to root during both insertion and lookup.
- Complicatedness: Red–black trees have a reputation for being complicated to implement. Splay trees are considerably simpler.
- Locality: Red–black trees get no particular benefit from temporal locality. Splay trees do best when there is temporal locality, because recently accessed keys sit near the root.
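Here is the memory-overhead point as a sketch (field names are mine, not from any particular library); the only structural difference is one extra bit of balance metadata per node:

```python
class SplayNode:
    __slots__ = ("key", "left", "right")             # no balance metadata

class RedBlackNode:
    __slots__ = ("key", "left", "right", "is_red")   # one extra color bit
```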
So at the end of the day, we have to consider the trade-offs of the data structures we choose.
There are lots of data structures, because there are lots of different problems!
FWIW, one reason red–black trees have such a bad reputation for being complicated is that some books on algorithms (sadly) present them in a confusing and overcomplicated way. It actually is possible to code them elegantly and simply (e.g., Sedgewick's Left-Leaning Red–Black Trees [PDF]), but few people ever see that example, so generations of students are taught to fear them.
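For a taste of that simpler style, here is a sketch of LLRB insertion in Python, following Sedgewick's approach of a plain BST insert followed by three local fix-ups (rotate left, rotate right, flip colors). The transcription and names are mine, and deletion, which is genuinely trickier, is omitted:

```python
RED, BLACK = True, False

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.color = RED          # new nodes start red

def is_red(node):
    return node is not None and node.color == RED

def rotate_left(h):
    """Fix a right-leaning red link."""
    x = h.right
    h.right = x.left
    x.left = h
    x.color, h.color = h.color, RED
    return x

def rotate_right(h):
    """Fix two consecutive left-leaning red links."""
    x = h.left
    h.left = x.right
    x.right = h
    x.color, h.color = h.color, RED
    return x

def flip_colors(h):
    """Split a temporary 4-node: push redness up to the parent."""
    h.color = RED
    h.left.color = BLACK
    h.right.color = BLACK

def insert(h, key):
    if h is None:
        return Node(key)
    if key < h.key:
        h.left = insert(h.left, key)
    elif key > h.key:
        h.right = insert(h.right, key)
    # The three fix-ups, applied in this order on the way back up:
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h

def put(root, key):
    root = insert(root, key)
    root.color = BLACK            # the root is always black
    return root
```

That's the whole insertion algorithm: the recursion does an ordinary BST insert, and the three if statements restore the red–black invariants on the way back up.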