Before You Start
You've implemented a set interface with a binary-search tree. The main operations for a set are INSERT and LOOKUP; ideally we want both of these to be efficient.
For a red–black tree, the performance for INSERT and LOOKUP is
- \( \Theta( \log{n} ) \) in the worst case
- \( \Theta(1) \) in the best case
- \( \mathrm{O}(\log{n}) \) overall
Meh, \( \mathrm{O}(\log{n}) \) seems kinda clunky.
Yeah, when I ran
minispell
from the homework, even with a perfectly balanced tree, with 34,831 words the height was 15, so there would be 16 string comparisons made in the worst case. That's not a huge number, but it's not nothing.I want MORE speed!
Arrays are constant time to access an element, right? Couldn't we just use an array to store the words?
Using an Array-Backed List to Store a Set
Suppose that we instead implemented our set with an array-backed list (e.g., std::vector
). We could do the following:
- INSERT: Add the new item to the end of the list (i.e.,
push_back()
). - LOOKUP: Search for the item in the list.
Can you remind me what an array-backed list is?
Fundamentally, we're storing the data in an array, but the array may have some empty space at the end. We keep track of how much of the array is actually used, and we can add new items to the end of the array. If we run out of space, we can allocate a new array that's bigger (usually twice as big) and copy everything over. The array is stored on the heap, which allows us to choose how big it is at runtime and allocate a new one to replace it when we need to.
Adding to an array-backed list with push_back()
takes amortized \( \Theta(1) \) time because we might have to increase the capacity of the underlying array, which means we periodically have to copy all the data.
See, I knew it. Arrays are awesome. Amortized constant time.
And I think if we knew \( n \) ahead of time, we could avoid resizing and it'd just be constant time.
But what about LOOKUP…?
Needing to go all the way through is the problem, giving us \( \mathrm{O}(n) \) performance, way worse than red–black trees.
It'd be cool if it just knew the right place to look. Like I know where I buried a bone. Then it could just find it in constant time.
Yeah, but how could that possibly work? How could it know where to find it?
Well, actually…
(When logged in, completion status appears here.)