CS 70

Before You Start

You've implemented a set interface with a binary-search tree. The main operations for a set are INSERT and LOOKUP; ideally we want both of these to be efficient.

In a red–black tree, what is the worst-case complexity of INSERT and LOOKUP?

For a red–black tree, the performance for INSERT and LOOKUP is

  • \( \Theta( \log{n} ) \) in the worst case
  • \( \Theta(1) \) in the best case
  • \( \mathrm{O}(\log{n}) \) overall
  • Goat speaking

    Meh, \( \mathrm{O}(\log{n}) \) seems kinda clunky.

  • Duck speaking

    Yeah, when I ran minispell from the homework, even with a perfectly balanced tree, with 34,831 words the height was 15, so there would be 16 string comparisons made in the worst case. That's not a huge number, but it's not nothing.

  • Pig speaking

    I want MORE speed!

  • Duck speaking

    Arrays are constant time to access an element, right? Couldn't we just use an array to store the words?

Using an Array-Backed List to Store a Set

Suppose that we instead implemented our set with an array-backed list (e.g., std::vector). We could do the following:

  • INSERT: Add the new item to the end of the list (i.e., push_back()).
  • LOOKUP: Search for the item in the list.
  • Hedgehog speaking

    Can you remind me what an array-backed list is?

  • LHS Cow speaking

    Fundamentally, we store the data in an array, but the array may have some empty space at the end. We keep track of how much of the array is actually in use, and new items go just past the used portion. If we run out of space, we allocate a bigger array (usually twice as big) and copy everything over. Because the array lives on the heap, we can choose its size at runtime and replace it with a new one whenever we need to.
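As a concrete sketch of the two operations above (the helper names insertItem and lookup are ours, not from the homework), a set stored in a std::vector might look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// INSERT: add the new item at the end of the array-backed list.
void insertItem(std::vector<std::string>& set, const std::string& item) {
    set.push_back(item);  // amortized constant time
}

// LOOKUP: scan the used portion of the array for a match.
bool lookup(const std::vector<std::string>& set, const std::string& item) {
    return std::find(set.begin(), set.end(), item) != set.end();
}
```

Notice that INSERT doesn't even look at the existing items, which is why it can be so fast; the cost shows up in LOOKUP instead.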

When using an array-backed list to store a set, what would be the best characterization of the complexity of INSERT?

Adding to an array-backed list with push_back() takes amortized \( \Theta(1) \) time: occasionally we have to increase the capacity of the underlying array, which means copying all the data over, but those expensive copies happen rarely enough that the cost averages out to constant work per insertion.
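To see why the copies average out, here's a small model (our own sketch, not course code) that counts how many element copies a capacity-doubling strategy performs over \( n \) appends. The total stays below \( 2n \), so the copying cost per append is bounded by a constant:

```cpp
#include <cassert>
#include <cstddef>

// Count the element copies a doubling array makes over n appends,
// starting from capacity 1.
std::size_t copiesForAppends(std::size_t n) {
    std::size_t capacity = 1, size = 0, copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {  // array is full: double it and copy over
            copies += size;      // every existing element gets copied once
            capacity *= 2;
        }
        ++size;                  // write the new item into the free space
    }
    return copies;
}
```

For example, appending 8 items copies 1 + 2 + 4 = 7 elements in total, and appending 1000 items copies 1023, both well under \( 2n \).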

  • Duck speaking

    See, I knew it. Arrays are awesome. Amortized constant time.

  • Cat speaking

    And I think if we knew \( n \) ahead of time, we could avoid resizing and it'd just be constant time.
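Cat's idea corresponds to std::vector's reserve(): if we know \( n \) ahead of time, we can allocate all the capacity once, and no push_back() will ever trigger a resize. A quick check (our own sketch) confirms the capacity never changes after reserving:

```cpp
#include <cstddef>
#include <vector>

// Reserve capacity for n items up front, then append n items;
// return true if the capacity never changed (i.e., no resizing occurred).
bool noResizeAfterReserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);                          // one allocation, done
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));  // no reallocation while size < capacity
    }
    return v.capacity() == cap;
}
```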

But what about LOOKUP…?

When using an array-backed list to store a set, what would be the best characterization of the complexity of LOOKUP?

Because the items are stored in no particular order, LOOKUP has to scan the list from one end to the other, so it takes \( \mathrm{O}(n) \) time in the worst case.

  • Duck speaking

    Needing to go all the way through is the problem, giving us \( \mathrm{O}(n) \) performance, way worse than red–black trees.

  • Dog speaking

    It'd be cool if it just knew the right place to look. Like I know where I buried a bone. Then it could just find it in constant time.

  • Duck speaking

    Yeah, but how could that possibly work? How could it know where to find it?

  • LHS Cow speaking

    Well, actually…
