CS 70

Search

  • Watson speaking

    For my detective work, I need to keep track of students, and I want to be able to say who is on campus and who isn't.

  • LHS Cow speaking

    That kind of problem is pretty common—let's make it an example!

We often want to store a bunch of data in a way that makes it easy to search through.

Example: Suppose that Watson wants to store the student IDs of all the students who are on campus. This is a large set of student IDs. We want to be able to quickly answer the question, “Is this student on campus?”, given their ID number.

To solve this problem, there are two fundamental operations that we'll need:

INSERT: Add a new item to the collection.

LOOKUP: Find an item in the collection (or determine that it is not there).

  • Duck speaking

    This seems a lot like dictionaries (and maybe sets?) in Python.

  • LHS Cow speaking

    Yes, it's the same idea.

  • A map maps keys to values. In the container, we store each key together with its associated value. LOOKUP takes a key and gives access to the stored value for that key. Python dictionaries are an example of a map.
  • A set is just a set of keys. We only store keys in the container, and LOOKUP takes a key and determines whether it is there. Python sets are an example of a set.
  • Pig speaking

    So in the dining hall, they might use a map from student ID to how many MORE meals that student has left to eat this semester?

  • LHS Cow speaking

    Sure. And a set can solve Watson's problem from earlier.

We'll focus on sets, but in fact everything we do for sets can be easily modified to solve the map problem instead. (In many ways, a map is just like a set where we're storing key/value pairs in the set.)

  • RHS Cow speaking

    So what ideas do we have for Watson's problem?

A Bad Idea

  • Dog speaking

    Let's just make a big array of all the ID numbers!

  • LHS Cow speaking

    Okay, let's consider something like std::vector, which is an array-backed list.

Performance of an Array-Backed List

INSERT: The good news is that adding new numbers to an array-backed list is efficient. We just need to push them onto the back of the array, which takes \( \Theta(1) \) time.

(For now we'll ignore the cost of “resizing” the array when necessary. We'll get there eventually!)

LOOKUP: The bad news comes when we try to find a particular item in the array. Because the numbers are not arranged in any particular way, we have no choice but to iterate through the array until we find the item (or don't).

What is the most accurate and precise way to characterize the complexity of LOOKUP in this structure?

First, let's think about best and worst cases:

  • In the worst case (the item is last or not present) it takes \( n \) operations, \( \Theta(n) \) time.
  • In the best case it only takes one operation, \( \Theta(1) \) time.

So the overal complexity is no worse than linear (but it could be better). Hence \( \mathrm{O}(n) \) is the most precise thing we can safely say in general.

Thus, looking up an item in an arbitrary array takes \( \mathrm{O}(n) \) time.

  • Pig speaking

    I want MORE speed than that!

A More Clever Bad Idea

  • Dog speaking

    Okay, if we want to make lookup easier, let's keep the array sorted!

Performance of a Sorted Array-Backed List

LOOKUP: It is true that lookup is easier in a sorted array! It's easier for the same reason that it's easier to find a word in a dictionary than it is to find that word in a novel.

Just like looking something up in a dictionary, we start in the middle of the list and then, depending on whether our item should come before or after the item at the midpoint, search the half that contains the item we are looking for. You may have seen this algorithm before; it's called binary search. (If you haven't, that's okay!).

Since binary search rules out half of the list at every iteration, LOOKUP will never take more than \( \log{n} \) steps to find an item (or determine that it is not present). It could, of course, find the item right away. So the complexity is no worse than logarithmic, but it could be better. So the lookup complexity would be \( \mathrm{O}( \log{n} ) \). Pretty nice!

INSERT: This time it's insertion that's the bad news. When we have a new item to insert, we have to put it in the right place to keep the list sorted. We can use binary search to find the place, so that's easy, but once we find it, we have to shift items over to make room.

What is the most accurate and precise characterization of the complexity of INSERT for this structure?

In the worst case we would have to shift all \( n \) items over. But in the best case, we might not have to shift any items over. So the complexity is no worse than linear, but it could be better. So \( \mathrm{O}(n) \) is the most accurate and precise characterization.

A sorted array makes lookup faster. We can use binary search, which takes \( \mathrm{O}( \log{n} ) \) time!

But now we have to keep the array sorted, which means that insertion takes \( \mathrm{O}(n) \) time.

An Even More Clever Bad Idea

  • Dog speaking

    Okay, okay. So it's hard to insert things into an array-backed list. But it's easy to insert things into a linked list! So let's use a linked list!

Peformance of a Linked List

INSERT: It's true that we can insert things into a linked list very easily! If we know where to perform the insert, we can do it in \( \Theta(1) \) time! So the complexity of insert here depends entirely on the complexity of lookup.

LOOKUP: Now the problem is lookup again! Even if we have a sorted linked list, we can't perform binary search!

  • Horse speaking

    Why not?

  • LHS Cow speaking

    Looking up a word in an actual paper dictionary is easy because you can flip directly to pages partway through the book. A linked list does not allow that—linked lists are terrible at random access!

So we'd have to go back to linear search (just going through the list one item at a time), which takes \( \mathrm{O}(n) \) time.

  • Goat speaking

    BTW, I don't think my generation uses paper dictionaries, or telephone directories, or the Yellow Pages. We just use a search engine on the internet.

  • Jo speaking

    Or we text them!!

  • Hedgehog speaking

    Yeah. Talking is hard.

  • Goat speaking

    Meh. No website, no business.

  • LHS Cow speaking

    Sorry, those things were good examples of how alphabetization (sorting) helped, but they've kinda fallen out of use. But I'm sure you can think of some other physical-world examples where alphabetization is still useful.

  • Pig speaking

    Sure, I keep my cans of food alphabetized so I can get MORE things in the cupboard. So anyway, am I out of luck in my quest for MORE speed…?

A Better Idea

  • LHS Cow speaking

    There is a data structure that combines the strengths of both a sorted array and a linked list.

  • Dog speaking

    Yeah! What is it?!

  • LHS Cow speaking

    It needs to organize the data according to an order, like a sorted list. But it needs to be made of nodes, like a linked list.

  • RHS Cow speaking

    It's called a binary search tree!

  • Hedgehog speaking

    Oh! I think I saw that in a previous class, but I've forgotten all the details now!

(When logged in, completion status appears here.)