CS 70

Randomized Trees

An Idea: Shuffling Keys

  • Horse speaking

    Hay, I have an idea!

  • LHS Cow speaking

    Go for it!

  • Horse speaking

    What if we randomly shuffle the order of all the keys before we insert them?

  • LHS Cow speaking

    Oh, interesting!

One of the limitations of our expected-case analysis is that we relied on an assumption that all insertion orders are equally likely. That might not be true in every problem!

One way to remove that limitation is to randomly shuffle the keys before we insert them.

In that case we know that all orderings are equally likely! We essentially force our probabilistic assumption to be true and guarantee that the expected complexity is logarithmic.

  • Goat speaking

    Meh. That seems pretty impractical.

  • LHS Cow speaking

    Oh?

  • Goat speaking

    Yeah! In order to shuffle the keys, you need to know all of the keys beforehand.

  • LHS Cow speaking

    Huh. That's true.

  • Goat speaking

    What if the keys come in one at a time? Then we can't shuffle them before we insert them!

  • LHS Cow speaking

    This is a good point. While sometimes we might be able to know the keys ahead of time and shuffle them before we insert them, that's not always the case. If only we had some other way… If only we could have a tree that “shuffled itself” as we inserted the keys…

  • Hedgehog speaking

    I'm not sure I like where this is going… Is it going to be complicated?

  • LHS Cow speaking

    Actually, it's not too bad. It's a pretty cool idea, and it's not too hard to implement! Let's dive in!

Randomized BSTs

When you have a new key, it follows a single path from the root to a leaf in the tree (greater than this to go right, less than this so go left, and so on…).

Randomized trees keep themselves balanced by randomly selecting where along that path the new node should end up.

The basic idea is that at each node along the path, we randomly choose whether to insert the node deeper in the tree or to insert the node such that it becomes the root of this subtree (i.e., right in this spot). For now, don't worry about how we would cause the new node to become root of a subtree; we'll get to that later.

Randomized trees simulate the process of shuffling the keys by moving the randomness into insert, instead of having all of the randomness happen by shuffling ahead of time! Rather than shuffling the data before inserting, we shuffle the arrangement of the nodes within the tree as we insert them. (Without breaking the BST property, of course!)

If we remember that we want our trees to have the same characteristics as trees generated from shuffled keys, we can use what we know about random shuffling to help us figure out the details of how we want randomized insert to work.

Randomized Insertion Example

Suppose we have this tree, where the "20" node is the one we want to insert:

If we were shuffling the keys beforehand, then we would have had all of the values—including the 20—before we started building the tree!

Assuming that we used regular BST insertion for all of our values in the randomly generated tree, what would have had to have been true after we shuffled the values up for "20" to have ended up being the root of the random tree?

Since BST insertion always adds values to the bottom of the tree, we know that the root will be whatever value was added first. That means that for "20" to have been the root, it would need to have been first in the list of values after they were shuffled.

Enter the fraction that represents the probability that "20" would have been the first value in the list of shuffled values (as a / b). If you think we don't have enough information to say, enter N/A.

There are already 11 values in the tree, and one value we're trying to add.

If we had all of the values ahead of time, we would have had 12 values to shuffle.

Every value has an equal likelihood of being first when we shuffle the list, so each value (including the 20) has a \( \frac{1}{12} \) probability of being the first one in the list.

In general, if we have \( N \) values already in our tree, then we should have a \( \frac{1}{ N + 1 } \) probability that the new value becomes the root!

Inserting in the 11-Node Subtree

  • RHS Cow speaking

    Let's roll a 12-sided die to represent our \( \frac{1}{12} \) probability that we insert at the root.

  • LHS Cow speaking

    If we roll a 1, we'll insert at the root of this tree!

  • LHS Cow speaking

    We didn't get a 1.

  • Dog speaking

    That means we don't insert at the root. So we insert at the leaf, right?

  • LHS Cow speaking

    No, it just means we don't insert at the root. Like other things in trees, our randomized insert is going to be recursive.

  • Hedgehog speaking

    Recursive randomness?!! My tummy hurts.

If we had to do all the insertions all at once, it could be pretty complicated. But remember the beauty of recursion!

We just have to handle one case at a time, and we push all of the complicated stuff into the recursive call.

Our die roll told us that we didn't want to insert at the root of our tree. That means we're going to insert it somewhere in the left subtree, since 20 < 37.

Where in the left subtree will it end up? That's up to recursion!

What is the probability the new node (20) should be the root of the subtree whose current root is 19?

There are \( N = 3 \) values currently in the subtree whose root is 19 and we have one more to add.

If we were shuffling the keys beforehand, whichever one of those values came first in the shuffled order would get to be the root of the tree.

Each key has an equal chance of being first, so 20 would be the root of the subtree with a probability of \( \frac{1}{ N + 1 } = \frac{1}{4} \).

Inserting in the 3-Node Subtree

We can use a four-sided die to decide whether we should insert at the root of this subtree. Again, we'll say that a 1 means "insert at root", and anything else means "recurse down into a subtree."

  • Alien speaking

    The simulated tetrahedron has generated the numeral 3.

  • LHS Cow speaking

    What does that mean?

  • Cat speaking

    Recurse into a subtree!

  • LHS Cow speaking

    You got it! Since 20 > 19, we'll go into the right subtree this time:

What is the probability that we should insert at the root of the subtree whose root is 29?

Remember that we're trying to simulate what would happen if we were able to shuffle all of our values up and generate random trees.

Of all the trees that had 20 and 29 in their own subtree, we would expect 20 to be the root half the time and 29 to be the root the other half of the time.

Inserting in the 1-Node Subtree

  • Dog speaking

    Oh boy! It's time for a coin flip!

  • LHS Cow speaking

    Heads means we insert at root, tails means we recurse into the subtree.

  • Dog speaking

    Heads! That means insert at root!

  • LHS Cow speaking

    Yes, and remember that the "root" is the root of this subtree. so we'll insert our new value above the 29.

  • RHS Cow speaking

    And now our tree is

(When logged in, completion status appears here.)