CS 70

Phase 6: Randomized Insert

In this step, you will extend your insert functionality to allow for inserting randomly, simulating the behavior of a random tree.

You'll follow the usual development stages (planning, writing tests, implementing, verifying).

You will need to submit a planning image for this phase, Planning/Phase_6.jpg.

Steps

Add a New Data Member

Recall from the specifications that your tree should have a member variable of type RandUInt32, which it will use to generate random numbers.

Add One More Constructor

Add one more constructor to the TreeSetclass:

  • The TreeSet parameterized constructor that takes two parameters: a bst_style and a size_t. The second parameter will be the seed for the random number generator.

See the Generate Random Numbers help page for information about how to initialize a RandUInt32 object with a seed (and/or the Week 10, Lesson 2 page on random-number generation).

Update insert

Now add a third conditional to your insert function to call a different private helper member function if the tree’s type is RANDOMIZED.

In that private helper function, you will need to randomly decide whether to insert at root for the current subtree or recursively call the randomized insert function on a lower subtree. You must use the random-number generator (RNG) in a specific way so that we can set the seed for your tree and reproducibly check the outcome. Do not use the tree's random-number generator at any point other than in these lines:

int randomInt = rand_.get(nodeSize(node) + 1);
bool doRootInsert = (randomInt == 0);   // Root insert with(1/new_subtree_size probability)

These lines first use the RNG to uniformly randomly pick an integer from 0 to nodeSize(node), which is equivalent to rolling a nodeSize(node) + 1-sided die! We arbitrarily pick one possible “side of the die”—0—to mean that we should insert and move to the root of this subtree; any other value means we should keep moving down the tree.

Your naming might differ, but please stick to this method for making the random decision of whether to insert or recurse. If you don't, your code will fail tests in the autograder!

(See the hints below for some tips on testing!)

Helpful Hints

Testing Randomized Insert

Testing randomized algorithms can be tricky! In particular, you might have been writing tests that do some operations and then compare the string representation of the tree to what you expected the tree structure to be. In most cases you can't do that here, because the outcome will be random!

Here are some ideas:

  • Insertion of items in sorted order builds a stick with both the LEAF and ROOT insertion styles, but RANDOMIZED trees will roughly balance the tree. Although you can't say for sure what the averageDepth or height functions will return, we know some options are incredibly unlikely. See the probabilities hint below.
  • Two randomly generated trees with the same elements probably won't print the same because the shape of the tree will be different (in fact, for a large tree, the probability of them being identical is vanishingly small), but they should compare as equal.
  • Two randomly generated trees with the same seed and the same elements but inserted in a different order can be guaranteed to not print the same, but should compare as equal.
  • If you know the seed, you can actually guarantee that every CS 70 randomized-BST implementation will print the exact same tree, because they are required to follow the exact same insertion algorithm, using the same random-number generator in the same way. We actually test your trees this way.

There are also some things that must be true, no matter how the randomness works out. For example, after inserting a key, the key must be in the set!

Expected Average Depth

Week 10, Lesson 2 gives some probabilities for the average node depth in a random tree. We'll provide the same kind of information here, but turned into easily computable approximations for the expected average depth of a random BST:

\[ \begin{align} \text{expected value} &= 2 \left(1+\frac{1}{n}\right) H_n-4 \\ & \approx 2 \left(\frac{1}{n}+1\right) (\log n+\gamma+ \frac{1}{2n} )-4 \\ & \in \Theta(\log n) \end{align} \]


\[ \begin{align} \text{variance} &= 7 - 4 \mathrm{H}_n^{(2)} - \frac{2 \left( \mathrm{H}_n+2 \mathrm{H}_n^{(2)} \right)}{n^2} + \frac{-2 \mathrm{H}_n - 8 \mathrm{H}_n^{(2)} + 13}{n} \\ &\approx 7 - \frac{2 \pi^2}{3} + \frac{17 - 2 \gamma - \frac{4}{3} \pi^2 - 2 \log n}{n} + \frac{5 - 2 \gamma - \frac{2}{3} \pi^2 - 2 \log n}{n^2} \\ & \approx_{n \to \infty} 7 - \frac{2 \pi^2}{3} \approx 0.420264 \\ & \in \Theta(1), \end{align} \]

where \( \gamma \approx 0.577216 \) is Euler's constant and \( \mathrm{H}_n \) are the harmonic numbers.

These approximations are accurate even for quite small \( n \), being within 1% of the true value when \( N \geq 5 \) and becoming more accurate as \( n \) increases.

We can turn the above into C++ code (which requires #include <cmath>):

    // Tree formulas provided by Prof. Melissa, based on
    // https://sites.math.rutgers.edu/~zeilberg/mamarim/mamarimPDF/qsort.pdf
    constexpr double EGAMMA = 0.57721566490153286;  // Euler's constant
    constexpr double PI_SD3 = 3.2898681336964529;   // (Pi^2)/3
    double avgDepthExpected =
        2. * (1. + 1. / N) * (log(N) + EGAMMA + 1/(2. * N)) - 4.;
    double avgDepthStdDev =
        sqrt(7. - 2.*PI_SD3 + (17. - 2.*EGAMMA - 4.*PI_SD3 - 2.*log(N)) / N
             + (5 - 2*EGAMMA - 2*PI_SD3 - 2. * log(N)) / (N * N));

For example, when n = 100, these formulas calculate avgDepthExpected as 6.47852 and avgDepthStdDev as 0.59483. The true values to the same precision are 6.47850 and 0.59483, respectively.

Statistically, it is unlikely that we'll get a result four standard deviations away from the expected value, or we'd repeatedly get results two standard deviations away from the expected value. You can use these kinds of statistical properties in your tests.

To Complete This Part of the Assignment…

You'll know you're done with this part of the assignment when you've done all of the following:

(When logged in, completion status appears here.)