CS 70

Expected Case Analysis

  • LHS Cow speaking

    In our attempt to count increment operators, we found that different input cases would lead to different counts.

  • RHS Cow speaking

    So we tried to find the range of possibilities by performing best and worst case analyses.

  • Goat speaking

    Meh. That doesn't seem like a great idea.

  • LHS Cow speaking

    Oh? Why's that?

  • Goat speaking

    Well, it's hardly ever going to be the absolute worst or the absolute best, right? Why only focus on the extremes?

  • LHS Cow speaking

    That's a good point. Maybe the worst case is really bad, but also very unlikely.

  • RHS Cow speaking

    Let's see if we can get some insight into what the "typical" number of operations would be.

In expected case analysis we are trying to get a sense of the typical number of operations.

Of course, in order to talk about what's typical, we have to know how likely different scenarios are, which is not something we've been considering so far!

Expected case analysis has three main steps:

  1. Break all possible inputs into cases.
  2. Assign a probability to each case.
  3. Calculate the expected value of the number of operations over that probability distribution.
  • LHS Cow speaking

    Let's see it in action for increment operations in our example!

bool lowerHalf(float x, const float* arr, size_t arrSize) {  // 1
    size_t numLessThan = 0;                                  // 2
    for (size_t i = 0; i < arrSize; ++i) {                   // 3
        if (arr[i] < x) {                                    // 4
            ++numLessThan;                                   // 5
        }                                                    // 6
    }                                                        // 7
    return numLessThan < arrSize/2.0;                        // 8
}                                                            // 9

Break Into Cases

There are infinitely many possible inputs to our algorithm!

x can be any number, we can have any number of elements in the array, and each element can be any number!

It will be easier to do our analysis if we can group the inputs by the number of operations and their likelihood.

  • RHS Cow speaking

    There are often lots of ways to do this!

  • LHS Cow speaking

    With a bit of knowledge about the problem the algorithm is solving, usually a few reasonable ones pop out.

Let's try grouping the inputs by where x ranks in the values of the array.

If we put the values of the array in sorted order, as \( ( a_1, a_2, a_3, \dots{} , a_n ) \), x could belong in any of the positions marked in orange below:

Case 1 \( a_1 \) Case 2 \(a_2\) Case 3 \(a_3\) Case \(n\) \(a_n\) Case \(n+1\)

So in case 1, x is less than all the numbers in the array. In case 2, x is greater than one number, but less than all the others. In case \( n + 1 \), x is greater than all the numbers in the array.

No matter what numeric values are involved, knowing what case we're in is enough to figure out how many increment operations will happen!

For Case 1, how many increment operations will there be?

In Case 2, how many increment operations will there be?

In Case \( n + 1 \), how many increment operations will there be?

For a general Case \( k \), give an expression in terms of \( n \) and \( k \) for the number of increment operations.

Assign Probabilities

  • RHS Cow speaking

    Next we need to determine the probability of each case.

  • Duck speaking

    How are we supposed to know how likely each case is?

  • LHS Cow speaking

    Well, usually we, uh…, make it up?

Most commonly, we will choose a probabilistic model for our analysis.

In other words, we'll make assumptions about how likely each case is. Then we'll hope that this model is not too far off from how things will be in practice.

Choosing a reasonable probability model is part of an expected case analysis. That said, in CS 70, if we ask you to perform an analysis, we'll tell you what probability model to use.

Uniform Probability Assumption

A very common probability model is the uniform-probability assumption.

When we assume a uniform-probability distribution, we assume that all cases are equally likely.

Assuming all cases are equally likely, what is the probability of any single case?

Hint: Remember that there are \( n + 1 \) distinct cases.

Compute The Expected Value

The expected value of the number of increment operations is one way to characterize the "typical" number of operations.

  • RHS Cow speaking

    If you've taken a probability and/or statistics course before, you might have learned about expected values.

  • Cat speaking

    Sure! I remember this!

  • Sheep speaking

    Nope! I've never taken a prob/stats course and I've never heard of this.

  • Hedgehog speaking

    I did take a probability course but I don't really remember…

  • LHS Cow speaking

    All of that is okay! We'll explain everything you need to know for CS 70 right here!

The expected number of operations is a weighted average of the number of operations for each case, where the weights are the probabilities of each case.

So an expensive and common case would add a lot to the expected operation count, but a cheap or rare case would only add a little bit.

Specifically, we calculate it as

$$ \mathrm{Pr}(\mathrm{Case}\ 1) \cdot \mathrm{Count}(\mathrm{Case}\ 1) + \mathrm{Pr}(\mathrm{Case}\ 2) \cdot \mathrm{Count}(\mathrm{Case}\ 2) + \cdots{} + \mathrm{Pr}(\mathrm{Case}\ K) \cdot \mathrm{Count}(\mathrm{Case}\ K). $$

Or, in summation form, $$ \sum^{K}_{k=1} \mathrm{Pr}( \mathrm{Case}\ k ) \cdot \mathrm{Count}( \mathrm{Case}\ k ) $$

  • Duck speaking

    What does the expected value actually mean?

  • LHS Cow speaking

    Ah! That's a great question.

Imagine that you can draw a case out of a hat according to the probabilities, write down the number of operations, and put it back in the hat.

Draw another. And another. And a whole bunch more…

Now you can take the average of the operation counts for all of the cases you pulled out of the hat. As the number of cases you draw goes to infinity, that average will converge to the expected value.

  • RHS Cow speaking

    So the expected value is like the average number of operations over infinitely many randomly selected inputs.

In this example, we get

$$ \frac{1}{n + 1} \cdot{} n + \frac{1}{n + 1} \cdot{} ( n + 1 ) + \frac{1}{n + 1} \cdot{} ( n + 2 ) + \frac{1}{n + 1} \cdot{} ( n + 3 ) + \cdots\ + \frac{1}{n + 1} \cdot{} 2n $$

Or, in summation form,

$$ \sum^{n+1}_{k=1} \frac{ k + n -1 }{ n + 1 } $$

Can you simplify this summation? Give it a try! (Don't forget Gauss's formula!)

Okay! So we have

\[ \begin{align} \sum^{n+1}_{k=1} \left( \frac{ k + n - 1 }{ n + 1 } \right) & = \frac{1}{ n + 1 } \cdot{} \sum^{n+1}_{k=1} ( k + n - 1 ) \\ & = \left( \frac{1}{ n + 1 } \cdot{} \sum^{n+1}_{k=1} k \right) + \left( \frac{1}{ n + 1 } \cdot{} \sum^{n+1}_{k=1} ( n - 1 ) \right). \end{align} \]

Now let's deal with each of those summations one at a time (we'll go from right to left).

The second term doesn't have anything that depends on \( k \), so we can pull it all out of the sum:

\[ \begin{align} \left( \frac{1}{ n + 1 } \right) \cdot{} \sum^{n+1}_{k=1} n - 1 & = \left( \frac{ n - 1 }{ n + 1 } \right) \cdot{} \sum^{n+1}_{k=1} 1 \\ & = \left( \frac{ n - 1 }{ n + 1 } \right) \cdot{} ( n + 1 ) = n - 1 \end{align} \]

The first term looks familiar! We should apply Gauss's formula, \( \sum^{N}_{i=1} i = \frac{N( N + 1 )}{2} \)!

In our example, \( N = n + 1 \). Thus

\[ \begin{align} \left( \frac{1}{ n + 1 } \right) \cdot{} \sum^{n+1}_{k=1} k & = \left( \frac{1}{ n + 1 } \right) \cdot{} \left( \frac{ ( n + 1 )( n + 2 ) }{2} \right) \\ & = \frac{ n + 2 }{2}. \end{align} \]

So, all together, we have

\[ \begin{align} \left(\frac{ n + 2 }{2}\right) + \left(n - 1\right) & = \frac{ (n + 2) + 2(n - 1)}{2} \\ & = \frac{ 3n }{2} \\ & = 1.5n \end{align} \]

  • Hedgehog speaking

    All that math all at once! My eyes! My eyes!! I can't see!!!!

  • LHS Cow speaking

    Okay, try to remain calm. Reading math is a skill, but it's a skill you can practice.

  • RHS Cow speaking

    The key is not to try to read it like a paragraph, where you read very quickly, skim over words, and so on.

  • LHS Cow speaking

    Reading technical content like this requires a lot of active attention.

  • RHS Cow speaking

    Go step by step. Maybe write it out as you go. Make sure you understand each step.

  • LHS Cow speaking

    If a step doesn't make sense at first, go back to it and really spend time working it out.

  • RHS Cow speaking

    You can do this!!

  • Dog speaking

    Oh! That makes sense! If all positions for x are equally likely, then we sort of expect it to be in the middle on average.

  • LHS Cow speaking

    That's right, and when x ends up in the middle, we do \( n \) increments for the loop and \( n / 2 \) increments for numLessThan.

  • Dog speaking

    It's so exciting when the math works out!!

  • Goat speaking

    Meh, I'd have just guessed that it's 1.5 times \( n \).

  • LHS Cow speaking

    In practice, sometimes we can skip the math entirely, or apply short-cuts, but it's important to know how to do it properly so that we have a general method for the harder cases.

Key Takeaways

Expected case analysis is useful when we want to go beyond the best or worst case.

If we have some reasonable assumptions (or actual knowledge) about how likely different cases are, we can get the expected number of operations.

If we call the function many times, then this gives us an idea of how many operations it will take on average.

(When logged in, completion status appears here.)