Expected Case Analysis
In our attempt to count increment operators, we found that different input cases would lead to different counts.
So we tried to find the range of possibilities by performing best and worst case analyses.
Meh. That doesn't seem like a great idea.
Oh? Why's that?
Well, it's hardly ever going to be the absolute worst or the absolute best, right? Why only focus on the extremes?
That's a good point. Maybe the worst case is really bad, but also very unlikely.
Let's see if we can get some insight into what the "typical" number of operations would be.
In expected case analysis we are trying to get a sense of the typical number of operations.
Of course, in order to talk about what's typical, we have to know how likely different scenarios are, which is not something we've been considering so far!
Expected case analysis has three main steps:
- Break all possible inputs into cases.
- Assign a probability to each case.
- Calculate the expected value of the number of operations over that probability distribution.
Let's see it in action for increment operations in our example!
```cpp
bool lowerHalf(float x, const float* arr, size_t arrSize) {
    size_t numLessThan = 0;
    for (size_t i = 0; i < arrSize; ++i) {
        if (arr[i] < x) {
            ++numLessThan;
        }
    }
    return numLessThan < arrSize / 2.0;
}
```
Break Into Cases
There are infinitely many possible inputs to our algorithm! x can be any number, we can have any number of elements in the array, and each element can be any number!
It will be easier to do our analysis if we can group the inputs by the number of operations and their likelihood.
There are often lots of ways to do this!
With a bit of knowledge about the problem the algorithm is solving, usually a few reasonable ones pop out.
Let's try grouping the inputs by where x ranks in the values of the array.
If we put the values of the array in sorted order, as \( ( a_1, a_2, a_3, \dots{} , a_n ) \), x could belong in any of the positions marked in orange below:

| Case 1 | \( a_1 \) | Case 2 | \(a_2\) | Case 3 | \(a_3\) | … | Case \(n\) | \(a_n\) | Case \(n+1\) |
So in case 1, x is less than all the numbers in the array. In case 2, x is greater than one number, but less than all the others. In case \( n + 1 \), x is greater than all the numbers in the array.
No matter what numeric values are involved, knowing what case we're in is enough to figure out how many increment operations will happen! In case \( k \), the ++i in the loop runs \( n \) times and ++numLessThan runs \( k - 1 \) times, for a total of \( n + k - 1 \) increments.
Assign Probabilities
Next we need to determine the probability of each case.
How are we supposed to know how likely each case is?
Well, usually we, uh…, make it up?
Most commonly, we will choose a probabilistic model for our analysis.
In other words, we'll make assumptions about how likely each case is. Then we'll hope that this model is not too far off from how things will be in practice.
Choosing a reasonable probability model is part of an expected case analysis. That said, in CS 70, if we ask you to perform an analysis, we'll tell you what probability model to use.
Uniform Probability Assumption
A very common probability model is the uniform-probability assumption.
When we assume a uniform-probability distribution, we assume that all cases are equally likely: with \( K \) cases, each case has probability \( 1/K \). In our example there are \( n + 1 \) cases, so each case has probability \( \frac{1}{n+1} \).
Compute The Expected Value
The expected value of the number of increment operations is one way to characterize the "typical" number of operations.
If you've taken a probability and/or statistics course before, you might have learned about expected values.
Sure! I remember this!
Nope! I've never taken a prob/stats course and I've never heard of this.
I did take a probability course but I don't really remember…
All of that is okay! We'll explain everything you need to know for CS 70 right here!
The expected number of operations is a weighted average of the number of operations for each case, where the weights are the probabilities of each case.
So an expensive and common case would add a lot to the expected operation count, but a cheap or rare case would only add a little bit.
Specifically, we calculate it as
$$ \mathrm{Pr}(\mathrm{Case}\ 1) \cdot \mathrm{Count}(\mathrm{Case}\ 1) + \mathrm{Pr}(\mathrm{Case}\ 2) \cdot \mathrm{Count}(\mathrm{Case}\ 2) + \cdots{} + \mathrm{Pr}(\mathrm{Case}\ K) \cdot \mathrm{Count}(\mathrm{Case}\ K). $$
Or, in summation form, $$ \sum^{K}_{k=1} \mathrm{Pr}( \mathrm{Case}\ k ) \cdot \mathrm{Count}( \mathrm{Case}\ k ) $$
What does the expected value actually mean?
Ah! That's a great question.
Imagine that you can draw a case out of a hat according to the probabilities, write down the number of operations, and put it back in the hat.
Draw another. And another. And a whole bunch more…
Now you can take the average of the operation counts for all of the cases you pulled out of the hat. As the number of cases you draw goes to infinity, that average will converge to the expected value.
So the expected value is like the average number of operations over infinitely many randomly selected inputs.
In this example, we get
$$ \frac{1}{n + 1} \cdot{} n + \frac{1}{n + 1} \cdot{} ( n + 1 ) + \frac{1}{n + 1} \cdot{} ( n + 2 ) + \frac{1}{n + 1} \cdot{} ( n + 3 ) + \cdots\ + \frac{1}{n + 1} \cdot{} 2n $$
Or, in summation form,
$$ \sum^{n+1}_{k=1} \frac{ k + n -1 }{ n + 1 } $$
Okay! So we have
\[ \begin{align} \sum^{n+1}_{k=1} \left( \frac{ k + n - 1 }{ n + 1 } \right) & = \frac{1}{ n + 1 } \cdot{} \sum^{n+1}_{k=1} ( k + n - 1 ) \\ & = \left( \frac{1}{ n + 1 } \cdot{} \sum^{n+1}_{k=1} k \right) + \left( \frac{1}{ n + 1 } \cdot{} \sum^{n+1}_{k=1} ( n - 1 ) \right). \end{align} \]
Now let's deal with each of those summations one at a time (we'll go from right to left).
The second term doesn't have anything that depends on \( k \), so we can pull it all out of the sum:
\[ \begin{align} \left( \frac{1}{ n + 1 } \right) \cdot{} \sum^{n+1}_{k=1} ( n - 1 ) & = \left( \frac{ n - 1 }{ n + 1 } \right) \cdot{} \sum^{n+1}_{k=1} 1 \\ & = \left( \frac{ n - 1 }{ n + 1 } \right) \cdot{} ( n + 1 ) = n - 1 \end{align} \]
The first term looks familiar! We should apply Gauss's formula, \( \sum^{N}_{i=1} i = \frac{N( N + 1 )}{2} \)!
In our example, \( N = n + 1 \). Thus
\[ \begin{align} \left( \frac{1}{ n + 1 } \right) \cdot{} \sum^{n+1}_{k=1} k & = \left( \frac{1}{ n + 1 } \right) \cdot{} \left( \frac{ ( n + 1 )( n + 2 ) }{2} \right) \\ & = \frac{ n + 2 }{2}. \end{align} \]
So, all together, we have
\[ \begin{align} \left(\frac{ n + 2 }{2}\right) + \left(n - 1\right) & = \frac{ (n + 2) + 2(n - 1)}{2} \\ & = \frac{ 3n }{2} \\ & = 1.5n \end{align} \]
All that math all at once! My eyes! My eyes!! I can't see!!!!
Okay, try to remain calm. Reading math is a skill, but it's a skill you can practice.
The key is not to try to read it like a paragraph, where you read very quickly, skim over words, and so on.
Reading technical content like this requires a lot of active attention.
Go step by step. Maybe write it out as you go. Make sure you understand each step.
If a step doesn't make sense at first, go back to it and really spend time working it out.
You can do this!!
Oh! That makes sense! If all positions for x are equally likely, then we sort of expect it to be in the middle on average.
That's right, and when x ends up in the middle, we do \( n \) increments for the loop and \( n / 2 \) increments for numLessThan.
It's so exciting when the math works out!!
Meh, I'd have just guessed that it's 1.5 times \( n \).
In practice, sometimes we can skip the math entirely, or apply short-cuts, but it's important to know how to do it properly so that we have a general method for the harder cases.
Key Takeaways
Expected case analysis is useful when we want to go beyond the best or worst case.
If we have some reasonable assumptions (or actual knowledge) about how likely different cases are, we can get the expected number of operations.
If we call the function many times, then this gives us an idea of how many operations it will take on average.