Θ Abstracts Away the Choice of Operation
Remember that one of the things we didn't like about counting operations was deciding what operation to count.
It turns out, we don't really have to make that decision when we do an asymptotic analysis.
So we have
- \( \mathrm{C}_{\mathrm{lessthan}}(n) = 2n + 2 \in \Theta(n) \)
- \( \mathrm{C}_{\mathrm{index}}(n) = n \in \Theta(n) \)
- \( \mathrm{C}_{\mathrm{inc}}(n) \in \Theta(n) \) (somewhere between \( n \) and \( 2n \) increments, depending on the data)
They all gave us different operation counts, but the same order of complexity.
What if we add them together?
Let \( \mathrm{C}_{\mathrm{lt\_idx\_inc}}(n) = \mathrm{C}_{\mathrm{lessthan}}(n) + \mathrm{C}_{\mathrm{index}}(n) + \mathrm{C}_{\mathrm{inc}}(n) \)
The best case number of operations is \[ \mathrm{C}_{\mathrm{best\_lt\_idx\_inc}}(n) = 4n + 2. \]
The worst case number of operations is \[ \mathrm{C}_{\mathrm{worst\_lt\_idx\_inc}}(n) = 5n + 2. \]
Either way, we can say that \( \mathrm{C}_{\mathrm{lt\_idx\_inc}}(n) \in \Theta(n) \).
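If you want to convince yourself of these counts, here is a minimal hand-instrumented sketch (the name countedLowerHalf and the OpCounts struct are ours, introduced purely for illustration). It runs the same logic as the function below, tallying each comparison, index, and increment as it happens.

    #include <cstddef>
    #include <cstdio>

    // Hypothetical instrumented version of lowerHalf: same logic,
    // but every comparison, index, and increment is tallied by hand.
    struct OpCounts {
        size_t lessThan = 0;  // "<" comparisons
        size_t index = 0;     // arr[i] accesses
        size_t inc = 0;       // "++" operations
    };

    bool countedLowerHalf(float x, const float* arr, size_t arrSize, OpCounts& ops) {
        size_t numLessThan = 0;
        for (size_t i = 0; ; ) {
            ++ops.lessThan;                 // loop test: i < arrSize
            if (!(i < arrSize)) break;
            ++ops.index;                    // arr[i]
            ++ops.lessThan;                 // arr[i] < x
            if (arr[i] < x) {
                ++ops.inc;                  // ++numLessThan
                ++numLessThan;
            }
            ++ops.inc;                      // ++i
            ++i;
        }
        ++ops.lessThan;                     // numLessThan < arrSize/2.0
        return numLessThan < arrSize/2.0;
    }

    int main() {
        const size_t n = 8;
        float allBelow[n]  = {0, 0, 0, 0, 0, 0, 0, 0};  // every element < x: worst case
        float noneBelow[n] = {9, 9, 9, 9, 9, 9, 9, 9};  // no element < x: best case
        OpCounts worst, best;
        countedLowerHalf(5.0f, allBelow, n, worst);
        countedLowerHalf(5.0f, noneBelow, n, best);
        // Expect 4n+2 = 34 in the best case and 5n+2 = 42 in the worst case.
        std::printf("best:  %zu operations\n", best.lessThan + best.index + best.inc);
        std::printf("worst: %zu operations\n", worst.lessThan + worst.index + worst.inc);
    }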
It almost seems like it doesn't really matter what we choose...
Yes! That's exactly the situation! Asymptotically, it doesn't matter.
Take another look at the code.
bool lowerHalf(float x, const float* arr, size_t arrSize) {  // 1
    size_t numLessThan = 0;                                  // 2
    for (size_t i = 0; i < arrSize; ++i) {                   // 3
        if (arr[i] < x) {                                    // 4
            ++numLessThan;                                   // 5
        }                                                    // 6
    }                                                        // 7
    return numLessThan < arrSize/2.0;                        // 8
}                                                            // 9
Instead of picking a particular operation to count, we just have to focus on how the number of operations depends on \( n \).
Let's let \( \mathrm{C}(n) \) be the number of operations—or even CPU cycles, if you want—for an array of size \( n \).
Again, we can't write \( \mathrm{C}(n) \) down as an explicit expression! But we can still draw conclusions about its asymptotic behavior.
Lines 1-2: The number of operations on these lines doesn't depend on \( n \) at all. So these lines take \( \Theta(1) \) time.
Lines 3-7: Here we have a loop that iterates \( n \) times. (Initializing the loop variable happens just once, before the first iteration, so it only contributes \( \Theta(1) \).)
In each iteration of the loop, we do a bunch of operations:
- Two less-than comparisons,
- One index,
- One or two increments,
- And probably some other implicit stuff that goes on behind the scenes, such as moving the instruction pointer and so on.
The point is that the number of operations it takes to do all of the work in one loop iteration doesn't depend on \( n \)!
So each iteration of the loop takes \( \Theta(1) \) time.
Altogether, we do \( n \cdot \Theta(1) \) operations, so the loop takes \( \Theta(n) \) time.
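If you'd like that \( n \cdot \Theta(1) \) step spelled out without the notation shortcut, here is one way to do it (the symbol \( B_i \) is ours, introduced just for this derivation). Let \( B_i \) be the number of operations performed in the \( i \)-th iteration. Each iteration taking \( \Theta(1) \) time means there are positive constants \( c_1 \) and \( c_2 \) with \( c_1 \le B_i \le c_2 \) for every \( i \). Summing over all \( n \) iterations gives
\[ c_1 n \;\le\; \sum_{i=1}^{n} B_i \;\le\; c_2 n, \]
which is exactly what it means for the loop's total operation count to be in \( \Theta(n) \).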
Line 8: The number of operations on this line doesn't depend on \( n \) at all. So this line takes \( \Theta(1) \) time.
What happens when you add \( \Theta(1) + \Theta(n) + \Theta(1) \)? You get \( \Theta(n) \)! Linear plus a constant is still linear.
So we can conclude that \( \mathrm{C}(n) \in \Theta(n) \), where \( n \) is the size of the array.
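As a sanity check, a rough timing sketch might look like the following (purely illustrative; the actual numbers depend on your machine, compiler, and optimization flags, and the function from above is repeated so the sketch is self-contained). If \( \mathrm{C}(n) \in \Theta(n) \), then doubling \( n \) should roughly double the running time once \( n \) is large enough.

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Same function as above, minus the line-number comments.
    bool lowerHalf(float x, const float* arr, size_t arrSize) {
        size_t numLessThan = 0;
        for (size_t i = 0; i < arrSize; ++i) {
            if (arr[i] < x) {
                ++numLessThan;
            }
        }
        return numLessThan < arrSize/2.0;
    }

    int main() {
        for (size_t n = 1'000'000; n <= 16'000'000; n *= 2) {
            std::vector<float> arr(n, 1.0f);  // the contents don't change the Theta(n) behavior
            auto start = std::chrono::steady_clock::now();
            volatile bool result = lowerHalf(0.5f, arr.data(), n);  // volatile: keep the call from being optimized away
            auto stop = std::chrono::steady_clock::now();
            (void)result;
            std::chrono::duration<double, std::milli> ms = stop - start;
            std::printf("n = %9zu: %.2f ms\n", n, ms.count());
        }
    }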
Wow! That's so cool! With \( \Theta \) we can take all operations into account in our analysis.
That's right! It is cool! Not having to be specific about what we are counting makes the analysis easier to perform and to interpret.
But I think we lost something, too. We can't use this approach to reason about constant factors and lower-order terms.
That's right. In practice, if I have to choose between a function that does \( 2n^2 \) multiplications and one that does \( 100\,000\,000 \cdot n \) multiplications, I might prefer the first function, even though it takes \( \Theta(n^2) \) time.
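To see where that trade-off flips, just solve for the crossover point:
\[ 2n^2 < 100\,000\,000 \cdot n \iff n < 50\,000\,000, \]
so for inputs with fewer than fifty million elements, the \( \Theta(n^2) \) function actually performs fewer multiplications.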
Hey, wait! You said \( \Theta(1) + \Theta(n) + \Theta(1) \). Is that really something I can say? I thought \( \Theta( \ldots ) \) was supposed to be a set? I'm confused.
You are so right. As a mathematician, I find this abuse of notation quite distasteful. Mathematically, you just can't write \( \Theta(n) + \Theta(1) \), because \( \Theta(n) \) is a set, not a number! And you can't write \( n \cdot \Theta(1) \) either; you can't multiply a set by \( n \).
You're right, of course. We're being rather casual with our notation here. Proper mathematicians like Big Owl may wag their fingers at us! Sorry, Big Owl! But it's pretty common to see this kind of notation in computer science, even though technically it's wrong. We sort of have this attitude that “we know what we mean”, and sometimes we're even right.
Well, I don't give a hoot what's okay for computer scientists, it's not proper mathematics!
We took a short-cut to avoid looking at counts. It worked here, but quick ways of figuring things out without working out all the details can be prone to error.
For example, doing an operation that takes \( \Theta(n) \) time in the worst case \( \Theta(n) \) times may not always result in a \( \Theta(n^2) \) algorithm. We will see an important example later on!
When you're unsure, working out counts (even simplified ones) can help.
On the other hand, doing a \( \Theta(1) \) operation \( \Theta(n) \) times is always going to take \( \Theta(n) \) time, so there are plenty of times when we don’t need to dive any deeper.
In this class, we’ll let you know if/when we need to work out counts, and stick with things at the asymptotic level when we can.