Characterizing Operation Cost that Varies
It might be true that sometimes doing \(\mathrm{O}(n)\) work for \(n\) iterations can only cost \(\Theta(n)\), but it's certainly a bit counterintuitive. If someone said, “\(\mathrm{O}(n)\) work done \(n\) times”, my first guess would be \( \mathrm{O}(n^2) \).
Very true. Without some indication that \(\mathrm{O}(n)\) performance occurs very rarely, we'd all guess \(\mathrm{O}(n^2)\).
The problem is that \(\mathrm{O}(n)\) is very vague. It's saying that performance could be as bad as \(\Theta(n)\), but it might not be. In the case of our in-order BST iterator, \(\Theta(n)\) was the worst case, but that worst case couldn't occur often.
Worst Case Is Often Too Pessimistic
When the worst case doesn't happen often, it can be a misleading metric. Let's look at another example to explore this situation.
The Red Card–Blue Card Pairing Game
Suppose we have two decks of \(n\) playing cards (e.g., \(n = 52\)). One is red, the other is blue.
You keep red cards. Whenever you get a blue card, you must take a red card from your hand, and put the pair (red and blue) on the table in front of you.
At each turn of the game, the dealer gives you
- One new red card; and,
- Zero or more blue cards.
- You don't know how many blue cards you'll get on a given turn; but,
- The dealer can't give you more blue cards than the number of unpaired red cards (i.e., the number you hold in your hand).
The game stops when the dealer runs out of red cards to give you.
[Precisionist details: You need to pair each blue card with a unique red card when you receive it—you can't leave blue cards unpaired or reuse the same card for multiple pairings. Cards are never put back in the deck during the game. Other details of the game need not concern us; feel free to make up a pairing game with these rules as a foundation.]
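To make the rules concrete, here's a minimal Python sketch of one game. (The dealer's choices are simply random here, and `play_game` is an illustrative name, not anything defined in the text.)

```python
import random

def play_game(n, rng):
    """Simulate one pairing game with n red cards and n blue cards.

    Each turn the dealer gives one red card plus some number of blue
    cards, never more than the number of unpaired red cards in hand.
    Returns a list of how many blue cards were dealt on each turn.
    """
    unpaired_red = 0    # red cards currently in your hand
    blue_remaining = n  # blue cards the dealer still holds
    blue_per_turn = []
    for _ in range(n):            # game ends when the red deck is empty
        unpaired_red += 1         # one new red card every turn
        blues = rng.randint(0, min(unpaired_red, blue_remaining))
        unpaired_red -= blues     # each blue card pairs with a red card
        blue_remaining -= blues
        blue_per_turn.append(blues)
    return blue_per_turn

turns = play_game(52, random.Random(1))
print(max(turns), sum(turns))  # worst single turn vs. total over the game
```

However the dealer behaves, `sum(turns)` can never exceed 52, even if `max(turns)` is large.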
Examples
The dealer could give you a red card and a blue card every turn until all the cards are gone, or they could give you no blue cards at all until the very last red card, at which point they give you the entire deck of blue cards. The animated GIFs below show these two scenarios.
Here are some other ways that cards might be dealt:
Wait, so I might not get ALL the blue cards by the end of the game?
And I don't know exactly how many blue cards to expect each turn, but I do know I'll never have more blue cards than red cards, right?
That's right.
Worst-Case Blue Cards
Based on the “all cards on the last turn” example, the worst-case number of blue cards you could be given on any turn is \( \Theta(n) \).
The “blue cards every 13 turns” example also seems like it might be \( \Theta(n) \) worst case, since \( 13 = 52/4 \), and \( \Theta(n/4) = \Theta(n) \).
Let's suppose we have a 52-card deck. Saying \(\Theta(n)\) as the worst case for each turn might give the impression that we could receive 2704 blue cards (\( 52^2 \)) total. It's true that we could be given all 52 in one turn, or half-deck sized chunks in two turns, but overall, across all turns, we'll still only be given at most 52 cards.
Okay, so can we say that on average we get one blue card per turn?
We could, but the phrase “on average” can mean a lot of things.
How about “in expectation” then?
Well, that implies some kind of randomness, and might imply that we could (in some very unlikely scenario) receive 1000 blue cards. But that can't happen.
The reality is that this game makes a much stronger guarantee than that.
And so we introduce a new concept: amortized time.
Amortized Time
Here's the definition of amortized time:
Definition: When operations have a specified amortized time, the total time of all the operations performed up until now is bounded by the sum of the amortized times of the individual operations.
This definition is pretty important, so read it over a couple of times!
Example: In our card game, we can say that we receive one amortized blue card per turn. At the end, after 52 turns, we will have at most 52 blue cards.
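We can check this guarantee directly against the two dealing strategies from the examples. This small sketch (the variable names are illustrative) asserts that after every turn \(t\), the total number of blue cards dealt so far is at most \(t \times 1\) amortized card:

```python
# Two dealing schedules from the examples, for a 52-card deck:
# the number of blue cards received on each turn.
n = 52
one_each_turn = [1] * n               # one red and one blue every turn
all_at_the_end = [0] * (n - 1) + [n]  # no blues until the very last turn

# The amortized claim: after t turns, the total blue cards dealt so
# far is bounded by t * (1 amortized blue card per turn).
for schedule in (one_each_turn, all_at_the_end):
    total = 0
    for t, blues in enumerate(schedule, start=1):
        total += blues
        assert total <= t, "amortized bound violated"
    print(total)  # prints 52 both times: the whole blue deck either way
```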
It's a lie! We can be given more than one blue card per turn!
Right. But sometimes we don't get a blue card at all. We're expressing the idea that it averages out.
We can only get more than one blue card if on an earlier turn, the dealer could have given us a blue card but decided not to, holding it back for later. So one amortized card is either a real card, or kind of like the phantom of a card that we'll get later.
Note that amortized time is a kind of “average” time. When we say our card game has one amortized blue card per turn, it doesn't mean that we'll only get one blue card per turn; it means that on average we get one blue card per turn, an average with some very specific guarantees about the maximum number of blue cards that can have been handed out in total.
After \( m \) turns, you have \( m \) red cards because you get one red card per turn. Each red card is either
- Paired up with a blue card, or
- Unpaired (and you are never allowed to be given more blue cards than the number of unpaired red cards).
Thus, you have at most \( m \) blue cards, because every blue card is paired with its own red card.
One way of thinking about the problem is to say that for each step where you receive a red card, you are either dealt a blue card to go with it, or there's an undealt blue card that's been set aside to give you later (and will be that red card's eventual pairing partner). If you had to pay $1 every time you got dealt a blue card, one way to ensure you'll always have the right amount of money would be to set aside $1 each time you get a red card—the corresponding blue card to pair with it will be coming along at some point, and you might as well put the $1 aside for it now so you're ready if you suddenly get a bunch of blue cards at once. (FWIW, this financial analogy is where the term amortization comes from.)
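The dollar analogy can be written out as a tiny “banker's method” check (a sketch; `check_accounting` is an illustrative name): deposit $1 per red card, pay $1 per blue card, and assert the balance never goes negative.

```python
def check_accounting(schedule):
    """Banker's-method check: deposit $1 for each red card (one per
    turn), pay $1 for each blue card dealt that turn. The balance can
    never go negative, because blue cards never outnumber the unpaired
    red cards you've already banked a dollar for."""
    balance = 0
    for blues in schedule:
        balance += 1        # one red card per turn: set $1 aside
        balance -= blues    # pay $1 for each blue card received
        assert balance >= 0, "spent money we never saved!"
    return balance          # leftover $ = red cards still unpaired

# The "everything on the last turn" schedule from the examples:
leftover = check_accounting([0] * 51 + [52])
print(leftover)  # prints 0: every dollar saved eventually got spent
```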
I've been thinking about it, and I agree about our card game, but I don't think our in-order tree iterator is constant amortized time. It could require \( \Theta(n) \) at the very first step, and with amortized constant time, even a one-step sequence must take constant time; it's only when we have multiple steps that we can have “saved up” some work (like our undealt blue cards).
You're right. It isn't.
Gah! Why did we spend all that time looking at it then?
It's an example where you can see operations that don't all cost the same and are \(\Theta(n)\) worst case, but if we traverse the entire tree, it's still \(\Theta(n)\). It's very similar to amortization, but amortization is a bit stronger in that we can't begin by doing something expensive. We have to have “saved up” some budgeted time before we can do that.
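For comparison, here's a sketch of the kind of stack-based in-order iterator being described (illustrative code, not necessarily the implementation used earlier). The first step may descend a \(\Theta(n)\) left spine, but each node is pushed and popped exactly once, so a full traversal is \(\Theta(n)\) total:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(root):
    """Stack-based in-order traversal. Each node is pushed once and
    popped once, so a full traversal does Theta(n) total work, even
    though a single step can cost Theta(n) on a degenerate tree."""
    stack, node = [], root
    while stack or node:
        while node:               # descend the left spine
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.key
        node = node.right

# A degenerate, left-leaning tree: the first step costs Theta(n).
root = None
for k in range(10):
    root = Node(k, left=root)

print(list(inorder(root)))  # prints [0, 1, 2, ..., 9]
```

Notice that this traversal is free to do its expensive \(\Theta(n)\) descent on the very first call, which is exactly why it's “\(\Theta(n)\) total” rather than “amortized constant” in the strict sense.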
But there is an example that we've seen previously that is constant amortized time.
MORE examples!?!! Oh yeah! I'll put a down payment on that!