IntVector push_back()
In previous lessons and homeworks, we've seen code for an IntVector class that creates an array-backed list. One of the operations this list type provides is push_back(). Our list has both a size (the number of ints it holds) and a capacity (the number of spaces it has in its array that could hold an int). When all the capacity is used up, it has to reallocate the array to a larger one, which requires \( \Theta(n) \) time.
So the complexity of a single push_back() operation is \( \Theta(1) \) time in the best case, and \( \Theta(n) \) time in the worst case.
In the “dynamic train” homework, you examined two possible reallocation strategies for expanding the size of the train (which is analogous to our array)—either expanding by a constant amount each time, or doubling the capacity. That assignment asked you to come up with an argument as to why doubling was better.
Yes, I remember. I didn't feel very equipped to do that, so my explanation was a bit awkward.
We wanted you to wish you had a good way to explain it so that when we got to this point in the course, you'd be eager to find out more.
I'm always eager for learning MORE!!
We're going to show that it's amortized constant time, right?
That's right.
First, let's get some practice running the code.
Follow the link (in a new browser window or tab) to get a small program that performs \( m \) push_back() operations on an empty IntVector, using a version of our IntVector code that has some extra debugging print statements and statistics gathering.
The main part of the code is this loop:
IntVector vec;
for (int i = 1; i <= numItems; ++i) {
vec.push_back(i);
}
Click the green button at the top of the window, and you should see a line in the bottom input pane saying: How many items to push onto our array-backed list?
Click the arrows to expand the input pane so that it fills more of the screen, then type 10 and look at the output.
The results should look something like this (but without the colors you see here):
How many items to push onto our array-backed list? 10
Okay, pushing 10 items
push_back(1):
 + Added 1 in an empty slot (0 empty slots remaining)
push_back(2):
 > Moved 1 to new array.
 + Added 2 in an empty slot (0 empty slots remaining)
push_back(3):
 > Moved 1 to new array.
 > Moved 2 to new array.
 + Added 3 in an empty slot (1 empty slots remaining)
push_back(4):
 + Added 4 in an empty slot (0 empty slots remaining)
push_back(5):
 > Moved 1 to new array.
 > Moved 2 to new array.
 > Moved 3 to new array.
 > Moved 4 to new array.
 + Added 5 in an empty slot (3 empty slots remaining)
push_back(6):
 + Added 6 in an empty slot (2 empty slots remaining)
push_back(7):
 + Added 7 in an empty slot (1 empty slots remaining)
push_back(8):
 + Added 8 in an empty slot (0 empty slots remaining)
push_back(9):
 > Moved 1 to new array.
 > Moved 2 to new array.
 > Moved 3 to new array.
 > Moved 4 to new array.
 > Moved 5 to new array.
 > Moved 6 to new array.
 > Moved 7 to new array.
 > Moved 8 to new array.
 + Added 9 in an empty slot (7 empty slots remaining)
push_back(10):
 + Added 10 in an empty slot (6 empty slots remaining)
10 pushes, 0 pops, 15 moves.
Click the button again to try running the program with a different number of items.
Because we double the capacity of the array each time, the capacities are 1, 2, 4, 8, 16, 32, 64, …, and resizes happen when we push items 3, 5, 9, 17, 33, 65, …. So the worst case for the amount of work done resizing occurs just after we've resized.
Let's look at a table of the number of moves for different values of \( m \):
| \( m \) | Worst-case moves |
|---|---|
| 5 | 7 |
| 9 | 15 |
| 17 | 31 |
| 33 | 63 |
| 65 | 127 |
We can guess that it's \( 2(m - 1) - 1 \), which is \( 2m - 3 \). So, after \( m \) pushes we will have done \( m \) steps actually pushing new data (the orange steps in the trace above), and \( 2m - 3 \) steps copying data around (the blue and green steps in the trace).
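The \( 2m - 3 \) guess can be sanity-checked with a small simulation. This is just a sketch that tallies moves the way the doubling strategy would, not the course's instrumented code:

```cpp
#include <cassert>
#include <cstddef>

// Count the element moves a doubling strategy performs over m pushes,
// starting from an empty list with zero capacity.
std::size_t movesAfterPushes(std::size_t m) {
    std::size_t size = 0, capacity = 0, moves = 0;
    for (std::size_t i = 0; i < m; ++i) {
        if (size == capacity) {
            moves += size;  // every existing element is copied to the new array
            capacity = (capacity == 0) ? 1 : 2 * capacity;
        }
        ++size;             // the push itself
    }
    return moves;
}
```

Running it for \( m = 5, 9, 17, 33, 65 \) (the values just past a resize) reproduces the worst-case column of the table above.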
Thus after \( m \) pushes, we will have done \( \Theta(m) + \Theta(m) \) work, which is \( \Theta(m) \).
Thus, a single push is \( \Theta(1) \) amortized time. That's it!
Wait, what? I wasn't expecting to be done so quickly.
Hey, enjoy it!
We didn't exactly prove our \( 2m - 3 \) formula, we just guessed it.
Okay, so we could be more rigorous there, but the main amortization argument is that we worked out what it would cost to do \( m \) operations from our starting point of an empty list, and then divided by \( m \) to get the amortized cost per operation.
So it really is just a kind of average cost?
Yes! It's an average with an ironclad worst-case guarantee, rather than a probabilistic sort of average.
But let's see if we can be a little more rigorous about the \( 2m - 3 \) formula for moves.
There are multiple ways of making the argument, but here's one: When push_back() requires us to double the capacity, the number of items we want to have in our list is just one past a power of two, so \( n = 2^x + 1 \) for some \( x \) (I'm using \( n \) when thinking about the number of items in our list and \( m \) when thinking about the number of operations; in this case, because pushing increases the number of items, \( n = m \)). We'll have just copied \( 2^x \) array elements, but there were previous copies before that. So the copying cost is
\[ \sum_{i=0}^x 2^i = 2^{x + 1} - 1 = 2(2^x) - 1 = 2(n - 1) - 1 . \]
Was I supposed to know that \( \sum_{i=0}^x 2^i = 2^{x + 1} - 1 \)?
It is a useful summation identity, and you probably did already know it.
I did?
Imagine a concrete example, like showing that \( 8 + 4 + 2 + 1 = 16 - 1 \). Think of counting in binary. If we have 16, that's 10000 in binary, and if we subtract 1, we get 01111.
That's cool!
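If you like, the identity can also be spot-checked mechanically. This is a quick illustrative check, not a substitute for the argument above:

```cpp
#include <cassert>

// Compute 2^0 + 2^1 + ... + 2^x by brute force, to compare against
// the closed form 2^(x+1) - 1.
unsigned long sumPowersOfTwo(unsigned x) {
    unsigned long sum = 0;
    for (unsigned i = 0; i <= x; ++i) {
        sum += 1UL << i;  // add 2^i
    }
    return sum;
}
```

For example, \( x = 3 \) gives \( 1 + 2 + 4 + 8 = 15 = 2^4 - 1 \).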
Okay. I see. I sort of wanted to use the colors in the trace rather than summations. I could see a pattern and wanted to use that.
What's the pattern you see? Actually, wait, let's see what the students think…
In the trace above, moves are blue and green, whereas the actual pushes of the data are orange (you might also notice that blue moves are moves of recently added data, whereas green ones are data we've moved before). The way we've colored the operations in the trace above helps us see patterns and relationships.
We can match up each blue step with an (earlier) orange step. Similarly, we can match each green step with a blue step (and thus an orange one, too). Thus the number of blue and green steps cannot be larger than the number of orange steps.
Yes, that's basically what I saw. After a bit of muddle at the beginning, the orange steps were always followed by the same number of green steps and then the same number of blue steps.
Another way to look at it is to use money as a proxy for time and imagine that each step (regardless of color) costs $1, and we need to collect enough money at each push_back() to pay for all the work. A constant fee of $3 per push_back() has us covered: $1 for the orange step that's guaranteed to happen this time, and $2 to cover both a blue and a green step that can happen later on.
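This accounting scheme can itself be simulated: charge $3 per push, pay $1 per unit of actual work, and check that the balance never dips below zero. The function below is a sketch of that bookkeeping under the same doubling strategy as before:

```cpp
#include <cstddef>

// Simulate the $3-per-push accounting argument over m pushes.
// Returns true if the collected fees always cover the work done so far.
bool feeCoversAllWork(std::size_t m) {
    std::size_t size = 0, capacity = 0;
    long balance = 0;
    for (std::size_t i = 0; i < m; ++i) {
        balance += 3;                            // collect the constant fee
        if (size == capacity) {
            balance -= static_cast<long>(size);  // $1 per element moved
            capacity = (capacity == 0) ? 1 : 2 * capacity;
        }
        balance -= 1;                            // $1 for the push itself
        ++size;
        if (balance < 0) return false;           // fee failed to cover the work
    }
    return true;
}
```

Because the balance stays non-negative no matter how many pushes we do, the constant $3 fee really does pay for everything, which is exactly the amortized-constant claim.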
Either of these arguments is sufficient to show that push_back() is a constant-amortized-time operation.
We can also use approaches similar to any of the ones we've seen on this page to show that pop_back() is a constant-amortized-time operation. If you'd like to have a go yourself, that's cool, but in the interests of time, we'll move on.