Heapify
What if I had a bunch of items already in an array and wanted to make it into a heap?
Any ideas?
We could go from front to back and let each item float up to the level where it belongs.
Oh, cool. That would essentially be like inserting each item one at a time.
Or… could we go from the back to the front and let each item sink to where it belongs in its subtree?
Sure! That would be sort of like the bottom of the tree is all heapish, and we keep sinking the things at the top into the lower heaps until its all one big heap.
But which one is better?
Hmm… let's check it out!
Min-Heapify (Rough) Analysis
Option 1 is something like
for (i = 0; i < n; ++i) FLOAT-UP(i)
In this version, when we're halfway through, the first half of the array will be a nicely made heap, but the second half will be a mess of unordered nodes.
Option 2 is something like this:
for (i = n − 1; i ≥ 0; −−i) SINK-DOWN(i)
In this version, when we're halfway through, the second half of the array will be a bunch of subheaps, but the first half will represent a part of the heap that only satisfies the structure property, not the order property.
Best-Case Analysis
In the best case, the array is already a heap! In that case, none of the nodes have to move, so FLOAT-UP / SINK-DOWN takes \( \Theta(1) \) time.
Thus we are doing \( n \times \Theta(1) \)-time operations: \( \Theta(n) \) time.
Worst-Case Analysis
Now let's imagine the worst case for each item.
- For option 1, that means floating all the way to the root.
- For option 2, that means sinking all the way to the bottom.
For nodes at different depths, float and sink take different amounts of effort.
- To float the root up to the top takes no swaps at all!
- To float a leaf up to the top takes \( \Theta( \log{n} ) \) swaps.
- To sink the root to the bottom takes \( \Theta( \log{n} ) \) swaps.
- To sink a leaf down to the bottom takes no swaps!
There are also different numbers of nodes at different depths.
- At depth 0 there is 1 node (the root).
- At depth 1 there are 2 nodes.
- At depth 2 there are 4 nodes.
- And so on…
Here, let's extrapolate a bit…
Depth | Num. nodes | Swaps to the top | Swaps to the bottom |
---|---|---|---|
\( 0 \) | \( 1 \) | \( 0 \) | \( \lfloor ( \log{n} ) \rfloor \) |
\( 1 \) | \( 2 \) | \( 1 \) | \( \lfloor ( \log{ ( n - 1 ) } ) \rfloor \) |
\( 2 \) | \( 4 \) | \( 2 \) | \( \lfloor ( \log{ ( n - 2 ) } ) \rfloor \) |
… | … | … | … |
\( \log{ ( n - 2 ) } \) | \( n/8 \) | \( \lfloor ( \log{ ( n - 2 ) } ) \rfloor \) | \(2\) |
\( \log{ ( n - 1 ) } \) | \( n/4 \) | \( \lfloor ( \log{ ( n - 1 ) } ) \rfloor \) | \( 1 \) |
\( \log{n} \) | \( n/2 \) | \( \lfloor ( \log{n} ) \rfloor \) | \( 0 \) |
So we just need to add up, for each level, how much effort each node takes times the number of nodes!
Would you rather have…
Option 1 (front to back)
$$ ( 0 \times 1 ) + ( 1 \times 2 ) + (2 \times 4 ) + \cdots{} + ( \lfloor \log{ ( n - 2 ) } \rfloor \times n/8 ) + ( \lfloor \log{ ( n - 1 ) } \rfloor \times n/4 ) + ( \lfloor \log{n} \rfloor \times n/2 ) $$
or
Option 2 (back to front)
$$ ( \lfloor ( \log{n} ) \rfloor \times 1 ) + ( \lfloor \log{ ( n - 1 ) } \rfloor \times 2 ) + ( \lfloor \log{ ( n - 2 ) } \rfloor \times 3 ) + \cdots{} + ( 2 \times n/8 ) + ( 1 \times n/4 ) + ( 0 \times n/2 )? $$
Roughly speaking, we can just compare the largest term in each, right?
- Option 1: The largest term is the last one: \( \lfloor \log{ n } \rfloor \times n/2 = \Theta( n \log{n} ) \).
- Option 2: The largest term is the second to last one: \( 1 \times n/4 = \Theta(n) \).
With option 2, we can make a heap out of arbitrary keys in \( \Theta(n) \) worst-case time!
Even though inserting them one at a time would take \( \Theta( n \log{n} ) \) in the worst case. Wow!
The key insight is that fully half of the tree is made of leaves.
So we should minimize the amount of work we do with the leaves. Hence, sinking everything down is better than floating everything up!
Isn't that amazing??
Meh. I'm not convinced. That was not a proof.
Well, good. You're right. That was not a proof.
It's the end of the semester. We are all tired. So we've spare you the fully detailed derivation of these complexities.
Fine with me!
But if you want to go down the rabbit hole though, you can ask one of the Profs in person. They'll be happy to show you a nifty proof.
If you keep studying CS, you'll probably encounter it at some point.
Conclusion
For MIN-HEAPIFY, we should do option 2 (back to front):
for (i = n − 1; i ≥ 0; −−i) SINK-DOWN(i)
The best-case complexity is: \( \Theta(n) \).
The worst-case complexity is: \( \Theta(n) \).
Since both the best- and worst-case times are both \( \Theta(n) \), the complexity of MIN-HEAPIFY is \( \Theta(n) \): linear time in all cases, never better, never worse.
(When logged in, completion status appears here.)