HEAP SORT
Now that we know about binary heaps, we might as well use them for something!
It turns out that you can use heap operations to make a cool sorting algorithm!
HEAP-SORT
Say we have an array arr of n arbitrary numbers that we want to sort.
Here is the HEAP-SORT algorithm:
HEAP-SORT(arr) algorithm: MAX-HEAPIFY(arr) // Turn the array into a max heap (max at the root) for (i = 1; i ≤ n; ++i) x = GET-MAX(arr) // Remember this is a max heap POP-MAX(arr) arr[n − i] = x // This spot is no longer part of the heap
At the end of this algorithm, the array that used to hold the heap will be filled with the numbers in sorted order.
Here's a cool HEAP-SORT visualizer: https://www.cs.usfca.edu/~galles/visualization/HeapSort.html
HEAP-SORT Analysis
So how long does HEAP-SORT take?
The first step is MAX-HEAPIFY. We just decided that we can do that in \( \Theta(n) \) time.
Next we have \( n \) calls to POP-MAX. But every time we call it, the heap gets smaller!
So the first time takes \( \mathrm{O}( \log{n} ) \) time. The second time takes \( \mathrm{O}( \log{ ( n -1 ) } \)) time. Adding them up, we have…
\[ \mathrm{O}\bigl( \log{ n } + \log{ (n - 1 ) } + \log{ ( n - 2 ) } + \cdots{} + \log{ ( 2 ) } + \log{ ( 1 ) } \bigr). \]
Surely we can agree that every term in the sum is upper bounded by \( \log{n} \), right? So it's true that the complexity is
\[ \mathrm{O}\left( \log{n} + \log{n} + \cdots{} + \log{n} \right) = \mathrm{O}( n \log{n} ). \]
We might be worried that this is a loose bound, but it turns out that it's possible to prove that the worst-case complexity for any algorithm that sorts an arbitrary array of numbers is \( \Omega( n \log{n} ) \). That is, the worst-case complexity can't be better than linearithmic. So this bound is tight, and HEAP-SORT has the best possible worst-case time!
The best case for HEAP-SORT is when all the items are equal, in which case heapify still takes \( \Theta(n) \) time, but each POP-MAX call takes \( \Theta(1) \), resulting in \( \Theta(n) \) time overall. But this unusual case is actually the only way we can coax \( \Theta(n) \) performance out of HEAP-SORT, otherwise it's always \( \Theta(n \log n) \).
Running Code
I get the ideas, but it'd all be more concrete if I could see some running code.
Yes, I think I might understand it better if I saw how you code some of these things.
MORE code!!!
Okay.
A Heap<T>
Data Structure
Here's a full implementation of a Heap<T>
class.
- Open Code in OnlineGDB
- Check out the heap implementation in
heap-private.hpp
- Examine and run the example code that uses the heap in
main.cpp
.
- Check out the heap implementation in
The C++ Standard Library's Heap Facilities and Heap-Sorting
And this code shows the heap-sort algorithm (using C++'s built-in heap-making facilities).
For best timing results with this code, click on the gear icon in the top corner and add
-O3
to the compiler options. (That's the letter “O” for “optimize” not a zero there.)
This example code shows an in-place heap sort using C++'s heap facilities.
Meh. I'm kinda disappointed. HEAP-SORT is all like “I'm the fastest because I'm \( \Theta(n \log n) \)”, and then it's slower than
std::sort
. So I guess we knowstd::sort
isn't heap sort and HEAP-SORT is garbage.We never said HEAP-SORT was always the fastest, just that it gives a good worst-case asymptotic guarantee.
What algorithm does
std::sort
use?It depends on the particular C++ standard library, but typically they use robust industrial-strength sorting algorithms, which are often a hybrid of multiple familiar algorithms, including HEAP-SORT.
If I were making
std::sort
I'd use pdqsort, and forstd::stable_sort
I'd use timsort. And this article has a great discussion of the engineering that goes into writing and maintainingstd::sort
.Okay… There's certainly lots down the “sorting algorithms” rabbit hole.
What's the difference between
std::sort
andstd::stable_sort
?std::stable_sort
leaves equal elements in the order it found them, whereasstd::sort
can leave them in a different order.So if you want to, say, a have list of students students sorted by dorm (most importantly), then class year, then name, you'd want to use
std::stable_sort
, and you'd first sort by name, then class year, then dorm. If you usedstd::sort
you couldn't do that because each sort would totally destroy the order from the previous sort.
(When logged in, completion status appears here.)