CS 70

HEAP SORT

  • LHS Cow speaking

    Now that we know about binary heaps, we might as well use them for something!

  • RHS Cow speaking

    It turns out that you can use heap operations to make a cool sorting algorithm!

HEAP-SORT

Say we have an array arr of n arbitrary numbers that we want to sort.

Here is the HEAP-SORT algorithm:

HEAP-SORT(arr) algorithm:
    MAX-HEAPIFY(arr)          // Turn the array into a max heap (max at the root)
    for (i = 1; in; ++i)
        x = GET-MAX(arr)       // Remember this is a max heap
        POP-MAX(arr)
        arr[ni] = x               // This spot is no longer part of the heap

At the end of this algorithm, the array that used to hold the heap will be filled with the numbers in sorted order.

Here's a cool HEAP-SORT visualizer: https://www.cs.usfca.edu/~galles/visualization/HeapSort.html

HEAP-SORT Analysis

So how long does HEAP-SORT take?

The first step is MAX-HEAPIFY. We just decided that we can do that in \( \Theta(n) \) time.

Next we have \( n \) calls to POP-MAX. But every time we call it, the heap gets smaller!

So the first time takes \( \mathrm{O}( \log{n} ) \) time. The second time takes \( \mathrm{O}( \log{ ( n -1 ) } \)) time. Adding them up, we have…

\[ \mathrm{O}\bigl( \log{ n } + \log{ (n - 1 ) } + \log{ ( n - 2 ) } + \cdots{} + \log{ ( 2 ) } + \log{ ( 1 ) } \bigr). \]

Surely we can agree that every term in the sum is upper bounded by \( \log{n} \), right? So it's true that the complexity is

\[ \mathrm{O}\left( \log{n} + \log{n} + \cdots{} + \log{n} \right) = \mathrm{O}( n \log{n} ). \]

We might be worried that this is a loose bound, but it turns out that it's possible to prove that the worst-case complexity for any algorithm that sorts an arbitrary array of numbers is \( \Omega( n \log{n} ) \). That is, the worst-case complexity can't be better than linearithmic. So this bound is tight, and HEAP-SORT has the best possible worst-case time!

The best case for HEAP-SORT is when all the items are equal, in which case heapify still takes \( \Theta(n) \) time, but each POP-MAX call takes \( \Theta(1) \), resulting in \( \Theta(n) \) time overall. But this unusual case is actually the only way we can coax \( \Theta(n) \) performance out of HEAP-SORT, otherwise it's always \( \Theta(n \log n) \).

Running Code

  • Cat speaking

    I get the ideas, but it'd all be more concrete if I could see some running code.

  • Hedgehog speaking

    Yes, I think I might understand it better if I saw how you code some of these things.

  • Pig speaking

    MORE code!!!

  • LHS Cow speaking

    Okay.

A Heap<T> Data Structure

Here's a full implementation of a Heap<T> class.

  • Open Code in OnlineGDB
    • Check out the heap implementation in heap-private.hpp
    • Examine and run the example code that uses the heap in main.cpp.

The C++ Standard Library's Heap Facilities and Heap-Sorting

  • RHS Cow speaking

    And this code shows the heap-sort algorithm (using C++'s built-in heap-making facilities).

  • Rabbit speaking

    For best timing results with this code, click on the gear icon in the top corner and add -O3 to the compiler options. (That's the letter “O” for “optimize” not a zero there.)

This example code shows an in-place heap sort using C++'s heap facilities.

What observations do you have from looking at these two examples?

  • Goat speaking

    Meh. I'm kinda disappointed. HEAP-SORT is all like “I'm the fastest because I'm \( \Theta(n \log n) \)”, and then it's slower than std::sort. So I guess we know std::sort isn't heap sort and HEAP-SORT is garbage.

  • LHS Cow speaking

    We never said HEAP-SORT was always the fastest, just that it gives a good worst-case asymptotic guarantee.

  • Duck speaking

    What algorithm does std::sort use?

  • LHS Cow speaking

    It depends on the particular C++ standard library, but typically they use robust industrial-strength sorting algorithms, which are often a hybrid of multiple familiar algorithms, including HEAP-SORT.

  • Rabbit speaking

    If I were making std::sort I'd use pdqsort, and for std::stable_sort I'd use timsort. And this article has a great discussion of the engineering that goes into writing and maintaining std::sort.

  • LHS Cow speaking

    Okay… There's certainly lots down the “sorting algorithms” rabbit hole.

  • Hedgehog speaking

    What's the difference between std::sort and std::stable_sort?

  • LHS Cow speaking

    std::stable_sort leaves equal elements in the order it found them, whereas std::sort can leave them in a different order.

  • RHS Cow speaking

    So if you want to, say, a have list of students students sorted by dorm (most importantly), then class year, then name, you'd want to use std::stable_sort, and you'd first sort by name, then class year, then dorm. If you used std::sort you couldn't do that because each sort would totally destroy the order from the previous sort.

(When logged in, completion status appears here.)