CS 70

Comparisons: Space Efficiency

  • Cat speaking

    I'm still struggling to have a sense of the trade-offs between all the approaches.

  • LHS Cow speaking

    There are a lot of trade-offs, for sure.

  • RHS Cow speaking

    Here's another one. Sometimes, the issue isn't how fast the hash table is, it's whether you can fit all the data in memory at all.

  • Rabbit speaking

    For example, in her research work, Prof. Melissa needed to store a set of 13 billion 64-bit numbers. Memory efficiency really mattered in that particular example!

Space Usage

As we've already seen, we have to take care to compare like with like, so we'll examine how separate chaining compares with open addressing if we had only 1 MB of RAM we could use for data.

Our assumptions: In the examples below, we'll suppose a 64-bit system (with 8-byte pointers). We'll also suppose that a linked-list node requires 16 bytes when storing ints (due to padding added for alignment requirements), and 32 bytes when storing RGBcolor objects (no extra padding). We'll also imagine that heap allocations have zero additional overhead.

We'll look at two examples:

  • Storing 32-bit integers (4 bytes per value stored).
  • Storing RGBcolor values (3 doubles, 24 bytes per object).
  • LHS Cow speaking

    Our goal here is just to get a flavor and intuition about the issues.

  • RHS Cow speaking

    So we won't go down the rabbit hole of saying how these were calculated, but you should have enough information to check our work if you want.

  • Rabbit speaking

    Aww!

Separate Chaining in 1 MB of RAM

  • Storing 32-bit integers

    • With \( \lambda = 0.5 \), we could store 32,768 integers.
    • With \( \lambda = 1 \), we could store 43,690 integers.
    • With \( \lambda = 2 \), we could store 52,428 integers.
    • With \( \lambda = 8 \), we could store 61,680 integers.
  • Storing RGBcolor objects

    • With \( \lambda = 0.5 \), we could store 21,845 colors.
    • With \( \lambda = 1 \), we could store 26,214 colors.
    • With \( \lambda = 2 \), we could store 29,127 colors.
    • With \( \lambda = 8 \), we could store 31,775 colors.
  • LHS Cow speaking

    Increasing the load factor made more of a difference for the 32-bit ints example.

  • RHS Cow speaking

    From \( \lambda = 0.5 \) to \( \lambda = 8 \), it almost doubled the number of values we could store!

  • LHS Cow speaking

    But for RGBcolor objects it made less of a difference because the array used by the hash table is a smaller proportion of total space usage.

Open Addressing in 1 MB of RAM

  • RHS Cow speaking

    With open addressing, we have a choice we can make about what it is that we store in our buckets.

  • LHS Cow speaking

    Specifically, do we store it the data itself, or do we store a pointer to the data?

We'll examine both choices. The lambda values here are equivalent to the ones we used above for separate chaining.

  • Storing 32-bit integers — Pointers to 32-bit ints in the table.

    • With \( \lambda = 0.333 \), we could store 37,449 integers
    • With \( \lambda = 0.5 \), we could store 52,428 integers.
    • With \( \lambda = 0.667 \), we could store 65,536 integers.
    • With \( \lambda = 0.889 \), we could store 80,659 integers.
  • Storing 32-bit integers — 32-bit ints in the table.

    • With \( \lambda = 0.333 \), we could store 87,381 integers
    • With \( \lambda = 0.5 \), we could store 131,072 integers.
    • With \( \lambda = 0.667 \), we could store 174,762 integers.
    • With \( \lambda = 0.889 \), we could store 233,016 integers.
  • RHS Cow speaking

    We can store way more 32-bit ints when we store them directly in the table, rather than using a pointer.

  • Storing RGBcolor objects — Pointers to objects in the table

    • With \( \lambda = 0.333 \), we could store 21,845 colors.
    • With \( \lambda = 0.5 \), we could store 26,214 colors.
    • With \( \lambda = 0.667 \), we could store 29,127 colors.
    • With \( \lambda = 0.889 \), we could store 31,775 colors.
  • Storing RGBcolor objects — Objects directly in the table

    • With \( \lambda = 0.333 \), we could store 14,563 colors.
    • With \( \lambda = 0.5 \), we could store 28,145 colors.
    • With \( \lambda = 0.667 \), we could store 29,127 colors.
    • With \( \lambda = 0.889 \), we could store 38,836 colors.
  • LHS Cow speaking

    It only becomes space efficient to store RGBcolor objects directly in the table when the load factor gets very high.

  • Duck speaking

    Wait, if we store pointers in the table, we can use nullptr for empty buckets, but if we store objects directly in the table, what value do we use to represent an empty bucket?

  • LHS Cow speaking

    It is an added complication, for sure.

  • RHS Cow speaking

    For now, we'll just leave it as something to think about.

Space Efficiency of Hash Tables vs. Trees

  • Cat speaking

    So how does this space efficiency compare to trees?

  • LHS Cow speaking

    Let's work it out!

Consider a randomized BST that requires 24-bytes per node for the size and the left and right pointers, plus the size of the data; how many RGBColor objects could you store in 1 MiB (1,048,576 bytes)?

We can see that trees are very much in the same ballpark for RGBcolor objects.

For TreeSet<int>, the node size would be 32 bytes (due to padding), and we could store 32767 items. Here the tree isn't nearly as space efficient because the tree pointers dominate over the actual data stored.

But both trees and hash tables are \( \mathrm{O}(n) \) space and broadly comparable.

(When logged in, completion status appears here.)