Comparisons: Space Efficiency

I'm still struggling to have a sense of the trade-offs between all the approaches.
There are a lot of trade-offs, for sure.
Here's another one. Sometimes, the issue isn't how fast the hash table is, it's whether you can fit all the data in memory at all.
For example, in her research work, Prof. Melissa needed to store a set of 13 billion 64-bit numbers. Memory efficiency really mattered in that particular example!

Space Usage

As we've already seen, we have to take care to compare like with like, so we'll examine how separate chaining compares with open addressing if we had only 1 MB of RAM we could use for data.

Our assumptions: In the examples below, we'll suppose a 64-bit system (with 8-byte pointers). We'll also suppose that a linked-list node requires 16 bytes when storing ints (due to padding added for alignment requirements), and 32 bytes when storing RGBcolor objects (no extra padding). We'll also imagine that heap allocations have zero additional overhead.

We'll look at two examples:

Storing 32-bit integers (4 bytes per value stored).
Storing RGBcolor values (3 doubles, 24 bytes per object).

Our goal here is just to get a flavor and intuition about the issues.
So we won't go down the rabbit hole of saying how these were calculated, but you should have enough information to check our work if you want.
Aww!

Separate Chaining in 1 MB of RAM

Storing 32-bit integers
- With \( \lambda = 0.5 \), we could store 32,768 integers.
- With \( \lambda = 1 \), we could store 43,690 integers.
- With \( \lambda = 2 \), we could store 52,428 integers.
- With \( \lambda = 8 \), we could store 61,680 integers.
Storing RGBcolor objects
- With \( \lambda = 0.5 \), we could store 21,845 colors.
- With \( \lambda = 1 \), we could store 26,214 colors.
- With \( \lambda = 2 \), we could store 29,127 colors.
- With \( \lambda = 8 \), we could store 31,775 colors.

Increasing the load factor made more of a difference for the 32-bit ints example.
From \( \lambda = 0.5 \) to \( \lambda = 8 \), it almost doubled the number of values we could store!
But for RGBcolor objects it made less of a difference because the array used by the hash table is a smaller proportion of total space usage.

Open Addressing in 1 MB of RAM

With open addressing, we have a choice we can make about what it is that we store in our buckets.
Specifically, do we store it the data itself, or do we store a pointer to the data?

We'll examine both choices. The lambda values here are equivalent to the ones we used above for separate chaining.

Storing 32-bit integers — Pointers to 32-bit ints in the table.
- With \( \lambda = 0.333 \), we could store 37,449 integers
- With \( \lambda = 0.5 \), we could store 52,428 integers.
- With \( \lambda = 0.667 \), we could store 65,536 integers.
- With \( \lambda = 0.889 \), we could store 80,659 integers.
Storing 32-bit integers — 32-bit ints in the table.
- With \( \lambda = 0.333 \), we could store 87,381 integers
- With \( \lambda = 0.5 \), we could store 131,072 integers.
- With \( \lambda = 0.667 \), we could store 174,762 integers.
- With \( \lambda = 0.889 \), we could store 233,016 integers.

We can store way more 32-bit ints when we store them directly in the table, rather than using a pointer.

Storing RGBcolor objects — Pointers to objects in the table
- With \( \lambda = 0.333 \), we could store 21,845 colors.
- With \( \lambda = 0.5 \), we could store 26,214 colors.
- With \( \lambda = 0.667 \), we could store 29,127 colors.
- With \( \lambda = 0.889 \), we could store 31,775 colors.
Storing RGBcolor objects — Objects directly in the table
- With \( \lambda = 0.333 \), we could store 14,563 colors.
- With \( \lambda = 0.5 \), we could store 28,145 colors.
- With \( \lambda = 0.667 \), we could store 29,127 colors.
- With \( \lambda = 0.889 \), we could store 38,836 colors.

It only becomes space efficient to store RGBcolor objects directly in the table when the load factor gets very high.
Wait, if we store pointers in the table, we can use nullptr for empty buckets, but if we store objects directly in the table, what value do we use to represent an empty bucket?
It is an added complication, for sure.
For now, we'll just leave it as something to think about.

Space Efficiency of Hash Tables vs. Trees

So how does this space efficiency compare to trees?
Let's work it out!

We can see that trees are very much in the same ballpark for RGBcolor objects.

For TreeSet<int>, the node size would be 32 bytes (due to padding), and we could store 32767 items. Here the tree isn't nearly as space efficient because the tree pointers dominate over the actual data stored.

But both trees and hash tables are \( \mathrm{O}(n) \) space and broadly comparable.

(When logged in, completion status appears here.)

Previous Page Next Page

CS 70

Warning: Missing Prerequisites

Comparisons: Space Efficiency

Space Usage

Separate Chaining in 1 MB of RAM

Open Addressing in 1 MB of RAM

Space Efficiency of Hash Tables vs. Trees

Instructor Controls