Comparisons: Space Efficiency
I'm still struggling to have a sense of the trade-offs between all the approaches.
There are a lot of trade-offs, for sure.
Here's another one. Sometimes, the issue isn't how fast the hash table is, it's whether you can fit all the data in memory at all.
For example, in her research work, Prof. Melissa needed to store a set of 13 billion 64-bit numbers. Memory efficiency really mattered in that particular example!
Space Usage
As we've already seen, we have to take care to compare like with like, so we'll examine how separate chaining compares with open addressing if we had only 1 MB of RAM we could use for data.
Our assumptions: In the examples below, we'll suppose a 64-bit system (with 8-byte pointers). We'll also suppose that a linked-list node requires 16 bytes when storing int
s (due to padding added for alignment requirements), and 32 bytes when storing RGBcolor
objects (no extra padding). We'll also imagine that heap allocations have zero additional overhead.
We'll look at two examples:
- Storing 32-bit integers (4 bytes per value stored).
- Storing
RGBcolor
values (3 doubles, 24 bytes per object).
Our goal here is just to get a flavor and intuition about the issues.
So we won't go down the rabbit hole of saying how these were calculated, but you should have enough information to check our work if you want.
Aww!
Separate Chaining in 1 MB of RAM
-
Storing 32-bit integers
- With \( \lambda = 0.5 \), we could store 32,768 integers.
- With \( \lambda = 1 \), we could store 43,690 integers.
- With \( \lambda = 2 \), we could store 52,428 integers.
- With \( \lambda = 8 \), we could store 61,680 integers.
-
Storing
RGBcolor
objects- With \( \lambda = 0.5 \), we could store 21,845 colors.
- With \( \lambda = 1 \), we could store 26,214 colors.
- With \( \lambda = 2 \), we could store 29,127 colors.
- With \( \lambda = 8 \), we could store 31,775 colors.
Increasing the load factor made more of a difference for the 32-bit
int
s example.From \( \lambda = 0.5 \) to \( \lambda = 8 \), it almost doubled the number of values we could store!
But for
RGBcolor
objects it made less of a difference because the array used by the hash table is a smaller proportion of total space usage.
Open Addressing in 1 MB of RAM
With open addressing, we have a choice we can make about what it is that we store in our buckets.
Specifically, do we store it the data itself, or do we store a pointer to the data?
We'll examine both choices. The lambda values here are equivalent to the ones we used above for separate chaining.
-
Storing 32-bit integers — Pointers to 32-bit
int
s in the table.- With \( \lambda = 0.333 \), we could store 37,449 integers
- With \( \lambda = 0.5 \), we could store 52,428 integers.
- With \( \lambda = 0.667 \), we could store 65,536 integers.
- With \( \lambda = 0.889 \), we could store 80,659 integers.
-
Storing 32-bit integers — 32-bit
int
s in the table.- With \( \lambda = 0.333 \), we could store 87,381 integers
- With \( \lambda = 0.5 \), we could store 131,072 integers.
- With \( \lambda = 0.667 \), we could store 174,762 integers.
- With \( \lambda = 0.889 \), we could store 233,016 integers.
We can store way more 32-bit
int
s when we store them directly in the table, rather than using a pointer.
-
Storing
RGBcolor
objects — Pointers to objects in the table- With \( \lambda = 0.333 \), we could store 21,845 colors.
- With \( \lambda = 0.5 \), we could store 26,214 colors.
- With \( \lambda = 0.667 \), we could store 29,127 colors.
- With \( \lambda = 0.889 \), we could store 31,775 colors.
-
Storing
RGBcolor
objects — Objects directly in the table- With \( \lambda = 0.333 \), we could store 14,563 colors.
- With \( \lambda = 0.5 \), we could store 28,145 colors.
- With \( \lambda = 0.667 \), we could store 29,127 colors.
- With \( \lambda = 0.889 \), we could store 38,836 colors.
It only becomes space efficient to store
RGBcolor
objects directly in the table when the load factor gets very high.Wait, if we store pointers in the table, we can use
nullptr
for empty buckets, but if we store objects directly in the table, what value do we use to represent an empty bucket?It is an added complication, for sure.
For now, we'll just leave it as something to think about.
Space Efficiency of Hash Tables vs. Trees
So how does this space efficiency compare to trees?
Let's work it out!
We can see that trees are very much in the same ballpark for RGBcolor
objects.
For TreeSet<int>
, the node size would be 32 bytes (due to padding), and we could store 32767 items. Here the tree isn't nearly as space efficient because the tree pointers dominate over the actual data stored.
But both trees and hash tables are \( \mathrm{O}(n) \) space and broadly comparable.
(When logged in, completion status appears here.)