A Motivating Example: Network Routing
In a computer network, a router is a device with many incoming connections and many outgoing connections.
When computers on the network want to communicate, the sender transmits information to the router, which then decides where to send the information next, often to another router.
Eventually, after multiple “hops” along the network, the information is delivered to the recipient.
Am I in the right class? What day is this? How long was I asleep?!?
Don't worry. You're in the right class.
We're going to loop back around to binary-search trees in a minute.
You don't need to memorize any of the details on this page. We are just trying to motivate our next data structure.
Meh. Wake me up later, then.
IP Addresses
A computer on the network is identified by its IP address (IP stands for Internet Protocol).
In the IPv4 standard, an address takes up 32 bits, and is written as four numbers separated by periods, each number representing the decimal value of 8 bits in the address.
For example, at time of writing, the HMC CS web server's IPv4 address was 134.173.42.2.
A 32-bit addressing space only allows 232= 4,294,967,296 (about 4 billion) unique addresses.
Wow, that's a lot!
Not compared to the number of devices on the internet these days…
In the IPv6 standard, an address takes up 128 bits and is written as eight hexadecimal numbers separated by colons, each representing 16 bits of the address.
For example, at time of writing, the HMC CS web server's IPv6 address was 0000:0000:fe80:0000:0ec4:7aff:fe6a:e5aa
, which can be shortened to fe80::ec4:7aff:fe6a:e5aa
.
A 128-bit address space permits about 2128 (about 340 billion billion billion billion) unique addresses.
Now that will probably hold us over for a while.
To put this into perspective, if each computer weighed a mere 1 milligram, and we used the entire mass of the earth to make computers, we'd need about 50 million earths to make enough computers to use up the IPv6 address space. (Or 170 entire solar systems, if you prefer.)
Fun fact: if you type “What is my IP” into Google, it will tell you your IP address!
Routing Table
It wouldn't be feasible or sensible to try to directly connect every computer on the internet to every other computer. Instead computers are connected together in a network. And that's where routers come in. They're intermediaries that help get information from one computer to another.
When information comes in, destined for a particular IP address, the router needs to decide where to send it next, ideally so it gets closer to its destination.
A router stores a routing table which stores, for destinations it has seen, the address of the "next hop" on the path to the destination.
The entries in this table are updated using very cool distributed algorithms in which routers collaborate to determine the best paths through the network to each destination. We won't go into this here.
So a router needs to be able to take a destination IP address and quickly look up the address of the next hop in the table.
How should we represent this table?
Easy! Let's make an array. We'll treat the IP address as a number and use it as an index. The array can store the next hop. We'll have an array entry for every IP address.
Okay, good start, but…
The problem is that there are too many IP addresses!
Even with IPv4, are we really going to make every router store an array of 4.3 billion addresses?
And with IPv6, forget about it! We'd need to store an array of \( 3.4 \times 10^{38} \) addresses. The fundamental electronic component of computer chips (including memory chips) is the transitor. There have only been about \( 1.3 \times 10^{22} \) transistors ever made.
Okay, so I hear you saying that an array won't work.
Yeah. There is literally not enough memory in the world to store the IPv6 address space in an array.
The good news is that any given router is very unlikely to encounter all possible destination addresses.
We don't need to make an array that can have an entry for every single address because most of that array would be empty!
So instead we need a data structure that only stores the entries it needs and that allows for efficient lookup.
Crikey, I reckon a binary-search tree could make a bonzer routing table!
…I think that means it would be good?
Too right, mate. It'd be bloody ripper!
Okay, yeah. Good idea!
A Red–Black tree would store only the entries it needs, and it would guarantee \( \mathrm{O}( \log{n} ) \) lookup and insert.
Still, there's something that's not ideal about a Red–Black tree for this purpose.
Locality of Reference
Information is transmitted on the network in small pieces called packets. Usually a message from one computer to another will consist of many packets.
Furthermore, it is common for two computers to exchange information back and forth for some time, like, for instance, a video game console interacting with an online game server.
This means that the router is likely to have to look up the same IP address many times. It also means that more recent IP addresses are more likely to be looked up again than ones that were seen a long time ago.
The term locality of reference refers to a tendency of a system to look up the same information repeatedly. In this case we specifically have temporal locality, the tendency for recent lookups to be repeated.
Are Red–Black trees a good choice when our application has temporal locality?
Red–Black trees behave exactly opposite to what we want!
The most recently added keys take the longest to look up.
To take advantage of the temporal locality in internet routing, we'd want a structure that makes it easiest to look up keys that were recently added or recently found.
That brings us to our next data structure: Splay Trees!
(When logged in, completion status appears here.)