CS 70

A Motivating Example: Network Routing

In a computer network, a router is a device with many incoming connections and many outgoing connections.

When computers on the network want to communicate, the sender transmits information to the router, which then decides where to send the information next, often to another router.

Eventually, after multiple “hops” along the network, the information is delivered to the recipient.

  • Dog speaking

    Am I in the right class? What day is this? How long was I asleep?!?

  • LHS Cow speaking

    Don't worry. You're in the right class.

  • RHS Cow speaking

    We're going to loop back around to binary-search trees in a minute.

  • LHS Cow speaking

    You don't need to memorize any of the details on this page. We are just trying to motivate our next data structure.

  • Goat speaking

    Meh. Wake me up later, then.

IP Addresses

A computer on the network is identified by its IP address (IP stands for Internet Protocol).

In the IPv4 standard, an address takes up 32 bits, and is written as four numbers separated by periods, each number representing the decimal value of 8 bits in the address.

For example, at time of writing, the HMC CS web server's IPv4 address was 134.173.42.2.

A 32-bit addressing space only allows 232= 4,294,967,296 (about 4 billion) unique addresses.

  • Dog speaking

    Wow, that's a lot!

  • LHS Cow speaking

    Not compared to the number of devices on the internet these days…

In the IPv6 standard, an address takes up 128 bits and is written as eight hexadecimal numbers separated by colons, each representing 16 bits of the address.

For example, at time of writing, the HMC CS web server's IPv6 address was 0000:0000:fe80:0000:0ec4:7aff:fe6a:e5aa, which can be shortened to fe80::ec4:7aff:fe6a:e5aa.

A 128-bit address space permits about 2128 (about 340 billion billion billion billion) unique addresses.

  • RHS Cow speaking

    Now that will probably hold us over for a while.

  • LHS Cow speaking

    To put this into perspective, if each computer weighed a mere 1 milligram, and we used the entire mass of the earth to make computers, we'd need about 50 million earths to make enough computers to use up the IPv6 address space. (Or 170 entire solar systems, if you prefer.)

Fun fact: if you type “What is my IP” into Google, it will tell you your IP address!

Routing Table

It wouldn't be feasible or sensible to try to directly connect every computer on the internet to every other computer. Instead computers are connected together in a network. And that's where routers come in. They're intermediaries that help get information from one computer to another.

When information comes in, destined for a particular IP address, the router needs to decide where to send it next, ideally so it gets closer to its destination.

A router stores a routing table which stores, for destinations it has seen, the address of the "next hop" on the path to the destination.

The entries in this table are updated using very cool distributed algorithms in which routers collaborate to determine the best paths through the network to each destination. We won't go into this here.

So a router needs to be able to take a destination IP address and quickly look up the address of the next hop in the table.

  • RHS Cow speaking

    How should we represent this table?

  • Dog speaking

    Easy! Let's make an array. We'll treat the IP address as a number and use it as an index. The array can store the next hop. We'll have an array entry for every IP address.

  • LHS Cow speaking

    Okay, good start, but…

What's wrong with Dog's suggestion?

The problem is that there are too many IP addresses!

Even with IPv4, are we really going to make every router store an array of 4.3 billion addresses?

And with IPv6, forget about it! We'd need to store an array of \( 3.4 \times 10^{38} \) addresses. The fundamental electronic component of computer chips (including memory chips) is the transitor. There have only been about \( 1.3 \times 10^{22} \) transistors ever made.

  • Dog speaking

    Okay, so I hear you saying that an array won't work.

  • LHS Cow speaking

    Yeah. There is literally not enough memory in the world to store the IPv6 address space in an array.

The good news is that any given router is very unlikely to encounter all possible destination addresses.

We don't need to make an array that can have an entry for every single address because most of that array would be empty!

So instead we need a data structure that only stores the entries it needs and that allows for efficient lookup.

  • Koala speaking

    Crikey, I reckon a binary-search tree could make a bonzer routing table!

  • LHS Cow speaking

    …I think that means it would be good?

  • Koala speaking

    Too right, mate. It'd be bloody ripper!

  • LHS Cow speaking

    Okay, yeah. Good idea!

A Red–Black tree would store only the entries it needs, and it would guarantee \( \mathrm{O}( \log{n} ) \) lookup and insert.

Still, there's something that's not ideal about a Red–Black tree for this purpose.

Locality of Reference

Information is transmitted on the network in small pieces called packets. Usually a message from one computer to another will consist of many packets.

Furthermore, it is common for two computers to exchange information back and forth for some time, like, for instance, a video game console interacting with an online game server.

This means that the router is likely to have to look up the same IP address many times. It also means that more recent IP addresses are more likely to be looked up again than ones that were seen a long time ago.

The term locality of reference refers to a tendency of a system to look up the same information repeatedly. In this case we specifically have temporal locality, the tendency for recent lookups to be repeated.

Are Red–Black trees a good choice when our application has temporal locality?

When does lookup take the longest in a red–black tree?

Where do newly added keys go in a red–black tree?

  • RHS Cow speaking

    Red–Black trees behave exactly opposite to what we want!

  • LHS Cow speaking

    The most recently added keys take the longest to look up.

To take advantage of the temporal locality in internet routing, we'd want a structure that makes it easiest to look up keys that were recently added or recently found.

That brings us to our next data structure: Splay Trees!

(When logged in, completion status appears here.)