What's an Encoding?
If the interface specifies what, the encoding (a.k.a. representation) specifies how.
The encoding of a data structure consists of the design decisions about how the data will be organized: What structures/variables will be needed to store data and any bookkeeping values? What are the roles/meanings of these variables? What relationships must always hold between the pieces?
In C++, the encoding is primarily expressed in the private members of the class that represents the data structure.
The encoding gives concrete goals for the implementation. By specifying the encoding, we can clearly determine what effects each function should have in order to support the interface.
Practice: STRING-LIST
For any given interface there are typically many plausible encodings! Here are a couple of possible encodings for STRING-LIST. Given what you've already seen in your homework and the previous lesson, the encodings we'll explore should be pretty familiar.
STRING-LIST Encoded using Arrays
We could store the contents of the list using an array.
- Let's imagine that the array is allocated with some initial size (called capacity).
- The first element of the list is placed at index 0 of the array, the second at index 1 of the array, and so on.
- If, at some point, there are more list elements than slots in the array, we'll have to create a new and bigger array and move everything to that new array.
Here is a schematic diagram of this encoding:
Terminology: This scheme is often called an array-backed list, but others call it a dynamic array, or a vector, depending on which community is talking about it.
STRING-LIST Encoded using a Linked List
We can also represent the list using a linked list, as you may have studied previously in CS 60 or 42.
- Here we have a set of list nodes, where each node contains one list element and a link to the next node in the list.
- The front node is called the head.
- The back node is called the tail.
Here is a schematic diagram of this encoding:
Can someone remind me what
nullptr
is?It's a C++ keyword for a special pointer value that doesn't point anywhere. (Follow this pointer to Pointers vs references in Week 4 to learn more.)
Here,
nullptr
is used to mark the end of the list.
Trade-Offs: Compare and Contrast
Why would we pick one or the other?
Good question! Every choice of encoding will have advantages and disadvantages.
Ultimately, the choice depends on matching the needs of your problem with the strengths of a particular choice of encoding.
Let's think a bit about the relative advantages and disadvantages of these two possibilities.
Key Points
- A linked list just needs to move a few pointers around to add a new node, which can be done in \(\Theta(1)\) time.
- An array-based list would need to shift every element over by one to make room for the new item, which requires \(\Theta(n)\) time, where \(n\) is the current size of the list!
Also, the array-backed list might need to resize the array.
Oh, that's true, too! But even if that happened, it would only take \(\Theta(n)\) time, so for this operation it would still come out to \(\Theta(n)\) overall.
Actually, the cost of those resize operations brings up some really interesting issues. We'll talk more about that at a later date.
Key Points
- The array-backed list simply has to index into the array at the appropriate spot. That's easy: \(\Theta(1)\)!
- The linked list would have to start at the head, and move to the requested location item-by-item. In the worst case we would need the final item (and thus have had to go throgh every item in the list), so the complexity is \(\mathrm{O}(n)\).
Both techniques require \(\mathrm{O}(n)\) time to perform an insert in general, but for very different reasons!
- An array-backed list can find the location for the insert easily (in \(\Theta(1)\) time). But it then needs to shift everything over to make room for the new item. In the worst case (inserting near the beginning) that would mean shifting all of the items over to make room. In the best case (inserting near the end) it doesn't shift anything over. So performance is no worse than linear time, but could be better. So that part takes \(\mathrm{O}(n)\) time. In total, that's \(\mathrm{O}(n)\) time.
- A linked list has to start at the head and follow links to find the location for the insert. In the worst case (inserting near the end) that might mean traversing the whole list. In the best case (inserting near the beginning) it doesn't need to search at all. So this part takes \(\mathrm{O}(n)\) time—linear time at worst, but could be better. Once the insertion location is found, actually inserting the new node can be done with just a few pointer reassignments, which just takes \(\Theta(1)\) time. In total, that's \(\mathrm{O}(n)\) time.
So it seems like array-backed lists are good at accessing an arbitrary location but bad at inserting into arbitrary locations.
Seems right to me.
And linked lists are good at inserting new items, but bad at getting to arbitrary locations.
Yup, that's a pretty good summary!
(When logged in, completion status appears here.)