CS 70

What's an Encoding?

If the interface specifies what, the encoding (a.k.a. representation) specifies how.

The encoding of a data structure consists of the design decisions about how the data will be organized: What structures/variables will be needed to store data and any bookkeeping values? What are the roles/meanings of these variables? What relationships must always hold between the pieces?

In C++, the encoding is primarily expressed in the private members of the class that represents the data structure.

The encoding gives concrete goals for the implementation. By specifying the encoding, we can clearly determine what effects each function should have in order to support the interface.

Practice: STRING-LIST

For any given interface there are typically many plausible encodings! Here are a couple of possible encodings for STRING-LIST. Given what you've already seen in your homework and the previous lesson, the encodings we'll explore should be pretty familiar.

STRING-LIST Encoded using Arrays

We could store the contents of the list using an array.

  • Let's imagine that the array is allocated with some initial size (called capacity).
  • The first element of the list is placed at index 0 of the array, the second at index 1 of the array, and so on.
  • If, at some point, there are more list elements than slots in the array, we'll have to create a new and bigger array and move everything to that new array.

Here is a schematic diagram of this encoding:

Terminology: This scheme is often called an array-backed list, but others call it a dynamic array, or a vector, depending on which community is talking about it.

STRING-LIST Encoded using a Linked List

We can also represent the list using a linked list, as you may have studied previously in CS 60 or 42.

  • Here we have a set of list nodes, where each node contains one list element and a link to the next node in the list.
  • The front node is called the head.
  • The back node is called the tail.

Here is a schematic diagram of this encoding:

  • Sheep speaking

    Can someone remind me what nullptr is?

  • LHS Cow speaking

    It's a C++ keyword for a special pointer value that doesn't point anywhere. (Follow this pointer to Pointers vs references in Week 4 to learn more.)

  • RHS Cow speaking

    Here, nullptr is used to mark the end of the list.

Trade-Offs: Compare and Contrast

  • Duck speaking

    Why would we pick one or the other?

  • LHS Cow speaking

    Good question! Every choice of encoding will have advantages and disadvantages.

  • RHS Cow speaking

    Ultimately, the choice depends on matching the needs of your problem with the strengths of a particular choice of encoding.

Let's think a bit about the relative advantages and disadvantages of these two possibilities.

Which of the two encodings would offer a more efficient PUSH-FRONT operation?

Key Points

  • A linked list just needs to move a few pointers around to add a new node, which can be done in \(\Theta(1)\) time.
  • An array-based list would need to shift every element over by one to make room for the new item, which requires \(\Theta(n)\) time, where \(n\) is the current size of the list!
  • Cat speaking

    Also, the array-backed list might need to resize the array.

  • LHS Cow speaking

    Oh, that's true, too! But even if that happened, it would only take \(\Theta(n)\) time, so for this operation it would still come out to \(\Theta(n)\) overall.

  • RHS Cow speaking

    Actually, the cost of those resize operations brings up some really interesting issues. We'll talk more about that at a later date.

Which of the two encodings would offer a more efficient GET-ITEM-AT operation?

Key Points

  • The array-backed list simply has to index into the array at the appropriate spot. That's easy: \(\Theta(1)\)!
  • The linked list would have to start at the head, and move to the requested location item-by-item. In the worst case we would need the final item (and thus have had to go throgh every item in the list), so the complexity is \(\mathrm{O}(n)\).

Which of the two encodings would offer a more efficient INSERT operation? (Where INSERT adds an item at a specific index.)

Both techniques require \(\mathrm{O}(n)\) time to perform an insert in general, but for very different reasons!

  • An array-backed list can find the location for the insert easily (in \(\Theta(1)\) time). But it then needs to shift everything over to make room for the new item. In the worst case (inserting near the beginning) that would mean shifting all of the items over to make room. In the best case (inserting near the end) it doesn't shift anything over. So performance is no worse than linear time, but could be better. So that part takes \(\mathrm{O}(n)\) time. In total, that's \(\mathrm{O}(n)\) time.
  • A linked list has to start at the head and follow links to find the location for the insert. In the worst case (inserting near the end) that might mean traversing the whole list. In the best case (inserting near the beginning) it doesn't need to search at all. So this part takes \(\mathrm{O}(n)\) time—linear time at worst, but could be better. Once the insertion location is found, actually inserting the new node can be done with just a few pointer reassignments, which just takes \(\Theta(1)\) time. In total, that's \(\mathrm{O}(n)\) time.
  • Cat speaking

    So it seems like array-backed lists are good at accessing an arbitrary location but bad at inserting into arbitrary locations.

  • LHS Cow speaking

    Seems right to me.

  • Cat speaking

    And linked lists are good at inserting new items, but bad at getting to arbitrary locations.

  • LHS Cow speaking

    Yup, that's a pretty good summary!

(When logged in, completion status appears here.)