Data Structure Scenario: Help Request System
Can we do another one? I feel like I need more practice.
Sure, let's do it!
A Help Request System
In this scenario, you've been tasked with writing a simple help request system for students waiting for grutor help.
A student can fill out a form to be added to a collection of students.
Whenever a grutor is available, they should be able to get information about the student in the collection who has been waiting the longest. They should also be able to remove that student from the collection.
It's also important that the website be able to display all the students in order of wait times, so people have a sense of where they are in line.
Which Interface?
The data-structure interfaces that we've discussed fall into some broad categories. Although different specific encodings vary a bit in what they can offer, at a high level we have
- Linear data structures (e.g., list, stack, queue)
- Good at keeping things in a particular order.
- Not always good at searching the data for a particular item.
- Associative data structures (e.g., set, map/dictionary)
- Good at searching the data for a particular item.
- Not always good at keeping track of the order of things.
- But that depends on which kind of associative data structure we use.
- Priority queue (kind of a category of its own)
- Good at finding the minimum item (or the maximum item if we set it up that way instead)
- Not good at searching the data for a particular item.
- Not good at keeping things in a particular order (although we can keep removing the minimum).
The main thing we have to do here is keep track of the order in which students ask for help. We want easy access to the item that was inserted longest ago. This seems like a job for a linear data structure, implementing a queue interface!
You might also have said priority queue, since we could think of the time when a student requested help as their priority and always get the one with the minimum timestamp. However, in that particular case a priority queue just becomes a regular queue, except it's unnecessarily complicated. A priority queue is a good choice when we want to organize items by some priority measure other than the order in which they arrived. In other words, a priority queue is good when planned removal order changes as new items get inserted.
We can also use an ordered associative container, such as a tree-based map, to keep track of the order in which students arrived. But, like choosing a heap, this is really overkill in this case. We don't need to be able to search for a particular student's timestamp, and we don't need to be able to remove a student other than the one who's been waiting the longest. We just need to be able to insert a new student and remove the one who's been waiting the longest. A queue is a better fit.
More Details
There are lots of things we might need to think about. Here are a few:
Do we need frequent insertion or removal?
In this example, yes, we do. The main operations are inserting and removing requests.
Do we need to insert/remove at the back or the front or both?
In this example we need to operate on both ends of the list. We can choose which way to do it, either
- Insert at the front, remove from the back; or
- Insert at the back and remove from the front.
How much variation in complexity can we tolerate?
We don't expect the queue to get super large so even some linear-time operations won't be a big deal. But we also don't want to be wasteful, and maybe we should keep scalability in mind in case we ever want to use our system for a much bigger class, like a Massive Open Online Course (MOOC), which could have many thousands of students and hundreds of tutors at a time.
How much space overhead can we tolerate?
Space overhead is probably not a huge concern. The items in the list will be pretty big in and of themselves. Information about a help request probably includes at least one student's name, how to get in touch with them, their question, things they've tried, and so on. That large collection of characters will probably swamp any reasonable amount of space overhead.
Do we need random access?
No, not really! We need fast access to the first student in the queue and fast insert to the end of the queue. We need to iterate over the queue to display it, but we don't need to directly access items at particular locations in the queue.
Brainstorming
Circular Array-Backed List?
- Pro: In the best case, can have zero space overhead per item.
- Meh: But in reality, must have some empty space to grow the list or accomadate shrinking, which adds some overhead.
- Meh: We can probably tolerate the occasional cost of resizing, even with thousands of items.
- Meh: Offers fast random access, but we don't really need that.
- Meh: Not exactly provided by the C++ standard library, but
std::deque
is essentially this idea with a few “optimizations”.- But the
IntVector
class you saw in HW5 is a straightforward circular list.
- But the
- Meh: Insertion outside of the front/back is expensive.
- But we don't need that operation here.
Normal Array-Backed List?
- Pro: In the best case, can have zero space overhead per item.
- Meh: But in reality, must have some empty space to grow the list or accomadate shrinking, which adds some overhead.
- Pro: In C++, we can base our code on
std::vector
. - Meh: We can probably tolerate the occasional cost of resizing, even with thousands of items.
- Meh: Offers fast random access, but we don't really need that.
- Con: Only insertion/removal at the back is efficient.
- So we can't efficiently make a queue!
Singly-Linked List?
- Pro: Fast insertion at both ends, fast removal at the front.
- Meh: Medium space overhead (one pointer per key), but we are not too concerned about that cost.
- Meh: No random access, but we don't really need that.
- Meh: In C++, we can't base our code on
std::forward_list
, because it doesn't support efficient insertion at the back.- But the
IntList
class you wrote in HW 5 had a back pointer, so it could do this, so you know it's possible, just not built in.
- But the
Doubly-Linked List?
- Pro: Fast insertion and removal at either end, and if we wanted we could also do fast insertion/removal in the middle.
- Pro: In C++, we can base our code on
std::list
. - Meh: Medium space overhead (two pointers per key), but we are not too concerned about that cost.
- Meh: No random access, but we don't really need that.
Is there a clear winner?
Well, at least there's a clear loser.
One of the key weaknesses of normal array-backed lists—slow insert/remove at the front—is a major issue for this application. But circular arrays fix that issue.
C++ has std::deque
and std::list
, and both can be used for queues. In fact, C++ provides std::queue
as a “container adapter” that can either use std::list
or std::deque
. It defaults to std::deque
, but many people have claimed that the committee made the wrong choice and std::queue
should have defaulted to using std::list
.
Based on what we've seen, I think we can say that both std::deque
and std::list
are fine with very subtle trade-offs between them.
(Also, remember that in HW5 we created our own IntVector
and IntList
classes, which are essentially circular array-backed lists and singly-linked lists (with back pointers), respectively.)
Why can't we have clear cut right answers?
Meh, that's life.
If we tilt the requirements slightly—saying, for example, that we don't want any variability—the answer does become clearer, favoring the list-based approach.
(When logged in, completion status appears here.)