Phase 4: insert()
, exists()
, and printToStream()
In this phase you will implement the core functionality of your HashSet
—insert()
and exists()
. When this phase is complete, you'll actually be able to use your HashSet
. You'll also write the printToStream()
function that can help you see what's going on inside your hash table.
For now, we won't be rehashing the table. Instead, users (and your tests) can specify a desired number of buckets to use for the hash table.
For reference:
User-Provided myhash()
Function
Recall from the specifications that your HashSet
should assume that, whatever the key type T
is, there is a function size_t myhash(const T&)
that returns the hash value of a key.
So when you need to get a hash value, you should simply call myhash()
on the key.
Note that myhash()
can (and, for a good hash function, should) use the full range of size_t
, so the hash values are very likely to be larger than the number of buckets.
Let's Go!
Also check out the Helpful Hints before you get going!
For this part, you will plan, write tests for, implement, and debug
insert()
exists()
printToStream()
Planning Image
Within the Planning
directory, add your planning image, named Phase_4.jpg
.
Core Functionality: insert()
and exists()
See the lesson materials from Week 12, Lesson 1 for how separate chaining works.
Displaying the Hash Table with printToStream()
Here's an example of the output from printing a 10-bucket hash table with 14 elements:
Is it a bug that bucket 7 is empty when other buckets have three items in them?
No, nothing's wrong. That's just random distribution across buckets in action. If they'd all ended up in one bucket, we might be suspicious about the hash function.
[0] bacon
[1] sausage, avocado
[2] waffles
[3] toast
[4] pancakes
[5] cheese
[6] juice, coffee, eggs
[7]
[8] tea, steak, tofu
[9] beans
The algorithm for displaying the hash table is as follows:
foreach bucket: // i.e., loop over all buckets in array-index order PRINT “[” PRINT bucket index PRINT “] ” // n.b., the trailing space is always present, even for empty buckets. PRINT each of the items in the bucket, separated by “, ” // i.e, a comma and a space. // Notice that there is no comma and space after the last item, only between items. PRINT a newline // i.e., “\n”.
Can my list of items end with a comma, as that would be easier to code?
No, commas separate items, so there is no comma after the last item.
There are many ways to code this with some degree of elegance. One option is to have a flag of some kind to know if you're print the first item or a subsequent item, but there are plenty of other ways to do it.
Once printToStream()
is written, you can print hash tables using the <<
operator as it just calls printToStream()
.
Helpful Hints
insert()
Only Inserts If Value Not Present
Recall from the specification that insert()
must enforce the uniqueness of keys. If it is given a key that is already in the table, it should not insert that key.
Strong Suggestion for insert()
Subsequent phases will be much easier if your insert()
function calls a private helper function to do the actual insertion after making sure the item isn't already in the table. This approach will give you a private insert()
function that inserts something into the hash table without checking whether it's already there. That function will come in useful later.
Also, insert()
should use push_front
to add new items onto the front of the bucket.
Pick a Simple Hash Function for Testing
For testing purposes, you get to pick the hash function! In hashset-test.cpp
, you can define myhash
for whatever type(s) you want to test with in the place marked by the comment.
Note that a good hash function gives random-looking values that appear to have no obvious connection to the hashed value. But testing is all about easy-to-follow behavior! So you may not want to use a good hash function when testing.
Instead, you may want to use simple hash functions that cause predictable behavior. We'll discuss this more when you implement the functions that measure the performance of the table.
For now, just know that you don't need to find a fancy, well-behaved hash function. Something very simple that would be a terrible choice for real usage is enough to test the relationships between insert
, exists
, and the other functions you've already implemented.
Define myhash
Before You Include HashSet
If you test with any of the built-in classes of C++ that live in the std::
namespace (such as std::string
), your myhash
function must be defined before you include hashset.hpp
. This requirement is a technical limitation: because myhash(string)
is in the top-level namespace, and std::string
is in the std
namespace, the compiler will not find myhash
if you define it after you define HashSet
.
We've left a comment in hashset-test.cpp
that shows where you should define your myhash
function(s).
Iterating Over Buckets and Template Errors
Template error messages can be pretty confusing. Do try to figure out what the compiler is saying (start from the first error message!!!), but if you have a hard time figuring out what the compiler is telling you, don't spend hours scratching your head: ask for help!
One common pitfall is failing to say typename
when necessary, which is a particular issue when we are using nested classes with templates, as well as template classes that use template classes. For example, suppose you had written the following templated top-level function to print out any list:
#include <iostream>
#include <forward_list>
#include <string>
using namespace std;
template <typename T>
void printAnyList(forward_list<T>& things){
for (forward_list<T>::iterator i = things.begin(); i != things.end(); ++i) {
cout << *i << endl;
}
}
This code should look clear and correct—to a human. But if you compile this code, you could be faced with the following weird error:
yuck.cpp:8:10: error: missing 'typename' prior to dependent type name 'forward_list<T>::iterator'
for (forward_list<T>::iterator i = things.begin(); i != things.end(); ++i)
^~~~~~~~~~~~~~~~~~~~~~~
typename
1 error generated.
What you actually need to do is promise that forward_list<T>::iterator
will be a type (once the list
template is instantiated) by prefixing the offending type with the keyword typename
, as in
for (typename forward_list<T>::iterator i = things.begin(); i != things.end(); ++i) {
cout << *i << endl;
}
This code works, but looks pretty ugly and long-winded. Remember, though, that if you're doing this kind of thing with a class, then your class can have a private type abbreviation (using
) to avoid typing this ugly code over-and-over again. You could also use auto i = things.begin();
to say that i
should be whatever type that can hold the result of things.begin()
.
Meh, I'm just gonna use
auto
.I'm going to use a range-based
for
loop—even simpler.But if you do that, be sure to use a constant reference for type variables to avoid needlessly making copies of items in the buckets.
Testing Using printToStream()
If you know the hash function, the number of buckets, and ensure that the final load factor is less than the maximum load factor, all hash-table implementations must print out a hash table that looks the same. So, for example, if you knew that your hash table should produce the output shown above, you could check that output against the string
"[0] bacon\n[1] sausage, avocado\n[2] waffles\n[3] toast\n[4] pancakes\n[5] cheese\n[6] juice, coffee, eggs\n[7] \n[8] tea, steak, tofu\n[9] beans\n"
That's a long line!
You can break a string in the middle by closing the quotes and opening them again on the next line.
clang-format
does this for you automatically!
Additional Sanity Check: hashset-cow-test.cpp
With these functions implemented, you should be able to successfully compile and run hashset-cow-test
.
It's a very simple test (it basically just inserts something and checks to see if it exists in the table). But the key is a Cow
class rather than a built-in type, so running this test checks to make sure that your table treats its keys properly. If your table performs operations on keys that it shouldn't, you'll probably get compiler errors.
(When logged in, completion status appears here.)