CS 70

Phase 4: insert(), exists(), and printToStream()

In this phase you will implement the core functionality of your HashSetinsert() and exists(). When this phase is complete, you'll actually be able to use your HashSet. You'll also write the printToStream() function that can help you see what's going on inside your hash table.

For now, we won't be rehashing the table. Instead, users (and your tests) can specify a desired number of buckets to use for the hash table.


For reference:

User-Provided myhash() Function

Recall from the specifications that your HashSet should assume that, whatever the key type T is, there is a function size_t myhash(const T&) that returns the hash value of a key.

So when you need to get a hash value, you should simply call myhash() on the key.

Note that myhash() can (and, for a good hash function, should) use the full range of size_t, so the hash values are very likely to be larger than the number of buckets.

Let's Go!

For this part, you will plan, write tests for, implement, and debug

  • insert()
  • exists()
  • printToStream()

Planning Image

Within the Planning directory, add your planning image, named Phase_4.jpg.

Core Functionality: insert() and exists()

See the lesson materials from Week 12, Lesson 1 for how separate chaining works.

Displaying the Hash Table with printToStream()

Here's an example of the output from printing a 10-bucket hash table with 14 elements:

  • Duck speaking

    Is it a bug that bucket 7 is empty when other buckets have three items in them?

  • LHS Cow speaking

    No, nothing's wrong. That's just random distribution across buckets in action. If they'd all ended up in one bucket, we might be suspicious about the hash function.

[0] bacon
[1] sausage, avocado
[2] waffles
[3] toast
[4] pancakes
[5] cheese
[6] juice, coffee, eggs
[7]
[8] tea, steak, tofu
[9] beans

The algorithm for displaying the hash table is as follows:

foreach bucket:  // i.e., loop over all buckets in array-index order
      PRINT[”
      PRINT bucket index
      PRINT] // n.b., the trailing space is always present, even for empty buckets.
      PRINT each of the items in the bucket, separated by, // i.e, a comma and a space.
      // Notice that there is no comma and space after the last item, only between items.
      PRINT a newline      // i.e., “\n”.
  • Goat speaking

    Can my list of items end with a comma, as that would be easier to code?

  • LHS Cow speaking

    No, commas separate items, so there is no comma after the last item.

  • RHS Cow speaking

    There are many ways to code this with some degree of elegance. One option is to have a flag of some kind to know if you're print the first item or a subsequent item, but there are plenty of other ways to do it.

Once printToStream() is written, you can print hash tables using the << operator as it just calls printToStream().

Helpful Hints

insert() Only Inserts If Value Not Present

Recall from the specification that insert() must enforce the uniqueness of keys. If it is given a key that is already in the table, it should not insert that key.

Strong Suggestion for insert()

Subsequent phases will be much easier if your insert() function calls a private helper function to do the actual insertion after making sure the item isn't already in the table. This approach will give you a private insert() function that inserts something into the hash table without checking whether it's already there. That function will come in useful later.

Also, insert() should use push_front to add new items onto the front of the bucket.


Pick a Simple Hash Function for Testing

For testing purposes, you get to pick the hash function! In hashset-test.cpp, you can define myhash for whatever type(s) you want to test with in the place marked by the comment.

Note that a good hash function gives random-looking values that appear to have no obvious connection to the hashed value. But testing is all about easy-to-follow behavior! So you may not want to use a good hash function when testing.

Instead, you may want to use simple hash functions that cause predictable behavior. We'll discuss this more when you implement the functions that measure the performance of the table.

For now, just know that you don't need to find a fancy, well-behaved hash function. Something very simple that would be a terrible choice for real usage is enough to test the relationships between insert, exists, and the other functions you've already implemented.

Define myhash Before You Include HashSet

If you test with any of the built-in classes of C++ that live in the std:: namespace (such as std::string), your myhash function must be defined before you include hashset.hpp. This requirement is a technical limitation: because myhash(string) is in the top-level namespace, and std::string is in the std namespace, the compiler will not find myhash if you define it after you define HashSet.

We've left a comment in hashset-test.cpp that shows where you should define your myhash function(s).


Iterating Over Buckets and Template Errors

Template error messages can be pretty confusing. Do try to figure out what the compiler is saying (start from the first error message!!!), but if you have a hard time figuring out what the compiler is telling you, don't spend hours scratching your head: ask for help!

One common pitfall is failing to say typename when necessary, which is a particular issue when we are using nested classes with templates, as well as template classes that use template classes. For example, suppose you had written the following templated top-level function to print out any list:

#include <iostream>
#include <forward_list>
#include <string>

using namespace std;

template <typename T>
void printAnyList(forward_list<T>& things){
    for (forward_list<T>::iterator i = things.begin(); i != things.end(); ++i) {
        cout << *i << endl;
    }
}

This code should look clear and correct—to a human. But if you compile this code, you could be faced with the following weird error:

yuck.cpp:8:10: error: missing 'typename' prior to dependent type name 'forward_list<T>::iterator'
    for (forward_list<T>::iterator i = things.begin(); i != things.end(); ++i)
         ^~~~~~~~~~~~~~~~~~~~~~~
         typename
1 error generated.

What you actually need to do is promise that forward_list<T>::iterator will be a type (once the list template is instantiated) by prefixing the offending type with the keyword typename, as in

for (typename forward_list<T>::iterator i = things.begin(); i != things.end(); ++i) {
    cout << *i << endl;
}

This code works, but looks pretty ugly and long-winded. Remember, though, that if you're doing this kind of thing with a class, then your class can have a private type abbreviation (using) to avoid typing this ugly code over-and-over again. You could also use auto i = things.begin(); to say that i should be whatever type that can hold the result of things.begin().

  • Goat speaking

    Meh, I'm just gonna use auto.

  • Cat speaking

    I'm going to use a range-based for loop—even simpler.

  • LHS Cow speaking

    But if you do that, be sure to use a constant reference for type variables to avoid needlessly making copies of items in the buckets.


Testing Using printToStream()

If you know the hash function, the number of buckets, and ensure that the final load factor is less than the maximum load factor, all hash-table implementations must print out a hash table that looks the same. So, for example, if you knew that your hash table should produce the output shown above, you could check that output against the string

"[0] bacon\n[1] sausage, avocado\n[2] waffles\n[3] toast\n[4] pancakes\n[5] cheese\n[6] juice, coffee, eggs\n[7] \n[8] tea, steak, tofu\n[9] beans\n"
  • Hedgehog speaking

    That's a long line!

  • LHS Cow speaking

    You can break a string in the middle by closing the quotes and opening them again on the next line.

  • Rabbit speaking

    clang-format does this for you automatically!


Additional Sanity Check: hashset-cow-test.cpp

With these functions implemented, you should be able to successfully compile and run hashset-cow-test.

It's a very simple test (it basically just inserts something and checks to see if it exists in the table). But the key is a Cow class rather than a built-in type, so running this test checks to make sure that your table treats its keys properly. If your table performs operations on keys that it shouldn't, you'll probably get compiler errors.

To Complete This Part of the Assignment…

You'll know you're done with this part of the assignment when you've done all of the following:

(When logged in, completion status appears here.)