CS 70

HashSet<T> Specifications

This page describes the interface for the HashSet class template.

An instance of HashSet<T> will store values of the type T (as specified by the user) in a hash table that uses separate chaining and dynamic resizing.

Your implementation must support all the functionality described on this page, including the specified complexity requirements. You may not alter the interface (by changing names or adding/removing public functions).

You may add private helper member functions, but these are not part of the interface and not described here.

User-Provided myhash() Function

The HashSet class will assume that the user of the hash table has provided a myhash function for the type stored in the table. For example, if a uses wants to store Cow objects in a HashSet<Cow>, they will have also written a global function size_t myhash(const Cow& c) that provides hash values for Cows using the full range of the size_t type.

In other words, your HashSet should not implement its own hash function. When it needs to hash a key, it should simply call myhash() on that key, and assume that this function will be defined.

This design allows a user to define their own hash function for their particular problem and their own selected key type.

Allowable Key Types

You may not require anything of the type of keys stored in the hash table beyond the following:

  • The key type should have a copy constructor, assignment operator, and equality operator; and,
  • The user should have defined a myhash() function for that type (as described above).
  • If the user wants to print out the hash table, their type must also be printable.

In other words, your HashSet class template may not perform any operation on a key other than copy construction, assignment, equality comparison, myhash(), and (optionally), the << operator.

HashSet Interface

The class HashSet, on some type T (i.e., HashSet<T>) provides the following class-wide (a.k.a. static) compile-time (a.k.a. constexpr) constants:

  • DEFAULT_NUM_BUCKETS, a size_t specifying the default number of buckets to use if not specified by the user.
  • DEFAULT_MAX_LOAD_FACTOR, a double, specifying the maximum allowed load factor for the hash table (going beyond this value will trigger an expansion of the hash table to reduce it).

An instance of HashSet, on some type T (i.e., HashSet<T>), will provide the following operations:

  • An explicit parameterized constructor that:
    • Takes two arguments. Each argument has a default value that will be used if the argument is not specified.
      • a size_t indicating the initial number of buckets in the hash table.
        • Defaults to DEFAULT_NUM_BUCKETS.
      • a double indicating the maximum load factor.
        • Defaults to DEFAULT_MAX_LOAD_FACTOR.
    • Because the arguments are optional, this constructor is also the default constructor.
  • A swap function that swaps the contents of two HashSet<T> objects that
    • Takes one argument, the HashSet<T> to swap with;
    • Does not return a value;
    • Requires \(\Theta(1)\) time.
  • A destructor.
  • A size() member function that
    • Takes no arguments;
    • Returns the number of values in the hash table, as a size_t;
    • Requires \(\Theta(1)\) time;
    • Can be called on a read-only set (i.e., const at the end).
  • An insert() member function that
    • Takes a T by constant reference;
    • Doesn’t return anything (but inserts the item into the hash table);
      • It is okay to call insert with a T value that is already in the hash table. In that case the item is not inserted again.
    • Takes \( \Theta(1) \) amortized expected time under the probability model discussed in the lesson, with a \( \mathrm{O}(n) \) time bound.
  • An exists() member function that
    • Takes a T by constant reference;
    • Returns a bool indicating whether the item was found in the hash table;
    • Takes \( \Theta(1) \) amortized expected time under the probability model discussed in the lesson, with a \( \mathrm{O}(n) \) time bound;
    • Can be called on a read-only hash table (i.e., const at the end).
  • A printToStream() member function that
    • Takes an ostream&;
    • Prints the hash table out to that stream (the required format is descibed later);
    • Returns the same ostream& it was passed;
    • Can be called on a read-only hash table (i.e., const at the end).

In addition, HashSet should provide the following member functions to control the way the hash table stores its data.

  • A maxLoadFactor() member function that
    • Takes a double representing a new value for the maximum allowed load factor;
    • Does not return anything;
    • Takes \( \Theta(1) \) time. The hash table is not rehashed by this call.
  • A rehash() member function that
    • Does not take any arguments;
    • Does not return anything;
    • Reallocates the hash table (if necessary) such that the load factor now no more than half the maximum allowed load factor;
    • Takes \( \Theta(n) \) time.

In addition, HashSet should provide the following member functions, which will provide statistics about the hash table. These must all run in \( \Theta(1) \) time.

  • A buckets() member function that
    • Takes no arguments;
    • Returns the number of buckets in the hash table, as a size_t;
    • Takes \( \Theta(1) \) time.
  • A loadFactor() member function that
    • Takes no arguments;
    • Returns the current load factor of the hash table, as a double;
      • An empty hash table is defined to have a load factor of zero.
    • Takes \( \Theta(1) \) time.
  • A reallocations() member function that
    • Takes no arguments;
    • Returns the number of times the hash table has reallocated its underlying array, as a size_t;
      • The hash table must be reallocated when:
        • An item is inserted and would cause the load factor to go above the maximum load factor, or
      • A user calls rehash and the current load factor is more than half of the maximum load factor.
    • Takes \( \Theta(1) \) time.
  • A collisions() member function that
    • Takes no arguments;
    • Returns the number of times a key has been inserted into a non-empty bucket since the last time the table was reallocated, as a size_t;
      • An empty hash table is defined to have a load factor of zero.
    • Takes \( \Theta(1) \) time.
  • A maximal() member function that
    • Takes no arguments;
    • Returns the largest number of steps ever taken to find a key since the last time the table was reallocated, as a size_t;
      • The assignment description will give more details about this function.
    • Takes \( \Theta(1) \) time.
  • A showStatistics() member function that

    • Takes an ostream&;
    • Prints statistics about the hash to that stream, in the format

      n expansions, load factor f, c collisions, longest run l

      (where n, c, and l are integers, and f is a double), followed by a newline.

    • Returns the same ostream& it was passed.

    • Note that this function is provided.

You are not required to provide a copy constructor, an assignment operator, an operator== function, an iterator, or an erase() function. You are also not required to provide an iterator. You should disable the copy constructor and assignment operator in the class definition.

Top-Level Functions

You should also provide a top-level (not-member) function operator<< that defines how to print a HashSet<T>. It takes as arguments an ostream& and a const HashSet<T>&; it returns an ostream&. That operator will call the public printToStream() member function of the HashSet<T> class.

Private Encoding: Separate Chaining

We require your HashSet class template to use separate chaining as its encoding and to represent each bucket using a std::forward_list<T>, where T is the type of the keys.

The std::forward_list data structure is a bare-bones, singly linked list, which is more space efficient than a std::list as a std::list is doubly linked and stores its size as a member variable.

The buckets of the hash table are stored in a dynamically allocated array. Each element is a std::forward_list<T> embodying the chain that holds the keys in that bucket.

  • RHS Cow speaking

    Notice that your array elements are std::forward_list objects, not pointers to forward lists. In other words, you make the whole array of std::forward_list objects at once—you don't create std::forward_list objects individually and then store them in your array.

You will likely need other private member variables, but those are up to you.

(When logged in, completion status appears here.)