CS 70

Key Points

  • A hash table is an array-based structure that stores keys in buckets.
    • A hash function takes a key and returns a number (the hash value).
    • The hash value is turned into a bucket index to where the key should go.
  • A collision is when two keys would be stored in the same bucket.
    • A perfect hash function prevents all collisions. In general, would need all keys in advance to create.
    • Two main collision resolution strategies:
      • Separate chaining: allow multiple keys in one bucket (see below).
      • Open addressing: allow keys to go to some other bucket (next lesson).
  • Separate chaining:
    • Each bucket has another container to hold keys (typically a linked list, hence "chain").
    • Lookup involves using the hash function to get the bucket, then searching the chain.
    • Insert involves a lookup (to ensure that the key is not already present), then insert into the chain.
  • Desirable hash function properties:
    • Deterministic: always gives the same output for the same input (otherwise we can't find keys again!)
      • It's okay to have different hash functions for different program runs (or even different hash tables) to reduce the chance someone could guess what our hash function will do. You can be deterministic while still being hard to predict.
    • Uniform: spreads keys roughly uniformly (patterns in input do not create patterns in output).
      • Avalanche effect: small differences in items cause large differences in keys
    • Efficient to compute: every operation involves the hash function!
  • Load factor:
    • \( \frac{N}{M} \), where \( N \) is the number of keys and \( M \) is the number of buckets.
    • Measures how full the table is.
    • An important quantity in the complexity analysis.
  • Complexity (separate chaining, dynamic size):
    • Lookup: expected \( \Theta(1) \); worst-case \( \Theta(n) \).
      • Expected comparisons for unsuccessful search: \( \lambda \).
      • Expected comparisons for successful search: \( 1 + \frac{\lambda}{2} + \frac{1}{2M} \approx 1 + \frac{\lambda}{2} \).
    • Insert: amortized expected \( \Theta(1) \); worst-case \( \Theta(n) \).
    • Note: expected times assume
      • An ideal hash function (keys are equally likely to be in any bucket).
      • Uniform random lookups (all keys are equally likely to be looked up).
      • Dynamically resized table (table capacity is doubled when \( \lambda \) reaches a constant threshold).
  • Hash tables do not guarantee constant time lookup!
    • They are amortized expected constant time!
      • If our assumptions hold, we can expect our expected performance (and issues such as everything landing in just a few buckets are vanishingly unlikely).
      • Even if our assumptions hold, inserts that cause the hash table to resize will take much longer than inserts that don't.

(When logged in, completion status appears here.)