CS 70

Lesson Pages Summary

If you want to know which lesson covered a given topic, this page may help you find it.

For each lesson of each week, we've reproduced the final “Key Points” page from that lesson, and then provided links back to the original lesson pages for specific topics.

Week 1

Lesson 1 — Welcome, Variables, and Memory Diagrams

Key Points

Class Format

  • Each week you'll have lesson material to go through at your own pace.
  • In class, you'll work on homework and check in with the professors.
  • During each class meeting, fill out a check-in form to get credit for the day.
  • You may attend class on Zoom if it's best for you not to attend in person (e.g., if you have symptoms of illness).

Homework

  • Except for the first one, you'll have a partner for all homework assignments.
  • Each partnership will have one 24-hour extension to apply to a homework deadline.
  • We want you to support and learn from your classmates: be careful to preserve each other's learning processes.

Proficiency Checks

  • Each week there are four proficiency checks, each focused on a single learning objective.
  • Take proficiency checks in exam-like conditions.
  • There will be opportunities to retry proficiency checks where you did not demonstrate proficiency the first time.
  • During finals week there will be a final try at all remaining proficiency checks (with the possibility of partial credit).

Getting the Most Out of CS 70

  • We are here to help! Please reach out to the course staff—and to each other—early and often!

CS 70 Memory Diagrams

  • We use memory diagrams to help trace the correctness of C++ code.
  • In our diagrams, we draw a variable's
    • value "inside the box" since it exists at runtime.
    • name, type, and location "outside the box" because they are useful for tracing the code, but they are not explicitly represented in memory at runtime.
  • Our model abstracts away other details of specific computers (such as registers, the specific size of different numeric types, and the real naming conventions for memory slots) because those details vary from one computer to another, and are not helpful for modeling correctness.

C++ Guiding Principles

  • Programming languages are designed by particular people, for particular purposes, in particular contexts.
  • We discussed two principles that are important for explaining some of the design choices in C++ and that let us reason about how the language behaves.
    • The Zero-Overhead Principle: Don't slow programs down “needlessly”.
      • Providing a language feature should never slow down a program that doesn't use that feature.
      • Generally do things in the quickest possible way (even at the expense of safety or convenience).
    • The “Power to the People” Principle: Provide language features that allow expert programmers to do things in unusual ways.
      • Sometimes beginners accidentally use features intended for experts!
  • Both of these principles mean we should not ignore compiler warnings when the compiler notices we're doing something questionable!

Lesson 2 — Object Lifetime, Numeric Types, Promotion and Conversion

Key Points

Object Lifetime

  • Every variable goes through the same 5 phases of object lifetime.
  • The space for a local variable is allocated at the opening curly brace of the function where it's declared.
  • The value of a variable is initialized on the line with its declaration.
  • Once a variable's value is initialized, it is in use. For primitive values, that primarily means that its name is in scope, which means the name can be used in other expressions.
  • The value of a variable is destroyed at the closing curly brace of the declaring block.
  • Once a variable's value is destroyed, it is no longer in use. For primitive values, that primarily means that its name is out of scope, which means that the name cannot be used in other expressions.
  • The space for a local variable is deallocated at the closing curly brace of the function where it's declared.
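
Here's a minimal sketch tracing those phases in code (the function and variable names are hypothetical):

    void example() {        // space for x and y is allocated at this opening brace
        int x = 1;          // x is initialized here; its name is now in scope
        {
            int y = x + 1;  // y is initialized here
        }                   // y's value is destroyed at this closing brace
        // x is still in use; y's name is out of scope
    }                       // x is destroyed, and the space for x and y is deallocated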

Numeric Types

  • Numeric values in C++ can either be floating point (able to hold non-integer parts, like decimals) or integers (only able to hold integer values).
  • Integer values in C++ can either be signed (able to hold positive and negative values) or unsigned (only able to hold non-negative values).
  • Types like booleans and characters are numeric types in C++. A boolean, for example, is an integer type that is guaranteed to be able to hold two values.

Numeric Promotion and Conversion

  • If a floating point is used where an integer type is needed, that is always conversion since the non-integer part of the value may be lost.
  • If an integer type is used where a floating point is needed, that is always conversion since there may not be enough bits allocated to storing the integer part of the value.
  • If a signed value is used where an unsigned type is needed, that is always conversion since negative values cannot be correctly represented.
  • If an unsigned value is used where a signed type of the same size is needed, that is always conversion since there may not be enough bits allocated to storing the magnitude of the value.
  • If a value from a larger numeric type is used where a smaller numeric type is needed, that is always conversion since there may not be enough bits to store the value.
  • If a value from a smaller numeric type is used where a larger numeric type of the same signedness is needed, that is always promotion since there will definitely be enough bits to store the value.
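
A small sketch of a few of these cases (the values are arbitrary):

    #include <iostream>
    
    int main() {
        int i = 42;
        long l = i;           // promotion: larger integer type, same signedness
        double d = 2.75;
        int t = d;            // conversion: the fractional part is lost (t is 2)
        unsigned int u = -1;  // conversion: a negative value can't be represented
        std::cout << l << " " << t << " " << u << std::endl;
        return 0;
    }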

Week 2

Lesson 1 — Compilation, Version Control, and Pair Programming

Key Points

Compilation

  • All of the C++ code we write will go through the following steps:
    • First, the compiler changes source code into assembly code.
    • Next, the assembler changes the assembly code into object code.
    • Finally, the linker combines the object code into an executable.
  • The compiler deals with all of the parts of variables that are not actually stored in memory at run time.
    • Specifically, that means that errors related to a variable's type or name will be caught at compile time!
  • The (simplest) commands to compile a source file myprogram.cpp into an executable named myprogram and then run that program are
    • clang++ -c myprogram.cpp (compilation and assembly)
    • clang++ myprogram.o -o myprogram (linking)
    • ./myprogram (running the program)
  • (In the homework we will typically have additional command-line options.)

Version Control

  • A version control tool lets us keep track of all the different versions of our files, without having them clutter up our workspace.
  • The version-control tool you will use this semester is called git.
  • We also use a website called GitHub to have a shared, web-accessible place to put copies of everyone's repositories.

Pair Programming

  • Pair programming is a skill. Like any skill, it requires practice and intention to get better.
  • Our job this semester, collectively, is to support each other in developing that skill!
  • What is Compilation?
    • What is compilation
    • Stages of compilation (preprocessing, compilation, assembly, linking)
    • Compiler translates high-level code to low-level machine instructions
    • Brief history of C/C++ and Unix
  • Compilation Commands
    • Compilation commands to generate object file and executable
    • clang++ options for compiling and linking
    • Running the executable
  • Compiler Responsibilities
    • Compiler handles types
    • Compiler handles names
    • Compiler checks syntax
  • Version Control
    • Version control systems
    • git and GitHub
    • Tracking file changes
  • Thinking About Pair Programming
    • Pair programming skills and mindset
    • Academic integrity and pair programming

Lesson 2 — Arrays, Style, and Program Correctness

Key Points

Arrays of Primitives

  • Arrays are
    • Fixed-size — Once they're created, we can't change their size.
    • Homogeneous — All of the elements in them have to be the same type.
    • Contiguous — If we looked at an array in memory, we'd see all of the elements side by side.
    • Ordered — We can talk about what the first, third, or forty-second element of the array is.
  • Arrays allow constant-time access: We can jump straight to an arbitrary element of the array.
  • In terms of object lifetime, arrays act just like individual variables. The compiler needs to know ahead of time how big the array will be, so that it can write the instructions to allocate enough space to store the array.
    • Declare and initialize an array: int arr[3]{1, 2, 3};
    • Usage: To access the item at index i: arr[i]
      • arr[i] is translated to *(arr + i)
      • *x means the value in memory at the address given by x
      • arr + i means the address at an offset of i from the address given by arr
      • So *(arr + i) means the value in memory at the address of the item at offset i from the start of the array. (See the sketch after this list.)
    • Destruction and deallocation:
      • When the array is destroyed, each item is destroyed.
      • When the array is deallocated, all items are deallocated.
  • Accessing an array out-of-bounds causes undefined behavior.
    • Undefined behavior means your program has gone off the rails; anything could happen, from something catastrophic to everything seeming to work.
    • You should never write a program that you know has undefined behavior just because nothing bad happened when you tested it.
  • Typical C++ compilers generate code that does things the fastest and easiest way, without any special safety checks.
    • That includes finding the place in memory where we're claiming an item should be and interpreting the bits at that location as being of the appropriate type (that you asked for).
    • Accessing an array out-of-bounds can cause strange behavior. While, in principle, anything can happen when code causes undefined behavior, the strange behaviors you might see can vary between systems. But you can learn to recognize problematic behaviors, especially on the systems you use for coding and testing.
  • Accessing memory that is way out-of-bounds can cause your program to crash.
    • The operating system will not allow your program to access memory outside of its allotted region.
    • This error is called a segmentation fault (segfault for short).
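
As promised, a minimal sketch of the indexing equivalence:

    #include <iostream>
    
    int main() {
        int arr[3]{1, 2, 3};  // allocated and initialized on the stack
    
        // arr[1] and *(arr + 1) are two spellings of the same access.
        std::cout << arr[1] << std::endl;      // prints 2
        std::cout << *(arr + 1) << std::endl;  // also prints 2
    
        arr[2] = 42;  // writing through an index works the same way
        std::cout << *(arr + 2) << std::endl;  // prints 42
    
        return 0;
    }  // arr is destroyed and deallocated here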

Code Style

  • The goal of our code should always be to explain our thinking to other people who read the code.
  • Similar things should look similar and different things should look different.

Testing

  • When we write tests for our code, our goal should be to reveal bugs, not just demonstrate that our code "works".
  • When you write tests, think about what edge cases you need to consider to make sure that your implementation is correct.
  • If you make a mistake in your own code that results in "wrong" behavior, write a test for it! While we encourage test-first development, you can always add to your tests to make them more thorough as you develop a deeper understanding of where the tricky parts are in a particular coding task.
  • Arrays, Our First Data Structure!
    • Properties of arrays: contiguous, ordered, homogeneous, fixed-size
    • Advantages/disadvantages of arrays
  • Declaring Arrays in C++
    • Declaring arrays in C++
    • Specifying size, initialization
    • constexpr for non-variable sizes
  • Accessing Array Elements
    • Array variable holds address of first element
    • Indirection/dereference operator (*)
    • Indexing with offset ([])
    • Changing array elements
  • Out-of-Bounds Access
    • Undefined behavior from out-of-bounds access
    • Typical unsafe array access behavior
    • Segmentation faults from accessing invalid memory
  • Array Example
    • Object lifetime for arrays
    • Walkthrough of array example code
  • Style and Elegance
    • Code style and consistency
    • Naming conventions
    • Whitespace and indentation
    • Idioms and community expectations
  • Program Correctness Matters
    • Purposes and limitations of testing
    • Real-world bugs and consequences
    • Test goals: revealing bugs, not proving correctness

Week 3

Lesson 1 — Multiple Files, Forward Declarations and Include Guards

Key Points

  • We split our code into multiple files to avoid code duplication.
  • A multifile program contains:
    • One or more source files (.cpp), exactly one of which has a main function.
    • One or more header files (.hpp), which declare functions that are defined in the source files (.cpp).
  • Make sure that every header file (.hpp) has an include guard.

    #ifndef FILENAME_HPP_INCLUDED
    #define FILENAME_HPP_INCLUDED
    
    ...stuff...
    
    #endif
    
    • Include guards prevent multiple definitions when a header file is included more than once in the same source file.
    • Each preprocessor macro we use for an include guard must be unique (otherwise multiple files will be trying to use the same guard and only one of them will be seen), which is why we usually use the filename as part of the name.
  • Make sure that each file uses #include to include any header files (.hpp) that declare things it uses.

    • That often includes a source file's own header file!
    • Sometimes header files need to include other header files!
    • Use #include "filename" for your local files from the current directory.
    • Use #include <filename> for system include files
  • To compile your project:
    • Compile each source file (.cpp) into an object file (.o) using clang++ -c filename.cpp.
      • In practice, you'll probably want other flags too, such as -std=c++17, -g and -Wall -Wextra -pedantic.
      • (Remember from last week that this phase includes preprocessing, compiling the source file into assembly code, and then assembling the assembly code into machine code.)
    • Link object files (.o) together into an executable using clang++ -o executableName file1.o file2.o ...
  • To recompile:
    • If a source file (.cpp) changes, recompile it into an object file (.o) and relink as necessary.
    • If a header file (.hpp) changes, recompile all source files that include it or that include a file that includes it, and so on.
  • Rules of thumb:
    • Don't compile a header file (.hpp).
    • Don't #include a source file (.cpp).
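
Putting these pieces together, here's a minimal sketch of a two-file project plus main (the filenames and the shout function are hypothetical):

    // shout.hpp — declarations, protected by an include guard.
    #ifndef SHOUT_HPP_INCLUDED
    #define SHOUT_HPP_INCLUDED
    
    #include <string>
    
    std::string shout(const std::string& s);
    
    #endif
    
    // shout.cpp — definitions; note that it includes its own header.
    #include "shout.hpp"
    
    std::string shout(const std::string& s) {
        return s + "!";
    }
    
    // main.cpp — uses the declaration from the header.
    #include <iostream>
    #include "shout.hpp"
    
    int main() {
        std::cout << shout("moo") << std::endl;
        return 0;
    }

To build it: clang++ -c shout.cpp, then clang++ -c main.cpp, then clang++ -o shoutdemo shout.o main.o.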

Lesson 2 — References

Key Points

  • A variable with a reference type (e.g., int&) is just another name for an existing variable (in this case an int).
  • An important use of references is as function parameters:
    • Allows the function to alter data outside of its stack frame.
    • Prevents unnecessary copy operations.
  • The const keyword specifies that changes are prohibited (i.e., we have read-only access to the data).
    • The const keyword affects what we are allowed to do via a particular name, not what others might be able to do via a different name, so the same piece of data can have const names that cannot be used to change it and non-const names that can.
    • You cannot initialize a non-const reference (e.g., of type int&) with a const name (e.g., of type const int). That would circumvent the const restriction!
    • A const-reference function parameter is a good way to avoid needless copying but still promise not to change the given argument.
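
A minimal sketch of both uses (the functions are hypothetical):

    #include <iostream>
    #include <string>
    
    // Pass by reference: the function can alter data outside its stack frame.
    void increment(int& x) {
        ++x;
    }
    
    // Pass by const reference: no copy is made, but access is read-only.
    void greet(const std::string& name) {
        std::cout << "Hello, " << name << "!" << std::endl;
    }
    
    int main() {
        int count = 0;
        increment(count);  // count is now 1
        greet("CS 70");    // no copy of the string's contents
        std::cout << count << std::endl;
        return 0;
    }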

Week 4

Lesson 1 — Classes and Objects

Key Points

Recap Video

There was a fair amount to digest in this lesson. As a review, and because sometimes it's helpful to see things presented a different way, here's a video that goes over what we've covered in this lesson. If you feel like you've already mastered the material, you can skip it.

  • Pig speaking

    MORE videos! I'm excited!!

  • Cat speaking

    I'm pretty sure I've understood, but sometimes it's comforting to watch a video that tells me things I think I know.

Key Points

  • Classes are conceptually similar to what you've seen before, but have syntactic/terminological differences.
  • Instead of “instance variables” and “methods” we say “member variables” or “data members” and “member functions” (“members” to refer to them all together).
  • Declarations/definitions
    • The class definition goes in the header file (.hpp). It declares all of the class members.
      • In CS 70 convention we name all member variables with a trailing underscore (e.g., age_).
    • The member-function definitions go in the source file (.cpp). They specify the instructions for each function.
    • When we define a member function we have to specify which class it belongs to using the scope-resolution operator (::):
      • For example, void Cow::moo(int numMoos) { /* ... */ }.
  • A member function can be declared const.
    • A const member function promises not to change the values of member variables.
    • If you have a const object, you can only call const member functions on it.
  • We looked at two different kinds of constructors:
    1. Parameterized constructors take parameters and must be invoked explicitly (e.g., Cow bessie{3, 12}).
    2. Default constructors take no parameters and are implicitly invoked for default initialization (e.g., Sheep fluffy;).
  • You can disable the use of a default constructor using the delete keyword (e.g., Cow() = delete;).
  • In a constructor, the member-initialization list specifies how to initialize each member variable.
    • For example, Cow(int numSpots, int age) : numSpots_{numSpots}, age_{age}.
    • In CS 70, you must use a member-initialization list whenever possible. (The only exception is initializing a primitive array when you want a copy of another array—for that case, you need to loop over the elements of the array you're copying to populate your new array.)
  • A class definition ends in a semicolon (;).
    • A class definition ends in a semicolon.
    • A class definition ends in a semicolon.
  • LHS Cow speaking

    Did we mention that a class definition ends in a semicolon?
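
Pulling those conventions into one minimal sketch (a hypothetical Cow class in the spirit of the lesson's examples):

    // Cow.hpp
    class Cow {
    public:
        Cow(int numSpots, int age);  // parameterized constructor
        Cow() = delete;              // this class has no default constructor
        void moo(int numMoos);
        int age() const;             // const member function: won't change members
    
    private:
        int numSpots_;               // member variables get a trailing underscore
        int age_;
    };                               // <- the semicolon!
    
    // Cow.cpp
    Cow::Cow(int numSpots, int age)
        : numSpots_{numSpots}, age_{age} {
        // nothing left to do: the member-initialization list did the work
    }
    
    int Cow::age() const {
        return age_;
    }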

Lesson 2 — Object Lifecycle in Detail (Constructors, etc.)

Key Points

  • Object lifetime on the stack: when?
    • Allocation: at the opening { of the function
    • Initialization: at the declaring line
    • Use: between initialization and destruction
    • Destruction: at the closing } of the declaring block
    • Deallocation: at the closing } of the function
  • Default initialization/construction
    • int x;
    • Cow mabel;
    • Cow bessie{};
    • For an object, invokes the default (parameterless) constructor
  • Copy initialization/construction
    • Creates a new object that is a copy of an existing object
    • int x = y;
    • int a{b};
    • Cow mabel = adeline;
    • Cow bessie{fennel};
    • For an object, invokes the copy constructor (which takes a reference to an object of the same type)
    • Also used to initialize function parameters and return values
  • Assignment operator
    • Changes an existing object to be a duplicate of another existing object
    • s = t;
    • cadewyn = ethyl;
    • For an object, invokes the assignment operator, a special member function called operator=.
  • Destructor
    • A special member function that is automatically invoked when an object is destroyed
    • For class Cow, the destructor is named ~Cow
    • Typically used to “clean up” or release any resources the object is holding onto
  • The compiler can synthesize or disable these functions:
    • Cow(); // Means that the programmer will define the default constructor
    • Cow() = default; // Means that we will use the compiler's synthesized default constructor
    • Cow() = delete; // Means that this class does not have a default constructor
    • The same syntax works for the copy constructor, assignment operator, and destructor (but don't disable the destructor!!); see the sketch after this list.
  • For arrays, the same as above, only \( N \) times, where \( N \) is the size of the array.
  • For references, initialization and destruction mark the usage period of the name
    • We don't model any allocation or deallocation for references
  • For data members of a class
    • Allocation happens when the object is allocated
    • Initialization happens before the opening { of the constructor (so use the member-initialization list!)
    • Destruction happens at the closing } of the destructor
    • Deallocation happens when the object is deallocated
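
And as promised, a minimal sketch distinguishing these special member functions (a hypothetical Cow struct):

    struct Cow {
        Cow() = default;                           // synthesized default constructor
        Cow(const Cow& other) = default;           // synthesized copy constructor
        Cow& operator=(const Cow& rhs) = default;  // synthesized assignment operator
        ~Cow() = default;                          // synthesized destructor
    };
    
    int main() {
        Cow mabel;          // default construction
        Cow bessie{mabel};  // copy construction: initializes a *new* object
        Cow ethyl;
        ethyl = mabel;      // assignment: changes an *existing* object
        return 0;
    }                       // destructors run here as each object is destroyed
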
  • Before You Start…
    • The five phases of object lifetime
  • Objects on the Stack
    • Timing of each phase for stack variables
  • All About Initialization
    • Initialization of primitive vs. class types
    • Default vs. direct vs. copy initialization
    • Copy constructors
    • Function parameters and return values
  • Synthesized Constructors
    • Synthesized default and copy constructors
    • Specifying synthesized, programmer-defined, or deleted constructors
  • Use Phase
    • Assignment operators
    • Synthesized assignment operators
    • Assignment vs. copy initialization
  • Destruction
    • Purpose of destruction phase
    • Destructors in C++
    • Naming convention
    • Purpose is to clean up / release resources
    • Synthesized destructors
  • Arrays
    • Object lifetime for arrays
  • References
    • Object lifetime for references
  • Data Members
    • Object lifetime for data members
  • Putting It All Together
    • Comprehensive tracing of object lifetime in an example program

Week 5

Lesson 1 — Pointers, the indirection operator (*), and reading types

Key Points

Pointers in General

  • Pointers are a kind of value that represents the memory address of a value of a particular type.
    • Example: int* x means that x will hold a pointer to an int. A pointer to an int (a.k.a. "int pointer") is the address of a piece of memory that purports to contain an int.
    • We can access the data a pointer points to by following it using the indirection operator: *x is another name for the data that x points to.
      • Note: Sometimes people with a C background say "dereference" instead of "follow" (and call * "the dereference operator"). That term is a bit less clear, especially given that it has nothing to do with C++ references. We're trying hard to stick to the C++ terminology in this class (although we might slip up occasionally!).
    • We can read types involving pointers (and more) using the inside-out rule.
  • Pointers are primitive types in C++.
    • There are infinitely many pointer types, because for every (non-reference) type, we can add a * to the end of the type to create a pointer for that type.
    • Pointers support const for setting read-only status. We can use const on the pointer itself, the value the pointer points to, or both.
    • If we let a pointer be default initialized, its value is indeterminate and it doesn't point anywhere meaningful. (From a "did you follow the rules" perspective, a default-initialized pointer is invalid and inappropriate to follow; doing so leads to undefined behavior.)
  • Pointers are useful!
    • We can pass arrays into functions using pointers, with either
      • A first-element pointer and a size; or
      • A first-element pointer and a past-the-end pointer (which is C++'s preferred way).
    • We can also use pointers to avoid moving or copying data, by rearranging the pointers-to-data rather than rearranging the data itself.
  • Every piece of data has a memory address.
    • We can get the address of a variable using the address-of operator: &.
    • Example: if we have int x, then &x is the address of x.
    • (Not to be confused with & as part of a type: int& r is a reference to an int, not the address of the int.)
    • We almost never need to use the address-of operator.
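
A minimal sketch combining these ideas, passing an array as a first-element pointer and a past-the-end pointer (the sum function is hypothetical):

    #include <iostream>
    
    // Sum the ints from begin up to (but not including) end.
    int sum(const int* begin, const int* end) {
        int total = 0;
        for (const int* p = begin; p != end; ++p) {
            total += *p;  // follow the pointer to read each element
        }
        return total;
    }
    
    int main() {
        int data[4]{1, 2, 3, 4};
        std::cout << sum(data, data + 4) << std::endl;  // prints 10
        return 0;
    }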

Pointers and Instances of Classes

  • In a member function, this is a pointer to the object that the member function was invoked on.
    • We almost never need to use this; when we do, it's usually as *this.
  • If we have a pointer to an instance of a class, there are two ways to access the members of the object:
    • Follow the pointer, then access: (*myObj).myMemberFunction();
    • Follow and access together (preferred): myObj->myMemberFunction();
  • Pig speaking

    I bet there's even MORE to pointers!

  • LHS Cow speaking

    There is! And that's the next lesson.

Lesson 2 — Dynamic allocation (new and delete) and memory-management issues

Key Points

  • The heap is a region of memory for things that we explicitly allocate and deallocate when we want to.
    • The new keyword allocates space, initializes the data, and returns the address of the new data.
      • Allocate one thing: int* p = new int{42};.
      • Allocate an array of things: int* arr = new int[42]; // arr is a pointer to the first element.
    • The delete keyword takes the address of something on the heap, destroys the data, and deallocates that chunk of memory.
      • Delete one thing: delete p;.
      • Delete an array of things: delete[] arr;.
      • All variants of delete can only be given pointers to heap memory that came from new. You can't delete anything else (other memory, parts of things, etc.).
    • Every call to new should have a matching call to delete later on.
      • Non-arrays allocated with new should be deallocated with delete.
      • Arrays allocated with new ... [] should be deallocated with delete[] ....
      • Don't call the wrong one!
  • Types of allocation:
    • Static allocation: memory that is allocated for the entire duration of the program (e.g. a static member variable).
    • Automatic allocation: memory allocated on the stack (because it is allocated/deallocated automatically when functions begin/end).
    • Dynamic allocation: memory allocated on the heap (because the size of the allocation is "dynamically" determined at runtime as opposed to compile time).
  • A memory leak is when something is allocated on the heap and becomes inaccessible (all pointers/references to it are destroyed).
    • In our memory model, leaks look like data on the heap with no valid name.
    • Once memory has been leaked, the program can't deallocate it!
    • Prevent memory leaks:
      • For every new that occurs during the program, there must also be exactly one delete.
      • When you use new, consider also writing the corresponding delete statement wherever it will need to be.
  • Other common memory-management errors:
    • Double delete: Trying to delete the same data twice.
      • Most commonly caused by using delete on two pointers that both point to the same data.
    • Dangling pointer: Following a pointer to memory that has been deallocated.
      • Most commonly caused by continuing to use a pointer after the thing it points to has been deleted.
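
A minimal sketch of matched new/delete pairs:

    int main() {
        int* p = new int{42};    // allocate and initialize one int on the heap
        int* arr = new int[10];  // allocate an array of 10 ints
    
        arr[0] = *p;             // use the heap data like any other int/array
    
        delete p;                // matching delete for the single allocation
        delete[] arr;            // matching delete[] for the array allocation
        return 0;
    }  // no leaks: every new had exactly one matching delete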

Week 6

Lesson 1 — Lifetime on Heap; Destructors, Copy Constructors and Assignment Operators

Key Points

  • Sometimes a class needs to manage memory on the heap.
    • We allocate that memory using new, either in its constructor(s) or in other member functions. The new operation
      • Allocates memory on the heap (requesting it from the system's heap manager).
      • Constructs the object (or objects) in that space.
      • Returns a pointer to the memory. We'll need to eventually give this pointer to the appropriate delete.
    • Our classes own their resources, so each class also owns the memory it allocates on the heap—in other words, it's responsible for that memory.
      • In our classes, ownership is exclusive. If two objects both think they own the same data, it's a bug.
  • A class's destructor must release all memory in the heap that the object is responsible for.
    • The synthesized destructor will simply destroy each data member of an instance of a class.
    • If a data member is a pointer that is storing the location of memory on the heap, that memory will not be destroyed and deallocated when the pointer is destroyed.
    • Failing to deallocate heap memory that our object is responsible for causes memory leaks, where an object is gone but the dynamically allocated data it owned still occupies memory and is inaccessible.
    • We release heap memory via a delete statement, which
      • Takes a pointer (that came from new).
      • Destroys the object at that memory address.
      • Deallocates the memory (giving it back to the system's heap manager).
  • A class's copy constructor should be sure to create a deep copy by allocating new heap memory for a new instance of a class.
    • The synthesized copy constructor will simply copy initialize or construct each data member (shallow copy).
    • If one of the data members is a pointer to heap memory, then two objects will both point to the same heap memory!
    • This situation can cause all kinds of trouble, including double-delete errors if both objects try to delete that memory.
  • A class's assignment operator should be sure to clean up "old" memory in an instance of a class, then allocate new heap memory for the new configuration of the class instance.
    • The copy-and-swap idiom is a clever/elegant way to do this.
    • A less elegant but more direct way is to first use code from the destructor, then use code from the copy constructor.
  • Alternatively, if we don't think a copy constructor or assignment operator is needed, we can instead choose not to provide them. (Or, during development we can disable them, and then write them last once everything else is done.)
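
Here's a minimal sketch of a class that owns heap memory, with a destructor, a deep-copying copy constructor, and a copy-and-swap assignment operator (the IntArray class is hypothetical):

    #include <algorithm>  // std::copy
    #include <cstddef>    // size_t
    #include <utility>    // std::swap
    
    class IntArray {
    public:
        explicit IntArray(size_t size)
            : size_{size}, data_{new int[size]} {  // we now own this heap array
        }
    
        // Destructor: release the heap memory we're responsible for.
        ~IntArray() {
            delete[] data_;
        }
    
        // Copy constructor: deep copy, so each object owns its own buffer.
        IntArray(const IntArray& other)
            : size_{other.size_}, data_{new int[other.size_]} {
            std::copy(other.data_, other.data_ + size_, data_);
        }
    
        // Copy-and-swap assignment: rhs is a copy (passed by value); swapping
        // with it means rhs's destructor cleans up our old memory for us.
        IntArray& operator=(IntArray rhs) {
            std::swap(size_, rhs.size_);
            std::swap(data_, rhs.data_);
            return *this;
        }
    
    private:
        size_t size_;
        int* data_;  // pointer to heap memory that this object owns
    };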

Week 7

Lesson 1 — More Memory Diagrams, References, and Pointers; lvalues vs. rvalues (temporaries); How functions return values; Returning references; Overloading.

Key Points

  • There are lvalues and rvalues:
    • In x = y + z, x is an lvalue and y + z is an rvalue.
    • lvalues have an obvious memory location and significant lifetime; they often have a name.
    • rvalues are temporary values that exist for the duration of one line of code; they often don't have names.
  • Every function call that returns a value is actually initializing a new object in memory.
  • The compiler allocates temporary objects (rvalues) as needed (e.g., to hold intermediate results in nested expressions).
  • Functions can return references.
  • We can overload functions (i.e., define multiple functions with the same name that operate on different sets of arguments).
    • C++ uses the types of the arguments of the function to figure out which function we mean to call. If selecting the best function for a particular set of arguments is ambiguous, it's a compile-time error.
    • We can implement our own versions of C++'s existing operators for our own types, so we can make our types feel “built in”.
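
A minimal sketch of overloading, including an operator overload (the Cow type and describe functions are hypothetical):

    #include <iostream>
    
    struct Cow {
        int numSpots_;
    };
    
    // Two functions with the same name; C++ picks one based on the argument types.
    void describe(int x) {
        std::cout << "an int: " << x << std::endl;
    }
    
    void describe(double x) {
        std::cout << "a double: " << x << std::endl;
    }
    
    // Overloading operator<< makes Cow feel "built in"; returning the stream
    // by reference is what lets << calls chain.
    std::ostream& operator<<(std::ostream& os, const Cow& c) {
        return os << "a cow with " << c.numSpots_ << " spots";
    }
    
    int main() {
        describe(3);                       // calls describe(int)
        describe(3.5);                     // calls describe(double)
        std::cout << Cow{7} << std::endl;  // uses our operator<<
        return 0;
    }
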
  • Before You Start
    • Review of constructing objects in C++, including default construction, copy construction, and temporary objects
    • Review of allocating objects on the stack vs. the heap in C++
    • Review of iterating through arrays in C++ using index variables and pointer arithmetic
  • Passing Copies
    • Drawing memory diagrams for C++ code
    • Pass-by-value parameter passing in C++ functions
    • Comparing hand execution to actual program output
  • Passing References
    • Pass-by-reference parameter passing in C++ functions
    • Using references to avoid copying data
  • Discovering Temporaries
    • Temporary objects in C++ used to hold intermediate results
    • Lvalues and rvalues in C++
    • Taking addresses of temporary objects
  • Returning Values
    • How functions return values in C++ by initializing space allocated by the caller
    • Drawing memory diagrams for function calls that return values
  • Returning References
    • Returning references from C++ functions
    • Modifying existing variables by returning a reference to them
  • When to Return References
    • Guidelines for when to return references in C++
    • Dangers of returning references to local variables
  • Overloading
    • Function overloading in C++
    • Overloading operators like << in C++
    • How C++ resolves overloaded functions
  • Putting It All Together
    • Example of a CheckList class in C++
    • Implementing array-like behavior with operator[] overloading

Lesson 2 — Iterators, auto, std::vector, and std::list.

Key Points

  • RHS Cow speaking

    Here's a summary of what we've learned in this lesson.

Iterators

  • An iterator is an object that represents a “location” in a collection (a.k.a. container).

    • It can get/set the item at that location, and move forward to the next location.
    • Provides a unified interface for iteration in lots of data structures.
    • Allows a data structure to define the best way to iterate through its items.
  • Other languages also have iterators, so they are a general concept, not something specific to C++.

  • C++'s iterators are inspired by the begin/past-the-end scheme for iterating over an array.

  • In the C++ standard library, a container provides the following functions:

    • begin—returns an iterator to the first item, and
    • end—returns an iterator "one past" the last item.
      • We use the end iterator to know when we've gone too far in a loop!
    • It must be the case that…
      • If i is an iterator to the last item in c, then ++i == c.end(), and
      • A container c is empty if and only if c.begin() == c.end().
  • In the C++ standard library, an iterator offers the following operations:

    • *iter — returns a reference to the item at the iterator's location.
    • ++iter — advances the iterator to the next location.
    • iter != otherIter — returns true if the two iterators are not at the same location.
  • "Iterator" is not another word for "pointer".

    • Both iterators and pointers abstractly represent a concept of “location” and allow access to the data at a location.
    • A pointer is the iterator type for a primitive array.
    • An iterator might have a pointer as a member to help it do its job.
    • But an iterator object has data and member functions, whereas a pointer is just a primitive type representing a memory address.
  • Common idioms for looping through a collection using an iterator:

    for (std::vector<int>::iterator iter = vec.begin(); iter != vec.end(); ++iter) {
        std::cout << *iter << std::endl;
    }
    

    or

    for (auto iter = vec.begin(); iter != vec.end(); ++iter) {
        std::cout << *iter << std::endl;
    }
    

    or

    for (int x : vec) {
        std::cout << x << std::endl;
    }
    
  • Read-only iterators are a distinct type where operator* returns a const reference rather than a non-const reference. They're usually called const_iterator rather than iterator.

  • Cat speaking

    I get the general idea, and I feel okay with using iterators, but I don't feel like I have any practice actually making new iterator classes.

  • LHS Cow speaking

    That's right.

  • RHS Cow speaking

    Don't worry, practice with building our own iterators is coming.

C++'s Sequence Containers

We also learned about std::vector and std::list in C++'s standard library. They are both examples of sequence containers (a.k.a. lists), and they're both generic in that they can hold a type of our choosing.

  • std::list<T> is a doubly linked list with list items of type T.
  • std::vector<T> is an array-backed list with items of type T stored in an array, plus automatic resizing.

Both are full-featured list objects, with useful member functions well-suited to their underlying representations (a.k.a. encodings).

Lesson 3 — Asymptotic Complexity and Best-, Worst-, and Expected-Case Analyses.

Key Points

Empirical timing

  • Tells you how long a program actually took, which is great.
  • But the findings are specific to the experimental conditions (compiler options, computer, inputs tested, etc.).
  • So we have to be careful how we extrapolate the findings to other conditions.

Operation counting

  • Gives the number of times a particular operation (or set of operations) occurs as a function of some aspect of the input.
  • Abstracts away all system-level issues (compiler, CPU, etc.)
  • Allows us to weigh trade-offs in constant factors and lower-order terms.
  • Requires us to make choices about what to count and what to account for in the input.
  • Can be unnecessarily tedious/detailed for some questions.

Asymptotic analysis

  • Answers the question “How well does this algorithm scale?” (e.g., what happens if we double the input size)
    • Technically, it focuses on how the cost grows as the size of the input goes to infinity.
      • As a result we ignore constant factors and lower-order terms.
      • In practice conclusions about how costs grow are usually applicable to merely “large” sizes, not just infinite ones.
  • Most useful when costs are fundamentally different (e.g., \(\Theta(n)\) vs \(\Theta(n^2)\)) rather than when they appear the same.
  • \(f(n) \in \Theta(g(n))\) means that \(f\) and \(g\) are asymptotically similar.
    • \(f(n) \in \Theta(g(n))\) if and only if \(0 < \lim_{n\rightarrow \infty} \frac{f(n)}{g(n)} < \infty\)
  • Usually we compare a cost function to simple reference functions, sometimes called orders of complexity:
    • \(\Theta(1)\) — constant time
    • \(\Theta(\log n)\) — logarithmic time
    • \(\Theta(n)\) — linear time
    • \(\Theta(n \log n)\) — linearithmic time
    • \(\Theta(n^2)\) — quadratic time
    • etc.
  • \(f(n) \in \textrm{O}(g(n))\) means that \(f\) is "no worse than" \(g\) in an asymptotic sense.
    • \(f(n) \in \textrm{O}(g(n))\) if and only if \(\lim_{n\rightarrow \infty} \frac{f(n)}{g(n)} < \infty\)
    • Example usage:
      • "In the worst case the complexity is \(\Theta(n)\) and in the best case it is \(\Theta(1)\), so the complexity is \(O(n)\)."
  • The asymptotic notation family:
    • \(f(n) \in \textrm{o}(g(n))\), kind of like \(x < y\).
    • \(f(n) \in \textrm{O}(g(n))\), kind of like \(x \le y\).
    • \(f(n) \in \Theta(g(n))\), kind of like \(x = y\).
    • \(f(n) \in \Omega(g(n))\), kind of like \(x \ge y\).
    • \(f(n) \in \omega(g(n))\), kind of like \(x > y\).

Best case analysis

  • What is the smallest number of operations for an input of size \(n\)?

Worst case analysis

  • What is the largest number of operations for an input of size \(n\)?

Expected case analysis

  • What is a "typical" number of operations?
  • Three steps:
    • Break the outcomes into cases so that each case has the same complexity
    • Assign probabilities to the cases
      • Usually we assume a probability model that we think will help us understand the algorithm in practice.
      • A very common probability model is the "uniform probability" model which assumes every case is equally likely.
    • Find the expected complexity
      • A weighted sum of the complexities of the cases, weighted by probability.
      • \(\sum_{k} Pr(\textrm{Case}\ k)\textit{Cost}(\textrm{Case}\ k)\)
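
As a worked example (a sketch, assuming linear search in an unsorted array of \( n \) items under the uniform model, where each item is equally likely to be the one we're looking for): finding the item at position \( k \) costs \( k \) comparisons, so the expected cost is \( \sum_{k=1}^{n} \frac{1}{n} \cdot k = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2} \), which is \( \Theta(n) \).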

Week 8

Lesson 1 — Interfaces, Encodings, and Implementation; Iterator Validity

Key Points

Levels of a Data-Structure Specification

  • Interface: the operations it offers
    • In C++, the public members of a class
  • Encoding: the data the structure uses to do its job
    • In C++, the private members of a class
  • Implementation: the actual procedures that manipulate the data to satisfy the interface promises
    • In C++, the member-function definitions of a class

Multiple Encodings for the List Interface

  • Array-backed lists
    • Contents stored in an array, which resizes as necessary
    • Good at random access
    • Bad at inserting/removing at or near the start of the list
  • Linked lists
    • Contents in list nodes, linked in a chain
    • Good at inserting/removing at beginning
    • Bad at random access

Iterators

Iterator Misuse: Obey These Rules

  • Don't use * on a past-the-end iterator.
  • Don't use ++ on a past-the-end iterator.
  • Don't use -- on an iterator at the beginning (there is no such thing as a before-the-beginning iterator).
  • Don't do anything to an iterator that isn't valid other than destroy it or assign a new value to it.
  • Don't compare iterators from different origins.

Valid and Invalid Iterators

The interface for a data structure gives rules for when iterators will stay valid and when they will cease to be valid.

  • Universally applicable reasons an iterator would be valid but not refer to an item (we can't call ++ or * on these iterators, but we can test them against other iterators from the same origin):

    • The iterator was default constructed, rather than initialized with a valid value.
    • The iterator is past the end.
  • Universally applicable reasons an iterator would not be valid:

    • The underlying container is destroyed/deallocated.
  • Container-specific iterator validity rules

    • For containers with an erase member function, erasing the item referred to by an iterator means that the iterator passed to erase is no longer valid, and also that any and all other iterators referring to the erased location are no longer valid (see the sketch after this list).
    • Sequence data structures based on arrays typically don't promise that iterators will continue to be valid after an item is added or removed (so the data structure can resize the array, which moves all of its data to somewhere else in memory when it copies the array).
    • In contrast, sequence data structures based on linked lists typically do preserve iterators as much as possible, except for iterators that refer to items that have been removed.
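
As mentioned in the erase rule above, here's a minimal sketch of the idiomatic way to remove items while iterating over a std::vector (erase returns a fresh, valid iterator to the next item):

    #include <iostream>
    #include <vector>
    
    int main() {
        std::vector<int> vec{1, 2, 3, 4, 5};
        for (auto it = vec.begin(); it != vec.end(); ) {
            if (*it % 2 == 0) {
                it = vec.erase(it);  // the old iterator is invalid; use the returned one
            } else {
                ++it;
            }
        }
        for (int x : vec) {
            std::cout << x << ' ';  // prints 1 3 5
        }
        std::cout << std::endl;
        return 0;
    }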

Week 9

Lesson 1 — Introduction to Basic Binary Search Trees

Key Points

  • A binary search tree is
    • A binary tree such that…
    • The key in every node is bigger than all keys in its left subtree and smaller than all keys in its right subtree.
  • A BST is designed for search:
    • Applications where we need to efficiently store and find data.
    • Associative containers that associate keys with values (like Python dictionaries).
  • BST lookup:
    • Compare the key to the current node.
    • If the same, we're done! If smaller, go left. If larger, go right.
  • BST insert:
    • Use a similar lookup process to find a spot at the bottom of the tree for the new key.
    • Connect the new node to the bottom of the tree.
  • BST delete:
    • Use lookup to find the node with the key.
    • If no children: just remove it!
    • If one child: move the child up to the node's place.
    • If two children: replace the node with the largest key smaller than the one being removed (the rightmost descendant of the left child).
  • Complexity of Trees (Balanced and Unbalanced):
    • Insert and delete complexities depend on lookup.
    • Lookup complexity is \( \mathrm{O}(h) \), where \( h \) is the height of the tree.
    • Height can vary depending on insertion order!
    • Stick: \( h \in \Theta(n) \).
    • Well-balanced: \( h \in \Theta( \log{n} ) \).
    • So complexity of all these operations:
      • Stick: \( \mathrm{O}(n) \).
      • Balanced: \( \mathrm{O}( \log{n} ) \).
      • Overall (if there's no guarantee of balance): \( \mathrm{O}(n) \).
  • Recursive helper functions:
    • Member functions usually operate on a specific instance of a class (e.g., one whole tree).
      • The interface a member function provides to the user probably doesn't leave it well suited for that function itself to be recursive.
        • For example, there is no smaller IntTree instance to recurse on (internally, our tree is made of Nodes).
    • Recursive functions necessarily have the data they're working on as a parameter, since each successive call needs to pass in a smaller variant of the problem to solve, until we hit the base case.
      • The parameters a recursive function needs don't necessarily provide the right interface for the user.
    • We can combine the two and address these issues by having the member function call a recursive helper function to operate on the internal data (see the sketch after this list).
      • If the helper function doesn't need to use any data members from the containing class instance, declare it static.
    • When you write insert and exists yourself, we expect you to use a recursive approach (using recursive helper functions).
  • In-order traversal:
    • Visits the nodes in the tree in sorted order
    • Recursive algorithm: do-the-left-subtree, do-the-value, do-the-right-subtree.
  • Static member functions:
    • Don't operate on an instance of the class (i.e., no this object).
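
As promised above, a minimal sketch of the member-function-plus-recursive-helper pattern (a hypothetical fragment of an IntTree class):

    struct Node {
        int value_;
        Node* left_;
        Node* right_;
    };
    
    class IntTree {
    public:
        bool exists(int key) const {
            return existsHelper(root_, key);  // hand the internal data to the helper
        }
    
    private:
        // static: needs no data members, just the subtree it is given.
        static bool existsHelper(const Node* node, int key) {
            if (node == nullptr) {
                return false;  // base case: fell off the bottom of the tree
            } else if (key < node->value_) {
                return existsHelper(node->left_, key);   // go left
            } else if (key > node->value_) {
                return existsHelper(node->right_, key);  // go right
            } else {
                return true;  // found it!
            }
        }
    
        Node* root_ = nullptr;
    };
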
  • Search
    • Different data structure interfaces: maps vs sets
    • Lookup and insert as key operations for these data structures
  • Tree Terminology
    • Terminology for tree data structures:
      • Node, edge, parent, child
      • Root, leaf
      • Height
      • Binary tree
  • BST Lookup
    • Binary search trees
      • Lookup procedure
      • Applications of binary search trees
  • BST Insert
    • Insertion into a binary search tree
    • Recursive insertion algorithm
  • BST Delete
    • Deletion from a binary search tree
  • Time Complexity
    • Time complexity analysis for binary search trees
      • Best case vs worst case
      • Overall complexity
  • Encoding a BST in C++
    • Encoding a binary search tree in C++
      • Node struct
      • Overall tree class
  • C++ Implementation: insert()
    • Implementing insert in C++
      • Helper functions
      • Passing pointers by reference
  • Static Member Functions
    • Static member functions in C++
  • Insert Alternatives
    • Alternative implementations of insert
  • exists
    • Implementing exists()
  • Iterating Over Our BST
    • In-order traversal of a binary search tree

Lesson 2 — 2–3–4 Trees and Red–Black Trees

Key Points

2–3–4 Trees

  • A 2–3–4 Tree allows three kinds of nodes:
    • 2-nodes hold one value and have two children.
    • 3-nodes hold two values and have three children.
    • 4-nodes hold three values and have four children.
  • Insertion in a 2–3–4 tree always happens at a leaf, by adding the new value to an existing leaf node. That insertion will turn a 2-node into a 3-node or a 3-node into a 4-node.
  • To make sure that there is always room to insert new values, any time we pass a 4-node on the way down the tree during an insertion, we promote the middle value up into the node above it.
  • 2–3–4 trees require more work at each node than a BST, but because they stay perfectly balanced (all leaves the same distance from the root), they have worst-case lookup and insert performance of \( \Theta( \log n ) \).

Red–Black Trees

  • A Red–Black tree is a binary representation of a 2–3–4 tree.
  • To convert a 4-node to a Red–Black representation, create a black node from the middle value. Then add two children for the left and right values, both as red nodes.
  • To convert a 3-node to a Red–Black representation, create a black node from one of the two values. Then add a child for the other value as a red node.
  • Because Red–Black trees are isomorphic with 2–3–4 trees, they also have \( \Theta( \log n ) \) worst-case performance for insert and lookup.

Rotation

  • A fundamental operation for rearranging nodes in a BST.
  • Can be implemented with a constant number of pointer reassignments (i.e., constant time):
    • A child of the root of the subtree moves up to become the new root.
    • The old root becomes the new root's child.
    • The subtree “between” the two nodes becomes the old root's child.
  • Rotations preserve the BST property.
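
A minimal sketch of a right rotation as a constant number of pointer reassignments (the Node type is hypothetical; a left rotation is the mirror image):

    struct Node {
        int value_;
        Node* left_;
        Node* right_;
    };
    
    // Rotate the subtree rooted at root to the right, updating root in place.
    void rotateRight(Node*& root) {
        Node* newRoot = root->left_;    // the left child moves up to become the root
        root->left_ = newRoot->right_;  // the "between" subtree becomes the old root's left child
        newRoot->right_ = root;         // the old root becomes the new root's right child
        root = newRoot;
    }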

Like these University of Washington CSE grad students (“The CSE Band”) from the early 2000s, we hope you're inspired by the beauty of Red–Black trees!

Week 10

Lesson 1 — Amortized Analysis and Splay Trees

Key Points

  • Expected-Time Reminder:

    • Expected time describes the performance of an algorithm with some amount of randomness. Time for individual operations will come from a probability distribution. In practice, probabilities may be a sufficient guarantee, but from a theoretical perspective worst-case behavior is possible even if it is unlikely.
  • An amortized bound promises that the total complexity of a sequence of operations is bounded by the sum of the amortized times of the individual operations.

    • Only applies to the total complexity of the sequence of operations from the very beginning.
    • It does not give a bound on the time complexity of a single operation; time for individual operations can vary significantly.
    • A useful fiction: even though some operations are expensive and some are cheap, an amortized bound spreads the cost evenly.
    • Everyday analogy: it's like a professor keeping you an extra ten minutes past when class should have ended because the previous two classes ended five minutes early.
  • Aggregate method:

    • Consider an arbitrary sequence of operations.
    • Analyze its complexity.
    • Divide by the number of operations to get an amortized bound.
  • Array-backed list analysis:

    • With capacity doubling, amortized \( \Theta(1) \) push_back (see the sketch after this list).
    • With capacity halving, amortized \( \Theta(1) \) pop_back.
  • Splay Trees:

    • Self-balancing tree with amortized \( \mathrm{O}( \log{n} ) \) insert/lookup, \( \mathrm{O}(n) \) worst-case.
    • Really efficient when lookups have high locality (lookups tend to repeat).
    • Every inserted/looked-up node is rotated to the root.
      • Recently inserted/found nodes are easy to find (near best-case time).
      • No space overhead!
    • Inspiration for this song by Dani Demas, HMC '15! (Warning: banjo.)
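
As promised above, a sketch of the aggregate method applied to push_back (assuming the capacity starts at 1 and doubles whenever the array is full): over \( m \) pushes, each element is written once when it is pushed, and the doubling copies move \( 1 + 2 + 4 + \cdots \) elements, fewer than \( 2m \) moves in total. So the whole sequence takes \( \Theta(m) \) work, i.e., amortized \( \Theta(1) \) per push_back.
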
  • Before You Start
    • Characterizing best case, worst case, and overall complexity of push_back() in an array-backed list
    • \( \Theta \) vs. \( \mathrm{O} \) notation for characterizing complexity over a range of cases
  • Splay Trees
    • Overview of splay trees, including:
      • Splay operation to bring recently accessed nodes to root
      • Gains efficiency from temporal locality
      • Visualization website
  • Splay-Tree Complexity 1
    • Observing extremes of efficiency during inserts into splay trees
    • Factoring long-term balance into complexity arguments
    • Best, worst, and overall case analysis
  • Variable Costs
    • Complexity analysis of in-order traversal
      • \( \Theta(n) \) overall but \( \Theta(1) \) to \( \Theta(n) \) per step
  • Characterizing Operation Costs
    • Limitations to worst case analysis
    • Using "banker's method" analogy and operation coloring to argue complexity over sequences of operations
  • IntVector push_back()
    • Argument showing amortized \( \Theta(1) \) per push_back() when doubling capacity
    • Discussion on rigorously deriving \( 2m - 3 \) moves formula
  • Proving Amortized Bounds
    • Three approaches to proving amortized bounds
      • Aggregate method: look at total work for m operations, divide by m
      • Banker's method: treat time like money
      • Physicist's method: time like energy, with potential function
  • Amortized Time vs. …
    • Contrasting amortized, worst-case, and expected complexity
    • Amortized analysis lets you pretend an operation met its amortized bound, doesn't bound a single operation
    • Applicability depends on what user cares about
  • Splay-Tree Complexity 2
    • Applying amortized analysis to argue total complexity
    • Comparing splay trees to other BST choices on dimensions like overhead and locality

Lesson 2 — Randomized Binary Search Trees

Key Points

  • We can use rotations to change the structure of a BST without changing the values that it holds.
    • A left rotation moves the root's right child up to be the new root.
    • A right rotation moves the root's left child up to be the new root.
    • Since rotations all involve a fixed number of pointer operations, they can be done in constant time.
    • (A diagram in the original lesson shows how the nodes and subtrees get rearranged in a left and right rotation.)

  • Insert and move to root

    • We can use rotations to provide a way to insert new values at the root of a tree.
    • Insert at root actually puts the new value at a leaf, then “bubbles” it up to the root of the tree through a series of rotations.
    • Through the magic of recursion, we only have to keep track of one rotation at a time instead of remembering the long sequence of rotations needed to move a value from an arbitrary leaf up to the root of the tree.
  • A randomized tree uses a random-number generator to decide whether to insert at the root of a subtree or to recurse further down into the left or right branch of the tree.

    • Randomized insertion might insert a new node at any layer of the tree.
    • For each subtree on the way from the root to a leaf, if \( N \) is the size of the subtree, then the node is inserted at the subtree root with probability \( \frac{1}{ N + 1 } \).
    • Randomized trees have expected height \( \Theta(\log n) \), no matter the order of key insertions!
  • Random-number generation is a pain in C++.

    • Use the CS 70–provided library (RandUInt32) to make random-number generation easier!
    • In some circumstances (e.g., debugging, testing), being able to seed a random number generator can be a nice way to get predictable output from a random number generator. The RandUInt32 class in the CS 70–provided library can be seeded with an integer.

Week 11

Lesson 1 — Templates

Key Points

Templates

  • A template is a recipe for generating code that works with different types.

    • A template has template parameters that can be filled in with various types.
    • Declare a template with, for example, template <typename T> (where T is the parameter)
    • You can have function templates and class templates.
  • A template cannot be compiled on its own.

    • The actual instructions would differ for different types.
    • So a template must be compiled along with code that uses it (provides specific template arguments).
  • Typical file structure for templates.

    • A source file that uses the template includes
      • my-templated-thing.hpp, which
        • Contains traditional “header-file” content (declarations, etc.)
        • And—at the bottom of the file—#includes
          • my-templated-thing-private.hpp, which
            • Contains traditional “source-file” content (definitions, etc.)
      • There is no my-templated-thing.cpp file.
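
For concreteness, a minimal sketch of a function template (myMax is a hypothetical example):

    // A recipe for generating a max function for any type T that supports <.
    template <typename T>
    T myMax(const T& a, const T& b) {
        return (b < a) ? a : b;
    }
    
    // Usage: type deduction usually fills in T from the arguments.
    //   myMax(3, 7)            // T deduced as int
    //   myMax(2.5, 1.5)        // T deduced as double
    //   myMax<double>(3, 7.5)  // explicit template argument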

Type Conversion

  • When the compiler chooses which overloaded function to call, for each argument an exact match on the type is best, then a template-based match, then type conversion.
  • Type conversion between classes is possible, but only if one of the classes specifies how. There are two ways to do it:
    • The “target” class can define conversion by creating a single-parameter constructor and/or creating a specialized version of the assignment operator.
    • The “source” class can define conversion by creating a type-casting operator.
  • If you don't want a single-parameter constructor or type-casting operator to be used for implicit conversion, you should use the explicit keyword.
  • There are limits to how much conversion the compiler will do:
    • It will only perform conversion once; it won't do a “chain” of conversions.
    • It will only perform conversion for parameters to a function; you can't apply conversion to a value that you're calling a member function on.
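
A minimal sketch of both directions of class-defined conversion (the Celsius class is hypothetical):

    #include <iostream>
    
    class Celsius {
    public:
        explicit Celsius(double degrees) : degrees_{degrees} {}  // explicit: no implicit double -> Celsius
        operator double() const { return degrees_; }             // type-casting operator: Celsius -> double
    private:
        double degrees_;
    };
    
    int main() {
        Celsius c{21.5};        // OK: explicit construction
        // Celsius oops = 21.5; // error: the constructor is explicit
        double d = c;           // OK: uses operator double()
        std::cout << d << std::endl;
        return 0;
    }
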
  • Code for Multiple Types
    • Writing code for multiple types by copying and pasting
    • Filling in a for loop for list<double>::iterator
  • Function Templates
    • Function templates as recipes for generating code
    • Template parameters like T holding types
    • Calling a function template with explicit template arguments
    • Type deduction allowing calling without explicit arguments
    • Taking parameters as const references in templates
    • Multiple template parameters
  • Templates and Compilation
    • Templates can't be compiled on their own without concrete types
    • Templates must be compiled along with code that uses/instantiates them
    • Typical organization of templated code into .hpp, .cpp, -private.hpp files
  • Class Templates
    • Class templates like Pair for storing pairs of any types
    • Every member function needing template preamble like template args
    • Instantiating class templates requires template arguments
  • Example: IntVector to Vector<T>
    • Adding template and T throughout IntVector converted to Vector
    • Need for typename before nested types like Vector<T>::iterator
  • Templates vs. Overloading
    • Templates vs overloading: exact match preferred over templates preferred over conversion
    • Calls use conversion from C strings to std::strings
  • Classes Can Provide Type Conversion
    • Single-arg constructors provide auto conversion to a class type
    • Type-cast operator (e.g., operator bool()) enables conversion from a class
    • explicit keyword prevents unintended conversion
    • Limits on chains of conversions

Week 12

Lesson 1 — Hashing and Hash Tables (Separate Chaining)

Key Points

  • A hash table is an array-based structure that stores keys in buckets.
    • A hash function takes a key and returns a number (the hash value).
    • The hash value is turned into a bucket index, which determines where the key should go.
  • A collision is when two keys would be stored in the same bucket.
    • A perfect hash function prevents all collisions, but in general we would need to know all the keys in advance to create one.
    • Two main collision resolution strategies:
      • Separate chaining: allow multiple keys in one bucket (see below).
      • Open addressing: allow keys to go to some other bucket (next lesson).
  • Separate chaining (a code sketch follows this list):
    • Each bucket has another container to hold keys (typically a linked list, hence "chain").
    • Lookup involves using the hash function to get the bucket, then searching the chain.
    • Insert involves a lookup (to ensure that the key is not already present), then inserting into the chain.
  • Desirable hash function properties:
    • Deterministic: always gives the same output for the same input (otherwise we can't find keys again!)
      • It's okay to have different hash functions for different program runs (or even different hash tables) to reduce the chance someone could guess what our hash function will do. You can be deterministic while still being hard to predict.
    • Uniform: spreads keys roughly uniformly (patterns in input do not create patterns in output).
      • Avalanche effect: small differences in keys cause large differences in hash values
    • Efficient to compute: every operation involves the hash function!
  • Load factor:
    • \( \lambda = \frac{N}{M} \), where \( N \) is the number of keys and \( M \) is the number of buckets.
    • Measures how full the table is.
    • An important quantity in the complexity analysis.
  • Complexity (separate chaining, dynamic size):
    • Lookup: expected \( \Theta(1) \); worst-case \( \Theta(n) \).
      • Expected comparisons for unsuccessful search: \( \lambda \).
      • Expected comparisons for successful search: \( 1 + \frac{\lambda}{2} + \frac{1}{2M} \approx 1 + \frac{\lambda}{2} \).
    • Insert: amortized expected \( \Theta(1) \); worst-case \( \Theta(n) \).
    • Note: expected times assume
      • An ideal hash function (keys are equally likely to be in any bucket).
      • Uniform random lookups (all keys are equally likely to be looked up).
      • Dynamically resized table (table capacity is doubled when \( \lambda \) reaches a constant threshold).
  • Hash tables do not guarantee constant time lookup!
    • Lookups are expected constant time; inserts are amortized expected constant time!
      • If our assumptions hold, the expected performance is what we will almost certainly see in practice (pathological outcomes, such as everything landing in just a few buckets, are vanishingly unlikely).
      • Even if our assumptions hold, inserts that cause the hash table to resize will take much longer than inserts that don't.
  • Fixing Lookup
    • Hash functions for mapping keys to array indexes
    • Terminology: hash function, hash table, bucket, hash value
    • Using modulo operator to map hash values to array indexes
  • Collisions
    • Collisions when two keys hash to the same bucket
    • Collision resolution strategies: separate chaining, open addressing
  • Separate Chaining
    • Separate chaining strategy
    • Allowing multiple keys per bucket using a linked list
    • Insert and lookup operations with separate chaining
  • Expected Complexity of Separate Chaining
    • Analysis of expected number of comparisons for separate chaining
    • Unsuccessful lookup expected comparisons
    • Successful lookup expected comparisons
    • Load factor
  • Dynamic Table Size
    • Dynamically resizing hash table to keep load factor low
    • Achieving expected constant time operations
    • Clarifying complexity guarantees: expected vs worst case
  • Interpreting Hash Table Complexities
    • Complexity guarantees and analysis for hash tables
    • Exercises analyzing complexity of hash table operations
  • Hash Function Properties
    • Desirable properties of hash functions
    • Example hash functions and analysis
    • Cryptographic hashes
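
To make this concrete, here is a minimal separate-chaining sketch with a dynamically resized table (the class name and interface are our own illustration, and it assumes std::hash as the hash function):

    #include <cstddef>
    #include <forward_list>
    #include <functional>
    #include <string>
    #include <vector>

    // A minimal separate-chaining hash set (a sketch, not the course's code).
    class ChainedStringSet {
    public:
        ChainedStringSet() : buckets_(8), size_(0) {}

        bool lookup(const std::string& key) const {
            // Hash to a bucket, then search that bucket's chain.
            for (const std::string& k : buckets_[bucketFor(key)]) {
                if (k == key) { return true; }
            }
            return false;
        }

        void insert(const std::string& key) {
            if (lookup(key)) { return; }   // keys are unique
            // Double the table when the load factor would pass the threshold.
            if (double(size_ + 1) / buckets_.size() > MAX_LOAD_FACTOR) {
                rehash(2 * buckets_.size());
            }
            buckets_[bucketFor(key)].push_front(key);
            ++size_;
        }

    private:
        static constexpr double MAX_LOAD_FACTOR = 1.0;

        std::size_t bucketFor(const std::string& key) const {
            // Modulo turns a hash value into a bucket index.
            return std::hash<std::string>{}(key) % buckets_.size();
        }

        void rehash(std::size_t newNumBuckets) {
            std::vector<std::forward_list<std::string>> oldBuckets;
            oldBuckets.swap(buckets_);        // grab the old table
            buckets_.resize(newNumBuckets);   // fresh, larger table
            for (const auto& chain : oldBuckets) {
                for (const std::string& k : chain) {
                    buckets_[bucketFor(k)].push_front(k);
                }
            }
        }

        std::vector<std::forward_list<std::string>> buckets_;
        std::size_t size_;
    };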

Lesson 2 — Hash Tables (Open Addressing)

Key Points

  • Open Addressing: In case of collision, put the key somewhere else (a code sketch follows this list).
    • Look for a new spot with a sequence of "probes" (successive bucket indices, taken mod the table size) until an empty bucket is found.
    • For all methods we studied, same asymptotic complexity as separate chaining.
      • Expected \( \Theta(1) \) lookup.
      • Amortized expected \( \Theta(1) \) insert.
    • Different methods have larger or smaller constant factors for a given value of \( \lambda \).
      • Larger \( \lambda \): more comparisons per operation, but less unused space.
  • Linear probing: Probe \(H, (H + 1), (H + 2), (H + 3), \ldots \).
    • Nice and simple.
    • Main concern: Primary clustering (long runs of keys next to each other in the table), which
      • Causes long lookup times (many comparisons needed to find the key or an empty bucket), and
      • Feeds on itself: any key that hashes anywhere into the cluster makes the cluster bigger.
    • Worst overall expected comparisons.
      • Needs low \( \lambda \) to have good performance.
    • Despite its bad-looking comparison counts, its cache-friendly sequential access on modern CPU architectures means it may still be very competitive with other schemes.
  • Quadratic probing: Probe \(H, (H + 1), (H + 4), (H + 9), (H + 16), \ldots \).
    • Prevents primary clustering.
    • Problem: Need to be careful about table size.
      • Probes can get stuck in an infinite loop that doesn't visit the whole table.
      • Solution: a prime table size and \( \lambda < 0.5 \) guarantee that an empty bucket will be visited, but
      • \( \lambda < 0.5 \) means lots of empty buckets (unused space).
    • Problem: Secondary clustering.
      • Keys hashed to the same bucket cluster along the probing sequence.
      • (Not as bad as primary clustering but still an issue.)
  • Double hashing: Probe \( H, (H + I), (H + 2I), (H + 3I), \ldots \), where the increment \( I \) comes from a second hash of the key.
    • Prevents both primary and secondary clustering.
      • Every key has its own probe sequence.
    • Best overall expected comparisons under ideal conditions.
      • Can be efficient with higher values of \( \lambda \).
    • Problem: Need to be careful that table size and increment are relatively prime.
    • Problem: Needs hashing to make both a bucket number and an increment.
      • Both values need to spread keys out and be statistically independent of one another.
      • But both can be generated by a single hash function.
  • Linear Probing
    • Open addressing overview
    • Linear probing, with example
    • Primary clustering
    • Worst case lookup complexity
  • Quadratic Probing
    • Quadratic probing overview, with example
    • Issue with table size
    • Secondary clustering
  • Double Hashing
    • Double hashing overview
    • Generating hash and increment value
    • Constraints on table size and increment
  • Fewest Probes
    • Comparing probing strategies: number of probes
    • Adjusting for separate chaining memory overhead
    • Comparing probing strategies: successful vs unsuccessful lookup
  • Space Efficiency
    • Comparing space efficiency of separate chaining and open addressing
    • Comparing to space efficiency of trees
  • Comparisons: Pros and Cons
    • Pros and cons summary of all strategies
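
Here is a hypothetical sketch of linear probing (the names are our own; a real table would also resize to keep \( \lambda \) low):

    #include <cstddef>
    #include <functional>
    #include <optional>
    #include <string>
    #include <vector>

    // A minimal linear-probing hash set (a sketch, not the course's code).
    class ProbingStringSet {
    public:
        explicit ProbingStringSet(std::size_t numBuckets)
            : buckets_(numBuckets) {}

        bool lookup(const std::string& key) const {
            std::size_t H = std::hash<std::string>{}(key) % buckets_.size();
            // Probe H, H+1, H+2, ... (mod M) until we find the key or an
            // empty bucket. Quadratic probing would use H + i*i instead,
            // and double hashing H + i*I with I from a second hash.
            for (std::size_t i = 0; i < buckets_.size(); ++i) {
                std::size_t b = (H + i) % buckets_.size();
                if (!buckets_[b].has_value()) { return false; }  // empty: absent
                if (*buckets_[b] == key) { return true; }
            }
            return false;  // probed every bucket
        }

        void insert(const std::string& key) {
            std::size_t H = std::hash<std::string>{}(key) % buckets_.size();
            for (std::size_t i = 0; i < buckets_.size(); ++i) {
                std::size_t b = (H + i) % buckets_.size();
                if (!buckets_[b].has_value()) {  // empty: claim this bucket
                    buckets_[b] = key;
                    return;
                }
                if (*buckets_[b] == key) { return; }  // already present
            }
            // Full table: a real implementation would have resized first.
        }

    private:
        // std::optional distinguishes empty buckets from occupied ones.
        std::vector<std::optional<std::string>> buckets_;
    };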

Week 13

Lesson 1 — Priority Queues and Heaps

Key Points

  • Priority queue
    • Stores items with keys that can be compared using < (most commonly numbers).
    • Regardless of order of insertion, gives easy access to “most-important” item.
    • “Most important” can be largest or smallest key.
  • Binary heap
    • A binary tree such that
      • The tree is complete (every level is full except the bottom, which is filled from left to right);
      • The key in any node is the largest/smallest in its subtree.
    • Usually encoded as an array rather than as a tree (a code sketch follows this list).
      • The array contains the items in level order.
    • Complexities:
      • GET-MIN: \( \Theta(1) \)
      • POP-MIN: \( \mathrm{O}( \log{n} ) \)
      • INSERT: \( \mathrm{O}( \log{n} ) \)
      • HEAPIFY: \( \Theta(n) \)
        • (Turn an array in arbitrary order into a heap)
  • HEAPSORT
    • A sorting algorithm that uses heap operations!
      • Essentially, MAX-HEAPIFY then POP-MAX every item.
    • Sorts an array of arbitrary numbers in \( \mathrm{O}( n \log n ) \) time.
  • Priority Queues
    • Priority queues
    • Interface: insert, get-min, pop-min
    • Use cases: shortest path algorithms, data compression, load balancing, sorting algorithms
  • Binary Heaps
    • Binary heaps
    • Structure property: complete tree
    • Order property: min heap vs max heap
    • Representing a heap as an array
  • Binary Heap Insert
    • Inserting into a binary heap
    • Maintaining structure and order properties
    • Analysis: O(log n) time
  • Pop-Min
    • Pop-min operation on binary heap
    • Algorithm: replace root, sink down until order property holds
  • Heapify
    • Heapify operation
    • Turning array into heap: front-to-back vs back-to-front
    • Analysis: Θ(n) time
  • HeapSort
    • Heap sort algorithm
    • Uses heap operations to sort array
    • Analysis: O(n log n) time
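
Here is a minimal sketch of an array-encoded min-heap (our own illustration, not the course's code); the index arithmetic reflects the level-order encoding:

    #include <cstddef>
    #include <utility>
    #include <vector>

    // A minimal array-encoded min-heap. The array holds the items in
    // level order: the children of index i live at 2i + 1 and 2i + 2.
    class MinHeap {
    public:
        int getMin() const { return items_[0]; }  // Θ(1): root is smallest

        void insert(int x) {                      // O(log n)
            items_.push_back(x);                  // keep the tree complete
            bubbleUp(items_.size() - 1);          // restore the order property
        }

        int popMin() {                            // O(log n)
            int top = items_[0];
            items_[0] = items_.back();            // move the last leaf to the root
            items_.pop_back();
            if (!items_.empty()) { sinkDown(0); }
            return top;
        }

    private:
        void bubbleUp(std::size_t i) {
            while (i > 0) {
                std::size_t parent = (i - 1) / 2;
                if (items_[parent] <= items_[i]) { break; }
                std::swap(items_[parent], items_[i]);
                i = parent;
            }
        }

        void sinkDown(std::size_t i) {
            for (;;) {
                std::size_t smallest = i;
                std::size_t left = 2 * i + 1;
                std::size_t right = 2 * i + 2;
                if (left < items_.size() && items_[left] < items_[smallest]) {
                    smallest = left;
                }
                if (right < items_.size() && items_[right] < items_[smallest]) {
                    smallest = right;
                }
                if (smallest == i) { break; }
                std::swap(items_[i], items_[smallest]);
                i = smallest;
            }
        }

        std::vector<int> items_;
    };

HEAPSORT is then heapify followed by popping every item; the C++ standard library packages the same ideas as std::make_heap, std::pop_heap, and std::sort_heap.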
