Lesson Pages Summary
If you want to know which lesson covered a given topic, this page may help you find it.
For each lesson of each week, we've reproduced the final “Key Points” page from that lesson, and then provided links back to the original lesson pages for specific topics.
Week 1
Lesson 1 — Welcome, Variables, and Memory Diagrams
Key Points
Class Format
- Each week you'll have lesson material to go through at your own pace.
- In class, you'll work on homework and check in with the professors.
- During each class meeting, fill out a check in form to get credit for the day.
- You may attend class on Zoom if it's best if you don't attend in person (e.g., if you have symptoms of illness).
Homework
- Except for the first one, you'll have a partner for all homework assignments.
- Each partnership will have one 24-hour extension to apply to a homework deadline.
- We want you to support and learn from your classmates: be careful to preserve each other's learning processes.
Proficiency Checks
- Each week there are four proficiency checks, each focused on a single learning objective.
- Take proficiency checks in exam-like conditions.
- There will be opportunities to retry proficiency checks where you did not demonstrate proficiency the first time.
- During finals week there will be a final try at all remaining proficiency checks (with the possibility of partial credit).
Getting the Most Out of CS 70
- We are here to help! Please reach out to the course staff—and to each other—early and often!
CS 70 Memory Diagrams
- We use memory diagrams to help trace the correctness of C++ code.
- In our diagrams, we draw a variable's
- value "inside the box" since it exists at runtime.
- name, type, and location "outside the box" because they are useful for tracing the code, but they are not explicitly represented in memory at runtime.
- Our model abstracts away other details of specific computers (such as registers, the specific size of different numeric types, and the real naming conventions for memory slots) because those details vary from one computer to another, and are not helpful for modeling correctness.
C++ Guiding Principles
- Programming languages are designed by particular people, for particular purposes, in particular contexts.
- We discussed two principles that are important for explaining some of the design choices in C++ and let us reason about how the language behaves.
- The Zero-Overhead Principle: Don't slow programs down “needlessly”.
- Providing a lanuage feature should never slow down a program that doesn't use that feature.
- Generally do things in the quickest possible way (even at the expense of safety or convenience).
- The “Power to the People” Principle: Provide language features that allow expert programmers to do things in unsual ways.
- Sometimes beginners accidentally use featues intended for experts!
- The Zero-Overhead Principle: Don't slow programs down “needlessly”.
- Both of these features mean we should not ignore compiler warnings when it notices we're doing something questionable!
- Welcome!
- Introductions to the instructors
- Course goals and policies
- The CS 70 Animal Friends
- Variables in Python & C++
- Similarities and differences between variables in Python and C++
- A First Look at C++
- First C++ program example
- Running code in Online GDB
- Memory Diagrams in CS 70
- CS 70 memory diagrams for modeling C++ programs
- First draft of rules for memory diagrams
- Modeling Memory
- Comparison of CS 5 and CS 70 memory models
- Von Neumann architecture
Lesson 2 — Object Lifetime, Numeric Types, Promotion and Conversion
Key Points
Object Lifetime
- Every variable goes through the same 5 phases of object lifetime.
- The space for a local variable is allocated at the opening curly brace of the function where it's declared.
- The value of a variable is initialized on the line with its declaration.
- Once a variable's value is initialized, it is in use. For primitive values, that primarily means that its name is in scope, which means the name can be used in other expressions.
- The value of a variable is destroyed at the closing curly brace of the declaring block.
- Once a variable's value is destroyed, it is no longer in use. For primitive values, that primarily means that its name is out of scope, which means that the name cannot be used in other expressions.
- The space for a local variable is deallocated at the closing curly brace of the function where it's declared.
Numeric Types
- Numeric values in C++ can either be floating point (able to hold non-integer parts, like decimals) or integers (only able to hold integer values).
- Integer values in C++ can either be signed (able to hold positive and negative values) or unsigned (only able to hold non-negative values).
- Types like booleans and characters are numeric types in C++. A boolean, for example, is an integer type that is guaranteed to be able to hold two values.
Numeric Promotion and Conversion
- If a floating point is used where an integer type is needed, that is always conversion since the non-integer part of the value may be lost.
- If an integer type is used where a floating point is needed, that is always conversion since there may not be enough bits allocated to storing the integer part of the value.
- If a signed value is used where unsigned type is needed, that is always conversion since negative values cannot be correctly represented.
- If an unsigned value is used as a signed type of the same size is needed, that is always conversion since there may not be enough bits allocated to storing the magnitude of the value.
- If a value from a larger numeric type is used where a smaller numeric type is needed, that is always conversion since there may not be enough bits to store the value.
- If a value from a smaller numeric type is used as a larger numeric type of the same signedness is needed, that is always promotion since there will definitely be enough bits to store the value.
- The Five Stages of an Object's Life
- Object lifetime in C++ - allocation, initialization, use, destruction, deallocation
- Loops and Conditionals with Object Lifetime
- Object lifetime with loops and conditionals in C++
- Numeric Types—Fundamentals
- Fundamentals of numeric types in C++
- integers vs floating point
- signed vs unsigned
- Fundamentals of numeric types in C++
- Promotion and Conversion
- Promotion and conversion between numeric types in C++
- Practical Advice for Numeric Types
- Practical advice on choosing and using numeric types in C++
Week 2
Lesson 1 — Compilation, Version Control, and Pair Programming
Key Points
Compilation
- All of the C++ code we write will go through the following steps:
- First, the compiler changes source code in to assembly code.
- Next, the assembler changes the assembly code into object code.
- Finally, the linker changes the object code into machine code.
- The compiler deals with all of the parts of variables that are not actually stored in memory at run time.
- Specifically, that means that errors related to a variable's type or name will be caught at compile time!
- The (simplest) commands to compile a source file
myprogram.cpp
into an executable namedmyprogram
and then run that program areclang++ -c myprogram.cpp
(compilation and assembly)clang++ myprogram.o -o myprogram
(linking)./myprogram
(running the program)
- (In the homework we will typically have additional command-line options.)
Version Control
- A version control tool lets us keep track of all the different versions of our files, without having them clutter up our workspace.
- The version-control tool you will use this semester is called
git
. - We also use a website called GitHub to have a shared, web-accessible place to put copies of everyone's repositories.
Pair Programming
- Pair programming is a skill. Like any skill, it requires practice and intention to get better.
- Our job this semester, collectively, is to support each other in developing that skill!
- What is Compilation?
- What is compilation
- Stages of compilation (preprocessing, compilation, assembly, linking)
- Compiler translates high-level code to low-level machine instructions
- Brief history of C/C++ and Unix
- Compilation Commands
- Compilation commands to generate object file and executable
- clang++ options for compiling and linking
- Running the executable
- Compiler Responsibilities
- Compiler handles types
- Compiler handles names
- Compiler checks syntax
- Version Control
- Version control systems
git
and GitHub- Tracking file changes
- Thinking About Pair Programming
- Pair programming skills and mindset
- Academic integrity and pair programming
Lesson 2 — Arrays, Style, and Program Correctness
Key Points
Arrays of Primitives
- Arrays are
- Fixed-size — Once they're created, we can't change their size.
- Homogenous — All of the elements in them have to be the same type.
- Contiguous — If we looked at an array in memory, we'd see all of the elements side by side.
- Ordered — We can talk about what the first, third, or forty-second element of the array is.
- Arrays allow constant-time access: We can jump straight to an arbitrary element of the array.
- In terms of object lifetime, arrays act just like individual variables. The compiler needs to know ahead of time how big the array will be, so that it can write the instructions to allocate enough space to store the array.
- Declare and initialize an array:
int arr[3]{1, 2, 3};
- Usage: To access the item at index
i
:arr[i]
arr[i]
is translated to*(arr + i)
*x
means the value in memory at the address given byx
arr + i
means the address at an offset ofi
from the address given byarr
- So
*(arr + i)
means the value in memory at the address of the item at offseti
from the start of the array.
- Destruction and deallocation:
- When the array is destroyed, each item is destroyed.
- When the array is deallocated, all items are deallocated.
- Declare and initialize an array:
- Accessing an array out-of-bounds causes undefined behavior.
- Undefined behavior means your program has gone off the rails, anything could happen, from something catastrophic to everything seeming to work.
- You should never write a program that you know has undefined behavior just because nothing bad happened when you tested it.
- Typical C++ compilers generate code that does things the fastest and easiest way, without any special safety checks.
- That includes finding the place in memory where we're claiming an item should be and interpreting the bits at that location as being of the appropriate type (that you asked for).
- Accessing an array out-of-bounds can cause strange behavior. While, in principle, anything can happen when code causes undefined behavior, the strange behaviors you might see can vary between systems. But you can learn to recognize problematic behaviors, especially on the systems you use for coding and testing.
- Accessing memory that is way out-of-bounds can cause your program to crash.
- The operating system will not allow your program to access memory outside of its allotted region.
- This error is called a segmentation fault (segfault for short).
Code Style
- The goal of our code should always be to explain our thinking to other people who read the code.
- Similar things should look similar and different things should look different.
Testing
- When we write tests for our code, our goal should be to reveal bugs, not just demonstrate that our code "works".
- When you write tests, think about what edge cases you need to consider to make sure that your implementation is correct.
- If you make a mistake in your own code that results in "wrong" behavior, write a test for it! While we encourage test-first development, you can always add to your tests to make them more thorough as you develop a deeper understanding of where the tricky parts are in a particular coding task.
- Arrays, Our First Data Structure!
- Properties of arrays: contiguous, ordered, homogeneous, fixed-size
- Advantages/disadvantages of arrays
- Declaring Arrays in C++
- Declaring arrays in C++
- Specifying size, initialization
constexpr
for non-variable sizes
- Accessing Array Elements
- Array variable holds address of first element
- Indirection/dereference operator (
*
) - Indexing with offset (
[]
) - Changing array elements
- Out-of-Bounds Access
- Undefined behavior from out-of-bounds access
- Typical unsafe array access behavior
- Segmentation faults from accessing invalid memory
- Array Example
- Object lifetime for arrays
- Walkthrough of array example code
- Style and Elegance
- Code style and consistency
- Naming conventions
- Whitespace and indentation
- Idioms and community expectations
- Program Correctness Matters
- Purposes and limitations of testing
- Real-world bugs and consequences
- Test goals: revealing bugs, not proving correctness
Week 3
Lesson 1 — Multiple Files, Forward Declarations and Include Guards
Key Points
- We split our code into multiple files to avoid code duplication.
- A multifile program contains:
- One or more source files (
.cpp
), exactly one of which has amain
function. - One or more header files (
.hpp
), which declare functions that are defined in the source files (.cpp
).
- One or more source files (
-
Make sure that every header file (
.hpp
) has an include guard.#ifndef FILENAME_HPP_INCLUDED #define FILENAME_HPP_INCLUDED ...stuff... #endif
- Include guards prevent multiple definitions when a header file is included more than once in the same source file.
- Each preprocessor macro we use for an include guard must be unique (otherwise multiple files will be trying to use the same guard and only one of them will be seen), which is why we usually use the filename as part of the name.
-
Make sure that each file uses
#include
to include any header files (.hpp
) that declare things it uses.- That often includes a source file's own header file!
- Sometimes header files need to include other header files!
- Use
#include "
filename
"
for your local files from the current directory. - Use
#include <
filename
>
for system include files
- To compile your project:
- Compile each source file (
.cpp
) into an object file (.o
) usingclang++ -c filename.cpp
.- In practice, you'll probably want other flags too, such as
-std=c++17
,-g
and-Wall -Wextra -pedantic
. - (Remember from last week that this phase includes preprocessing, compiling the source file into assembly code, and then assembling the assembly code into machine code.)
- In practice, you'll probably want other flags too, such as
- Link object files (
.o
) together into an executable usingclang++ -o executableName file1.o file2.o ...
- Compile each source file (
- To recompile:
- If a source file (
.cpp
) changes, recompile it into an object file (.o
) and relink as necessary. - If a header file (
.hpp
) changes, recompile all source files that include it or that include a file that includes it, and so on.
- If a source file (
- Rules of thumb:
- Don't compile a header file (
.hpp
). - Don't
#include
a source file (.cpp
).
- Don't compile a header file (
- Why Multiple Files?
- Motivation for splitting code into multiple files
- Compiling and linking multiple files
- A Multifile Program
- Forward declarations
- Compiling vs linking
#include
Directives- The C++ preprocessor
#include
directives- Header files for the standard library
- Header Files
- Header files
- Forward declarations
- Another Problem
- Problem: multiple declarations from multiple includes
- Include Guards
- Include guards
#define
,#ifndef
,#endif
- Preventing multiple definitions
- Should a Source File Include Its Own Header File?
- Whether a source file should include its own header (It should!)
- Error Messages
- Interpreting compiler errors
- Header Files and Recompilation
- Header file changes and recompilation
Lesson 2 — References
Key Points
- A variable with a reference type (e.g.,
int&
) is just another name for an existing variable (in this case anint
). - An important use of references is as function parameters:
- Allows the function to alter data outside of its stack frame.
- Prevents unnecessary copy operations.
- The
const
keyword specifies that changes are prohibited (i.e., we have read-only access to the data).- The
const
keyword affects what we are allowed to do via a particular name, not what others might be able to do via a different name, so the same piece of data can haveconst
names that cannot be used to change it and non-const
names that can. - You cannot initialize a non-
const
reference (e.g., of typeint&
) with aconst
name (e.g., of typeconst int
). That would circumvent theconst
restriction! - A
const
-reference function parameter is a good way to avoid needless copying but still promise not to change the given argument.
- The
- Reference Types
- References in C++
- Rules for visualizing references
- References: Exercise 1
- Visualizing simple code with references
- References Exercise 2: Assigning to a Reference
- Assigning to a reference
- References Exercise 3: References to References?
- No references to references in C++
- Example: References as Function Parameters
- References allow a function to alter data outside its own stack frame.
- Warning: C++ Is Not Java
- Differences between C++ and Java references (which are actually more like C++ pointers!)
- References Exercise 4: Multiple “Return” Values via References
- Using references to return multiple values from a function
- const References
const
references are read-only references
- References Exercise 5: Find the Compiler Errors
- Finding compiler errors with references
- Example: Why const References?
- Motivation for
const
references as parameters
- Motivation for
Week 4
Lesson 1 — Classes and Objects
Key Points
Recap Video
There was a fair amount to digest in this lesson. As a review, and because sometimes it's helpful to see things presented a different way, here's a video that goes over what we've covered in this lesson. If you feel like you've already mastered the material, you can skip it.
MORE videos! I'm excited!!
I'm pretty sure I've understood, but sometimes it's comforting to watch a video that tells me things I think I know.
Key Points
- Classes are conceptually similar to what you've seen before, but have syntactic/terminological differences.
- Instead of “instance variables” and “methods” we say “member variables” or “data members” and “member functions” (“members” to refer to them all together).
- Declarations/definitions
- The class definition goes in the header file (
.hpp
). It declares all of the class members.- In CS 70 convention we name all member variables with a trailing underscore (e.g.,
age_
).
- In CS 70 convention we name all member variables with a trailing underscore (e.g.,
- The member-function definitions go in the source file (
.cpp
). They specify the instructions for each function. - When we define a member function we have to specify which class it belongs to using the scope-resolution operator (
::
):- For example,
void Cow::moo(int numMoos);
.
- For example,
- The class definition goes in the header file (
- A member function can be declared
const
.- A
const
member function promises not to change the values of member variables. - If you have a
const
object, you can only callconst
member functions on it.
- A
- We looked at two different kinds of constructors:
- Parameterized constructors take parameters and must be invoked explicitly (e.g.,
Cow bessie{3, 12}
). - Default constructors take no parameters and are implicitly invoked for default initialization (e.g.,
Sheep fluffy;
).
- Parameterized constructors take parameters and must be invoked explicitly (e.g.,
- You can disable the use of a default constructor using the
delete
keyword (e.g.,Cow() = delete;
). - In a constructor, the member-initialization list specifies how to initialize each member variable.
- For example,
Cow(int numSpots, int age) : numSpots_{numSpots}, age_{age}
. - In CS 70, you must use a member-initialization list whenever possible. (The only exception is initializing a primitive array when you want a copy of another array—for that case, you need to loop over the elements of the array you're copying to populate your new array.)
- For example,
- A class definition ends in a semicolon (
;
).- A class definition ends in a semicolon.
- A class definition ends in a semicolon.
Did we mention that a class definition ends in a semicolon?
- Before You Start
- Difference between header (.hpp) and source (.cpp) files
- Exploring a C++ Class
- Overall structure of a C++ class
- Similarities and differences from classes in other languages like Java and Python
- C++'s Class-Related Terminology
- C++ terminology for classes and objects
- Declarations/Definitions
- Header files contain class interface (declarations)
- Source files contain implementations (definitions)
#include
the header to use the class
- Member Variables (a.k.a. Data Members)
- Member variables (data members)
- Naming convention with trailing underscore
- Accessing member variables
- Member Functions
- Declaring and defining member functions
- Calling member functions
- Scope resolution operator
::
- What is a Namespace?
- Namespaces in C++
- Avoiding name collisions with namespaces
std
namespace andusing namespace std;
- const Member Functions
const
member functions- Promise not to modify member variables
- Allowed on const objects
- Placement of
const
keyword
- Exercise: const Member Functions
- Identifying which member functions should be
const
- Calling
const
and non-const
member functions onconst
objects
- Identifying which member functions should be
- Constructors
- Constructors in C++
- Purpose is to initialize object
- Parameterized vs. default constructors
- Exercise: Constructors
- Valid and invalid ways to invoke constructors
- Constructing Objects
- Object initialization syntax in C++
- Lack of assignment operator
- Curly brace initialization
- Objects in Our Memory Model
- Drawing objects in memory diagrams
- Member Initialization Lists
- Initialization of member variables
- Member initialization list syntax
- Ordering rules
- Access Control
- Public and private access control in C++ classes
- Placement of
public
/private
keywords
- Semicolon at the End!
- Class definitions end with a semicolon
Lesson 2 — Object Lifecycle in Detail (Constructors, etc.)
Key Points
- Object lifetime on the stack: when?
- Allocation: at the opening
{
of the function - Initialization: at the declaring line
- Use: between initialization and destruction
- Destruction: at the closing
}
of the declaring block - Deallocation: at the closing
}
of the function
- Allocation: at the opening
- Default initialization/construction
int x;
Cow mabel;
Cow bessie{};
- For an object, invokes the default (parameterless) constructor
- Copy initialization/construction
- Creates a new object that is a copy of an exiting object
int x = y;
int a{b};
Cow mabel = adeline;
Cow bessie{fennel};
- For an object, invokes the default constructor (which takes a reference to an object of the same type)
- Also used to initialize function parameters and return values
- Assignment operator
- Changes an existing object to be a duplicate of another existing object
s = t;
cadewyn = ethyl;
- For an object, invokes the assignment operator, a special member function called
operator=
.
- Destructor
- A special member function that is automatically invoked when an object is destroyed
- For class
Cow
, the destructor is named~Cow
- Typically used to “clean up” or release any resources the object is holding onto
- The compiler can synthesize or disable these functions:
Cow(); // Means that the programmer will define the default constructor
Cow() = default; // Means that we will use the compiler's synthesized default constructor
Cow() = delete; // Means that this class does not have a default constructor
- Same for default constructor, copy constructor, assignment operator, and destructor (but don't disable the destructor!!)
- For arrays, the same as above, only \( N \) times, where \( N \) is the size of the array.
- For references, initialization and destruction mark the usage period of the name
- We don't model any allocation or deallocation for references
- For data members of a class
- Allocation happens when the object is allocated
- Initialization happens before the opening
{
of the constructor (so use the member-initialization list!) - Destruction happens at the closing
}
of the destructor - Deallocation happens when the object is deallocated
- Before You Start…
- The five phases of object lifetime
- Objects on the Stack
- Timing of each phase for stack variables
- All About Initialization
- Initialization of primitive vs. class types
- Default vs. direct vs. copy initialization
- Copy constructors
- Function parameters and return values
- Synthesized Constructors
- Synthesized default and copy constructors
- Specifying synthesized, programmer-defined, or deleted constructors
- Use Phase
- Assignment operators
- Synthesized assignment operators
- Assignment vs. copy initialization
- Destruction
- Purpose of destruction phase
- Destructors in C++
- Naming convention
- Purpose is to clean up / release resources
- Synthesized destructors
- Arrays
- Object lifetime for arrays
- References
- Object lifetime for references
- Data Members
- Object lifetime for data members
- Putting It All Together
- Comprehensive tracing of object lifetime in an example program
Week 5
Lesson 1 — Pointers, the indirection operator (*
), and reading types
Key Points
Pointers in General
- Pointers are a kind of value that represents the memory address of a value of a particular type.
- Example:
int* x
means thatx
will hold a pointer to anint
. A pointer to anint
(a.k.a. "int
pointer") is the address of a piece of memory that purports to contain anint
. - We can access the data a pointer points to by following it using the indirection operator:
*x
is another name for the data thatx
points to.- Note: Sometimes people with a C background say "dereference" instead of "follow" (and call
*
"the dereference operator"). That term is a bit less clear, especially given that it has nothing to do with C++ references. We're trying hard to stick to the C++ terminology in this class (although we might slip up occasionally!).
- Note: Sometimes people with a C background say "dereference" instead of "follow" (and call
- We can read types involving pointers (and more) using the inside-out rule.
- Example:
- Pointers are primitive types in C++.
- There are infinitely many pointer types, because for every (non-reference) type, we can add a
*
to the end of the type to create a pointer for that type. - Pointers support
const
for setting read-only status. We can useconst
on the pointer itself, the value the pointer points to, or both. - If we let a pointer be default initialized, it will point to an indeterminate value. (From a "did you follow the rules" perspective, a default-initialized pointer is invalid and inappropriate to follow; doing so leads to undefined behavior.)
- There are infinitely many pointer types, because for every (non-reference) type, we can add a
- Pointers are useful!
- We can pass arrays into functions using pointers, with either
- A first-element pointer and a size; or
- A first-element pointer and a past-the-end pointer (which is C++'s preferred way).
- We can also use pointers to avoid moving or copying data, by rearranging the pointers-to-data rather than rearranging the data itself.
- We can pass arrays into functions using pointers, with either
- Every piece of data has a memory address.
- We can get the address of a variable using the address-of operator:
&
. - Example: if we have
int x
, then&x
is the address ofx
. - (Not to be confused with
&
as part of a type:int& r
is a reference to anint
, not the address of theint
.) - We almost never need to use the address-of operator.
- We can get the address of a variable using the address-of operator:
Pointers and Instances of Classes
- In a member function,
this
is a pointer to the object that the member function was invoked on.- We almost never need to use
this
; when we do, it's usually as*this
.
- We almost never need to use
- If we have a pointer to an instance of a class, there are two ways to access the members of the object:
- Follow the pointer, then access:
(*myObj).myMemberFunction();
- Follow and access together (preferred):
myObj->myMemberFunction();
- Follow the pointer, then access:
I bet there's even MORE to pointers!
There is! And that's the next lesson.
- Before You Start
- Review of array syntax like
*(arr + i)
- Review of array syntax like
- What's the C++ Type for a Memory Address?
- Pointers store memory addresses
- Pointer types like
int*
- Getting pointers to array elements
- Past-the-End Pointers
- Past-the-end pointers
- Alternative to passing array size: passing past-the-end pointer
- Read-Only Access
- Using
const
with pointers to prevent modification
- Using
- Saving Time by Not Moving/Copying Data
- Avoiding copying large objects by rearranging pointers instead of objects themselves
- Not Just Arrays
- The address-of operator
&
to get pointer to variable - Difference between references and pointers
- The address-of operator
- The this Pointer in Member Functions
- The
this
pointer in member functions
- The
- Reading Complicated Types
- The inside-out rule for reading complex types with pointers, references, arrays, etc.
Lesson 2 — Dynamic allocation (new
and delete
) and memory-management issues
Key Points
- The heap is a region of memory for things that we explicitly allocate and deallocate when we want to.
- The
new
keyword allocates space, initializes the data, and returns the address of the new data.- Allocate one thing:
int* p = new int{42};
. - Allocate an array of things:
int* arr = new int[42]; // arr is a pointer to the first element
.
- Allocate one thing:
- The
delete
keyword takes the address of something on the heap, destroys the data, and deallocates that chunk of memory.- Delete one thing:
delete p;
. - Delete an array of things:
delete[] arr;
. - All variants of
delete
can only be given pointers to heap memory that came fromnew
. You can't delete anything else (other memory, parts of things, etc.).
- Delete one thing:
- Every call to
new
should have a matching call todelete
later on.- Non-arrays allocated with
new
should have be deallocated withdelete
- Arrays allocated with
new ... []
should be deallocated withdelete[] ...
. - Don't call the wrong one!
- Non-arrays allocated with
- The
- Types of allocation:
- Static allocation: memory that is allocated for the entire duration of the program (e.g. a
static
member variable). - Automatic allocation: memory allocated on the stack (because it is allocated/deallocated automatically when functions begin/end).
- Dynamic allocation: memory allocated on the heap (because the size of the allocation is "dynamically" determined at runtime as opposed to compile time).
- Static allocation: memory that is allocated for the entire duration of the program (e.g. a
- A memory leak is when something is allocated on the heap and becomes inaccessible (all pointers/references to it are destroyed).
- In our memory model, leaks look like data on the heap with no valid name.
- Once memory has been leaked, the program can't deallocate it!
- Prevent memory leaks:
- For every
new
that occurs during the program, there must also be exactly onedelete
. - When you use
new
, consider also writing the correspondingdelete
statement wherever it will need to be.
- For every
- Other common memory-management errors:
- Double delete: Trying to delete the same data twice.
- Most commonly caused by using
delete
on two pointers that both point to the same data.
- Most commonly caused by using
- Dangling pointer: Following a pointer to memory that has been deallocated.
- Most commonly caused by continuing to use a pointer after the thing it points to has been
delete
d.
- Most commonly caused by continuing to use a pointer after the thing it points to has been
- Double delete: Trying to delete the same data twice.
- Before You Start
- Writing functions that take pointer+size or pointer+past-the-end pointer to arrays
- Elephant Parade, Part 1: A Bad Idea
- Motivating example: limitations of fixed-size arrays
- The Heap and new
- The heap for dynamic allocation
new
for allocating+initializing on heap
- Null Pointers
- Default-initialized pointers
- Using
nullptr
for invalid/null pointers
- Elephant Parade, Part 2:
new
andnullptr
- Example using
new
andnullptr
- Example using
- Elephant Parade, Part 3: Dynamically Allocated Arrays
- Dynamically allocating arrays with
new[]
- Dynamically allocating arrays with
- Elephant Parade, Part 4: Memory Leaks
- Memory leaks
- The delete Keyword
delete
anddelete[]
for deallocating heap memory- Need matching delete for every new
- Elephant Parade, Part 5: Dangling Pointers and Double Deletes
- Dangling-pointer errors
- Double-delete errors
Week 6
Lesson 1 — Lifetime on Heap; Destructors, Copy Constructors and Assignment Operators
Key Points
- Sometimes a class needs to manage memory on the heap.
- We allocate that memory using
new
, either in its constructor(s) or in other member functions. Thenew
operation- Allocates memory on the heap (requesting it from the system's heap manager).
- Constructs the object (or objects) in that space.
- Returns a pointer to the memory. We'll need to eventually give this pointer to the appropriate
delete
.
- Our classes own their resources, therefore the class also owns the memory it allocates on the heap—in other words, it's responsible for that memory.
- In our classes, ownership is exclusive. If two objects both think they own the same data, it's a bug.
- We allocate that memory using
- A class's destructor must release all memory in the heap that the object is responsible for.
- The synthesized destructor will simply destroy each data member of an instance of a class.
- If a data member is a pointer that is storing the location of memory on the heap, that memory will not be destroyed and deallocated when the pointer is destroyed.
- Failing to deallocate heap memory that our object is responsible for can cause memory leaks, where an object is gone, but the dynamically allocated data that it owned is still occupying memory but is inaccessible.
- We release heap memory via a
delete
statement, which- Takes a pointer (that came from
new
). - Destroys the object at that memory address.
- Deallocates the memory (giving it back to the system's heap manager).
- Takes a pointer (that came from
- A class's copy constructor should be sure to create a deep copy by allocating new heap memory for a new instance of a class.
- The synthesized copy constructor will simply copy initialize or construct each data member (shallow copy).
- If one of the data members is a pointer to heap memory, then two objects will both point to the same heap memory!
- This situation can cause all kinds of trouble, including double-delete errors, if they both objects try to
delete
that memory.
- A class's assignment operator should be sure to clean up "old" memory in an instance of a class, then allocate new heap memory for the new configuration of the class instance.
- The copy-and-swap idiom is a clever/elegant way to do this.
- A less elegant but more direct way is to first use code from the destructor, then use code from the copy constructor.
- Alternatively, if we don't think a copy constructor or assignment operator is needed, we can instead choose not to provide them. (Or, during development we can disable them, and then write them last once everything else is done.)
- Before You Start
- Practice tracking what happens during construction.
- Defining a Class
- Defining a class with public and private access
- Defining constructors and member functions in a class
- Implementing a Class
- Implementing constructors and member functions for a class
- Using member initialization lists
- When Classes Allocate Memory…
- Classes allocating memory on the heap
- Using pointers to refer to heap-allocated memory
- Initializing Heap Memory: Constructors
- Initializing heap memory in constructors
- Allocating arrays on the heap with new
- Array Elements That "Don't Have a Cow"
- Having array elements that don't contain objects
- Using an array of pointers instead of objects
- Initializing Heap Memory in Member Functions
- Initializing heap memory in member functions
- The principle of object ownership in C++
- Revisiting the Default Constructor
- Writing a default constructor for a class that allocates memory
- Using delegating constructors
- Destruction for Custom Classes: Destructors
- Writing a destructor to free heap memory
- Order of freeing memory in destructors
- Providing a Copy Constructor
- Writing a copy constructor
- Avoiding shallow copies
- Making deep copies
- Assignment for Custom Classes: Assignment Operators
- Writing an assignment operator
- The copy-and-swap idiom
- Avoiding self-assignment issues
Week 7
Lesson 1 — More Memory Diagrams, References, and Pointers; lvalues vs. rvalues (temporaries); How functions return values; Returning references; Overloading.
Key Points
- There are lvalues and rvalues:
- In
x = y + z
,x
is an lvalue andy + z
is an rvalue. - lvalues have an obvious memory location and significant lifetime; they often have a name.
- rvalues are temporary values that exist for the duration of one line of code; they often don't have names.
- In
- Every function call that returns a value is actually initializing a new object in memory
- The compiler allocates temporary objects (rvalues) as needed (e.g., to hold intermediate results in nested expressions).
- Functions can return references.
- We can overload functions (i.e., define multiple functions with the same name that operate on different sets of arguments).
- C++ uses the types of the arguments of the function to figure out which function we mean to call. If selecting the best function for a particular set of arguments is ambiguous, it's an compile-time error.
- We can implement our own versions of C++'s existing operators for our own types, so we can make our types feel “built in”.
- Before You Start
- Review of constructing objects in C++, including default construction, copy construction, and temporary objects
- Review of allocating objects on the stack vs. the heap in C++
- Review of iterating through arrays in C++ using index variables and pointer arithmetic
- Passing Copies
- Drawing memory diagrams for C++ code
- Pass-by-value parameter passing in C++ functions
- Comparing hand execution to actual program output
- Passing References
- Pass-by-reference parameter passing in C++ functions
- Using references to avoid copying data
- Discovering Temporaries
- Temporary objects in C++ used to hold intermediate results
- Lvalues and rvalues in C++
- Taking addresses of temporary objects
- Returning Values
- How functions return values in C++ by initializing space allocated by the caller
- Drawing memory diagrams for function calls that return values
- Returning References
- Returning references from C++ functions
- Modifying existing variables by returning a reference to them
- When to Return References
- Guidelines for when to return references in C++
- Dangers of returning references to local variables
- Overloading
- Function overloading in C++
- Overloading operators like << in C++
- How C++ resolves overloaded functions
- Putting It All Together
- Example of a CheckList class in C++
- Implementing array-like behavior with operator[] overloading
Lesson 2 — Iterators, auto
, std::vector
, and std::list
.
Key Points
Here's a summary of what we've learned in this lesson.
Iterators
-
An iterator is an object that represents a “location” in a collection (a.k.a. container).
- It can get/set the item at that location, and move forward to the next location.
- Provides a unified interface for iteration in lots of data structures.
- Allows a data structure to define the best way to iterate through its items.
-
Other languages also have iterators, so they are a general concept; not C++ specific.
-
C++'s iterators are inspired by the begin/past-the-end scheme for iterating over an array.
-
In the C++ standard library, a container provides the following functions:
begin
—returns an iterator to the first item, andend
—returns an iterator "one past" the last item.- We use the end iterator to know when we've gone too far in a loop!
- It must be the case that…
- If
i
is an iterator to the last item inc
, then++i == c.end()
, and - A container
c
is empty if and only ifc.begin() == c.end()
.
- If
-
In the C++ standard library, an iterator offers the following operations:
-
*iter
— returns a reference to the item at the iterator's location.++iter
— advances the iterator to the next location.iter != otherIter
— returns true if the two iterators are not at the same location.
-
"Iterator" is not another word for "pointer".
- Both iterators and pointers abstractly represent a concept of “location” and allow access to the data at a location.
- A pointer is the iterator type for a primitive array.
- An iterator might have a pointer as a member to help it do its job.
- But an iterator object has data and member functions, whereas a pointer is just a primitive type representing a memory address.
-
Common idioms for looping through a collection using an iterator:
for (std::vector<int>::iterator iter = vec.begin(); iter != vec.end(); ++iter) { std::cout << *iter << std::endl; }
or
for (auto iter = vec.begin(); iter != vec.end(); ++iter) { std::cout << *iter << std::endl; }
or
for (int x : vec) { std::cout << x << std::endl; }
-
Read-only iterators are a distinct type where
operator*
returns aconst
reference rather than a non-const
reference. They're usually calledconst_iterator
rather thaniterator
.
I get the general idea, and I feel okay with using iterators, but I don't feel like I have any practice actually making new iterator classes.
That's right.
Don't worry, practice with building our own iterators is coming.
C++'s Sequence Containers
We also learned about std::vector
and std::list
in C++'s standard library. They are both examples of sequence containers (a.k.a. lists), and they're both generic in that they can hold a type of our choosing.
std::list<T>
is a doubly linked list with list items of typeT
.std::vector<T>
is an array-backed list with array elements items of typeT
and automatic resizing.
Both are full-featured list objects, with useful member functions well-suited to their underlying representations (a.k.a. encodings).
- Before You Start
- Review of the CheckList class from the previous lesson
- Begin and end member functions for iteration
- Identifying Operations
- Identifying operations like function calls and initializations in C++ code
- C++'s auto keyword
- The auto keyword in C++ for automatic type inference
- Guidelines for when to use auto
- A Simple Program Using Primitive Arrays
- Example C++ program using primitive arrays
- Limitations of fixed-size primitive C++ arrays
- A Simple Program Using std::vector
- Example C++ program using std::vector
- Features of std::vector like size() and push_back()
- Dynamic resizing behavior of std::vector
- A Simple Program Using std::list
- Example C++ program using std::list
- Comparison to std::vector
- Iterators returned by std::list
- What Is An Iterator?
- Definition of an iterator in C++
- Iterators vs. pointers
- Operations required to implement an iterator
- Hiding the Iterator
- Simplified for-range loops in C++
- How they use iterators behind the scenes
- Read-Only Iterators
- const_iterators in C++ for read-only access
- Overloading begin/end to return const_iterators
Lesson 3 — Asymptotic Complexity and Best-, Worst-, and Expected-Case Analyses.
Key Points
Empirical timing
- Tells you how long a program actually took, which is great.
- But the findings are specific to the experimental conditions (compiler options, computer, inputs tested, etc.).
- So we have to be careful how we extrapolate the findings to other conditions.
Operation counting
- Give the number of times a particular operation (or set of operations) occur as a function of some aspect of the input.
- Abstracts away all system-level issues (compiler, CPU, etc.)
- Allows us to weigh trade-offs in constant factors and lower-order terms.
- Requires us to make choices about what to count and what to account for in the input.
- Can be unnecessarily tedious/detailed for some questions.
Asymptotic analysis
- Answers the question “How well does this algorithm scale?” (e.g., what happens if we double the input size)
- Technically, is focused on how the cost grows as the size of the input goes to infinity.
- As a result we ignore constant factors and lower-order terms.
- In practice conclusions about how costs grow are usually applicable to merely “large” sizes, not just infinite ones.
- Technically, is focused on how the cost grows as the size of the input goes to infinity.
- Must useful when costs are fundamentally different (e.g., \(\Theta(n\) vs \(\Theta(n^2\)) rather than when they appear the same.
- \(f(n) \in \Theta(g(n))\) means that \(f\) and \(g\) are asymptotically similar.
- \(f(n) \in \Theta(g(n))\) if and only if 0 < \(\lim_{n\rightarrow \infty} \frac{f(n)}{g(n)} < \infty\)
- Usually we compare a cost function to simple reference functions, sometimes called orders of complexity:
- \(\Theta(1)\) — constant time
- \(\Theta(\log n)\) — logarithmic time
- \(\Theta(n)\) — linear time
- \(\Theta(n \log n)\) — linearithmic time
- \(\Theta(n^2)\) — quadratic time
- etc.
- \(f(n) \in \textrm{O}(g(n))\) means that \(f\) is "no worse than" \(g\) in an asymptotic sense.
- \(f(n) \in \textrm{O}(g(n))\) if and only if \(\lim_{n\rightarrow \infty} \frac{f(n)}{g(n)} < \infty\)
- Example usage:
- "In the worst case the complexity is \(\Theta(n)\) and in the best case it is \(\Theta(1)\), so the complexity is \(O(n)\)."
- The asymptotic notation family:
- \(f(n) \in \textrm{o}(g(n))\), kind of like \(x < y\).
- \(f(n) \in \textrm{O}(g(n))\), kind of like \(x \le y\).
- \(f(n) \in \Theta(g(n))\), kind of like \(x = y\).
- \(f(n) \in \Omega(g(n))\), kind of like \(x \ge y\).
- \(f(n) \in \omega(g(n))\), kind of like \(x > y\).
Best case analysis
- What is the smallest number of operations for an input of size \(n\)?
Worst case analysis
- What is the largest number of operations for an input of size \(n\)?
Expected case analysis
- What is a "typical" number of operations?
- Three steps:
- Break the outcomes into cases so that each case has the same complexity
- Assign probabilities to the cases
- Usually we assume a probability model that we think will help us understand the algorithm in practice.
- A very common probability model is the "uniform probability" model which assumes every case is equally likely.
- Find the expected complexity
- A weighted sum of the complexities of the cases, weighted by probability.
- \(\sum_{k} Pr(\textrm{Case}\ k)\textit{Cost}(\textrm{Case}\ k)\)
- Before You Start…
- Warm-up example analyzing a simple C++ loop
- Empirically Timing Programs
- Empirically timing programs to measure performance
- Limitations of empirical timing results
- Instruction Counting?
- Counting CPU instructions executed by a program
- Challenges with instruction counting
- Operation Counting
- Counting operations like comparisons in an algorithm
- Expressing the operation count as a function of input size
- Nested Loops
- Analyzing nested loop algorithms by counting operations
- Applying summation identities like Gauss's formula
- Best- and Worst-Case Analysis
- Best case and worst case analysis of algorithms
- Finding asymptotic bounds for an example algorithm
- Expected Case Analysis
- Steps for expected case analysis
- Finding the expected complexity of an example algorithm
- Asymptotic Complexity Analysis
- Definition of asymptotic analysis
- Using limits to compare growth rates of functions
- Big Θ
- Definition of big Theta notation
- Comparing two cost functions using big Theta
- Θ Abstracts Away Coefficients
- How big Theta ignores constant factors
- Θ Abstracts Away the Choice of Operation
- How big Theta lets us consider overall complexity, not one operation
- Θ Abstracts Away the Choice of n (Mostly)
- Guidelines for choosing n in an analysis
- How choice of n often doesn't affect big Theta
- Case Study: Linear Search
- Analyzing best, worst, expected case complexities of linear search
- O and Ω
- Definition and usage of big O and big Omega notation
- Relationship between big O, Omega, and Theta
- The Whole Asymptotic Family
- Little o and little omega notation
- The entire asymptotic notation family
Week 8
Lesson 1 — Interfaces, Encodings, and Implementation; Iterator Validity
Key Points
Levels of a Data-Structure Specification
- Interface: the operations it offers
- In C++, the
public
members of a class- Encoding: the data the structure uses to do its job
- In C++, the
private
members of a class
- In C++, the
- Implementation: the actual procedures that manipulate the data to satisfy the interface promises
- In C++, the member-variable definitions of a class
- Encoding: the data the structure uses to do its job
Multiple Encodings for the List Interface
- Array-backed lists
- Contents stored in an array, which resizes as necessary
- Good at random access
- Bad at inserting/removing at or near the start of the list
- Linked lists
- Contents in list nodes, linked in a chain
- Good at inserting/removing at beginning
- Bad at random access
Iterators
Iterator Misuse: Obey These Rules
- Don't use
*
on a past-the-end iterator. - Don't use
++
on a past-the-end iterator. - Don't use
--
on an iterator at the beginning (there is no such thing as a before-the-beginning iterator). - Don't do anything to an iterator that isn't valid other than destroy it or assign a new value to it.
- Don't compare iterators from different origins.
Valid and Invalid Iterators
The interface for a data structure gives rules for when iterators will stay valid and when they will cease to be valid.
-
Universally applicable reasons an iterator would be valid but not refer to an item (we can't call
++
or*
on these iterators, but we can test them against other iterators from the same origin):- The iterator was default constructed, rather than initialized with a valid value.
- The iterator is past the end.
-
Universally applicable reasons an iterator would be not be valid:
- The underlying container is destroyed/deallocated.
-
Container-specific iterator validity rules
- For containers with an
erase
member function,erase
ing the item referred to by an iterator means that the iterator passed toerase
is no longer valid, and also that any and all other iterators referring to the erased location are no longer valid. - Sequence data structures based on arrays typically don't promise that iterators will continue to be valid after an item is added or removed (so the data structure can resize the array, which moves all of its data to somewhere else in memory when it copies the array).
- In contrast, sequence data structures based on linked lists typically do preserve iterators as much as possible, except for iterators that refer to items that have been removed.
- For containers with an
- Before You Start
- Ways to iterate through a list and modify values
- Using auto and range-based for loops
- Practice: A List Interface
- Interface vs implementation
- Fundamental operations for a string list data structure
- What's an Encoding?
- Definition of encoding (representation) of a data structure
- Specifying “The Rules”
- Interface, encoding, and implementation specify promises and rules
- Example of constraints on encoding imposed by interface (constant time indexing)
- Iterator Rules
- Rules for using iterators in C++
- Don't dereference or increment past-the-end iterator
- Only Use Valid Iterators
- Only use valid iterators, not invalid/dangling ones
- Example of iterator invalidation when container is deleted
- Example: An Iterator-Validity Violation
- Iterator invalidation when temporary container is destroyed
- Example Two: Danger When Erasing from a List
- Erasing element invalidates iterator passed in and other iterators to that element
- Container-Specific Rules
- Iterator validity rules specific to array-backed vs linked lists
- Insertion/removal invalidates iterators with array-backed but not always with linked lists
Week 9
Lesson 1 — Introduction to Basic Binary Search Trees
Key Points
- A binary search tree is
- A binary tree such that…
- The key in every node is bigger than all keys in its left subtree and smaller than all keys in its right subtree.
- A BST is designed for search:
- Applications where we need to efficiency store and find data.
- Associative containers that associate keys with values (like Python dictionaries).
- BST lookup:
- Compare the key to the current node.
- If the same, we're done! If smaller, go left. If larger, go right.
- BST insert:
- Use a similar lookup process to find a spot at the bottom of the tree for the new key.
- Connect the new node to the bottom of the tree.
- BST delete:
- Use lookup to find the node with the key.
- If no children: just remove it!
- If one child: move the child up to the node's place.
- If two children: replace the node with the largest key smaller than the one being removed (the rightmost descendant of the left child).
- Complexity of Trees (Balanced and Unbalanced):
- Insert and delete complexities depend on lookup.
- Lookup complexity is \( \mathrm{O}(h) \), where \( h \) is the height of the tree.
- Height can vary depending on insertion order!
- Stick: \( h \in \Theta(n) \).
- Well-balanced: \( h \in \Theta( \log{n} ) \).
- So complexity of all these operations:
- Stick: \( \mathrm{O}(n) \).
- Balanced: \( \mathrm{O}( \log{n} ) \).
- Overall (if there's no guarantee of balance): \( \mathrm{O}(n) \).
- Recursive helper functions:
- Member functions usually operate on a specific instance of a class (e.g., one whole tree).
- The interface a member function provides to the user probably doesn't leave it well suited for that function itself to be recursive.
- For example, there is no smaller
IntTree
instance to recurse on (internally, our tree is made ofNode
s).
- For example, there is no smaller
- The interface a member function provides to the user probably doesn't leave it well suited for that function itself to be recursive.
- Recursive functions necessarily have the data they're working on as a parameter, since each successive call needs to pass in a smaller variant of the problem to solve, until we hit the base case.
- The parameters a recursive function needs don't necessarily provide the right interface for the user.
- We can combine the two and address these issues by having the member function call a recursive helper function to operate on the internal data.
- If the helper function doesn't need to use any data members from the containing class instance, declare it
static
.
- If the helper function doesn't need to use any data members from the containing class instance, declare it
- When you write
insert
andexists
yourself, we expect you to use a recursive approach (using recursive helper functions).
- Member functions usually operate on a specific instance of a class (e.g., one whole tree).
- In-order traversal:
- Visits the nodes in the tree in sorted order
- Recursive algorithm: do-the-left-subtree, do-the-value, do-the-right-subtree.
- Static member functions:
- Don't operate on an instance of the class (i.e., no
this
object).
- Don't operate on an instance of the class (i.e., no
- Search
- Different data structure interfaces: maps vs sets
- Lookup and insert as key operations for these data structures
- Tree Terminology
- Terminology for tree data structures:
- Node, edge, parent, child
- Root, leaf
- Height
- Binary tree
- Terminology for tree data structures:
- BST Lookup
- Binary search trees
- Lookup procedure
- Applications of binary search trees
- Binary search trees
- BST Insert
- Insertion into a binary search tree
- Recursive insertion algorithm
- BST Delete
- Deletion from a binary search tree
- Time Complexity
- Time complexity analysis for binary search trees
- Best case vs worst case
- Overall complexity
- Time complexity analysis for binary search trees
- Encoding a BST in C++
- Encoding a binary search tree in C++
Node
struct- Overall tree class
- Encoding a binary search tree in C++
- C++ Implementation: insert()
- Implementing
insert
in C++- Helper functions
- Passing pointers by reference
- Implementing
- Static Member Functions
- Static member functions in C++
- Insert Alternatives
- Alternative implementations of
insert
- Alternative implementations of
- exists
- Implementing exists()
- Iterating Over Our BST
- In-order traversal of a binary search tree
Lesson 2 — 2–3–4 Trees and Red–Black Trees
Key Points
2–3–4 Trees
- A 2–3–4 Tree allows three kinds of nodes:
- 2-nodes hold one value and have two children.
- 3-nodes hold two values and have three children.
- 4-nodes hold three values and have four children.
- Insertion in a 2–3–4 tree always happens at a leaf, by adding the new value to an existing leaf node. That insertion will turn a 2-node into a 3-node or a 3-node into a 4-node.
- To make sure that there is always room to insert new values, any time we pass a 4-node on the way down the tree during an insertion, we promote the middle value up into the node above it.
- 2–3–4 trees require more work at each node than a BST, but because they stay perfectly balanced (all leaves the same distance from the root), they have worst-case lookup and insert performance of \( \Theta( \log n ) \).
Red–Black Trees
- A Red–Black tree is a binary representation of a 2–3–4 tree.
- To convert a 4-node to a Red–Black representation, create a black node from the middle value. Then add two children for the left and right values, both as red nodes.
- To convert a 3-node to a Red–Black representation, create a black node from one of the two values. Then add a child for the other value as a red node.
- Because Red–Black trees are isomorphic with 2–3–4 trees, they also have \( \Theta( \log n ) \) worst-case performance for insert and lookup.
Rotation
- A fundamental operation for rearranging nodes in a BST.
- Can be implemented with a constant number of pointer reassignments (i.e., constant time):
- A child of the root of the subtree moves up to become the new root.
- The old root becomes the new root's child.
- The subtree “between” the two nodes becomes the old root's child.
- Rotations preserve the BST property.
Like these University of Washington CSE grad students (“The CSE Band”) from the early 2000s, we hope you're inspired by the beauty of Red–Black trees!.
- Multiple Types of Nodes
- Introduction to 2-3-4 trees
- Insertion in 2–3–4 Trees
- Insertion into a 2-3-4 tree
- Promoting nodes during insertion
- Complexity of 2–3–4 Trees
- Complexity analysis of 2-3-4 trees
- Super Edges
- Converting 2-3-4 trees to binary trees
- Red–Black Trees
- Red-black trees
- Equivalence to 2-3-4 trees
- Node coloring
- Red-black trees
- Rearranging BSTs with Rotations
- Tree rotations
- Maintaining BST property
- Tree rotations
- Red–Black Tree Insert
- Insertion into red-black trees
- Rotations and recoloring
- Insertion into red-black trees
- Complexity and Applications of Red–Black Trees
- Complexity of red-black trees
- Applications:
std::set
,std::map
, Linux scheduling
Week 10
Lesson 1 — Amortized Analysis and Splay Trees
Key Points
-
Expected-Time Reminder:
- Expected time describes the performance of an algorithm with some amount of randomness. Time for individual operations will come from a probability distribution. In practice, probabilities may be a sufficient guarantee, but from a theoretical perspective worst-case behavior is possible even if it is unlikely.
-
An amortized bound promises that the total complexity of a sequence of operations is bounded by the sum of the amortized times of the individual operations.
- Only applies to the total complexity of the sequence of operations from the very beginning.
- It does not give a bound for a the time complexity of a single operation, time for individual operations can vary significantly.
- A useful fiction: even though some operations are expensive and some are cheap, an amortized bound spreads the cost evenly.
- Everyday analogy: it's like a professor keeping you an extra ten minutes past when class should have ended because the previous two classes ended five minutes early
-
Aggregate method:
- Consider an arbitrary sequence of operations.
- Analyze its complexity.
- Divide by the number of operations to get an amortized bound.
-
Array-backed list analysis:
- With capacity doubling, amortized \( \Theta(1) \)
push_back
. - With capacity halving, amortized \( \Theta(1) \)
pop_back
.
- With capacity doubling, amortized \( \Theta(1) \)
-
Splay Trees:
- Self-balancing tree with amortized \( \mathrm{O}( \log{n} ) \) insert/lookup, \( \mathrm{O}(n) \) worst-case.
- Really efficient when lookups have high locality (lookups tend to repeat).
- Every inserted/looked-up node is rotated to the root.
- Recently inserted/found nodes are easy to find (near best-case time).
- No space overhead!
- Inspiration for this song by Dani Demas, HMC '15! (Warning: banjo.)
- Before You Start
- Characterizing best case, worst case, and overall complexity of push_back() in an array-backed list
- \( \Theta \) vs. \( \mathrm{O} \) notation for characterizing complexity over a range of cases
- Splay Trees
- Overview of splay trees, including:
- Splay operation to bring recently accessed nodes to root
- Gains efficiency from temporal locality
- Visualization website
- Overview of splay trees, including:
- Splay-Tree Complexity 1
- Observing extremes of efficiency during inserts into splay trees
- Factoring long-term balance into complexity arguments
- Best, worst, and overall case analysis
- Variable Costs
- Complexity analysis of in-order traversal
- \( \Theta(n) \) overall but \( \Theta(1) \) to \( \Theta(n) \) per step
- Complexity analysis of in-order traversal
- Characterizing Operation Costs
- Limitations to worst case analysis
- Using "banker's method" analogy and operation coloring to argue complexity over sequences of operations
- IntVector push_back()
- Argument showing amortized \( \Theta(1) \) per push_back() when doubling capacity
- Discussion on rigorously deriving \( 2m - 3 \) moves formula
- Proving Amortized Bounds
- Three approaches to proving amortized bounds
- Aggregate method: look at total work for m operations, divide by m
- Banker's method: treat time like money
- Physicist's method: time like energy, with potential function
- Three approaches to proving amortized bounds
- Amortized Time vs. …
- Contrasting amortized, worst-case, and expected complexity
- Amortized analysis lets you pretend an operation met its amortized bound, doesn't bound a single operation
- Applicability depends on what user cares about
- Splay-Tree Complexity 2
- Applying amortized analysis to argue total complexity
- Comparing splay trees to other BST choices on dimensions like overhead and locality
(When logged in, completion status appears here.)