Greek Symbols (reference material)
- For each Greek letter (all lowercase, plus upper-case gamma, delta, theta, xi, pi, sigma, upsilon, phi, psi, and omega):
  - I can provide the name (and case) when given the symbol
  - I can provide the symbol when given the name (and case)
NumPy (reference material)
- I can convert a Python list to a NumPy array
- I can use broadcasting to operate on a NumPy array with a NumPy array of fewer dimensions
- I can do pointwise addition or multiplication of NumPy arrays
- I understand NumPy array shapes:
  - I can change the shape of an array
  - I can explain the difference between an array of shape (5,), an array of shape (5, 1), and an array of shape (1, 5)
- I can stack two NumPy arrays
- I can use indexing and slicing to extract parts of an array
- I can create a NumPy array with random elements
- I can create a NumPy array with a specified datatype
- I can use NumPy for matrix multiplication (see the sketch after this list)
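A minimal NumPy sketch touching most of the items above: list-to-array conversion, the (5,) vs (5, 1) vs (1, 5) shapes, broadcasting, pointwise vs matrix products, stacking, slicing, dtypes, and random arrays. The specific shapes and values are arbitrary illustrations, not part of the course material.

```python
import numpy as np

# Convert a Python list to a NumPy array with a specified dtype.
a = np.array([1, 2, 3, 4, 5], dtype=np.float32)   # shape (5,)

# Reshape: (5,) -> (5, 1) column vs (1, 5) row.
col = a.reshape(5, 1)
row = a.reshape(1, 5)

# Broadcasting: the (1, 5) row is stretched across the 5 rows of the
# (5, 1) column, so pointwise addition yields a (5, 5) result.
grid = col + row

# Pointwise multiplication vs matrix multiplication.
pointwise = a * a          # shape (5,)
matprod = col @ row        # shape (5, 5)

# Stack two arrays along a new leading axis: result shape (2, 5).
stacked = np.stack([a, a])

# Indexing and slicing: first row of grid, and every other element of a.
first_row = grid[0]        # shape (5,)
every_other = a[::2]       # shape (3,)

# Random array with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
r = rng.standard_normal((3, 4))

print(grid.shape, pointwise.shape, matprod.shape, stacked.shape, r.dtype)
```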
Gradient Descent (Day 2, Day 3; fast.ai: 149-163)
- I can explain each of the pieces of the gradient descent loop:
  - Theta
  - x
  - y
  - f
  - y-hat
  - Loss function
  - Optimizer
- I can label a gradient descent loop diagram with each of the pieces
- I can run one iteration of the gradient descent algorithm by hand (given f(x) and the gradient)
- I can explain the difference between full-batch gradient descent, stochastic gradient descent, and minibatch gradient descent, and can explain the pros and cons of each (a worked loop example follows this list)
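A minimal sketch of one way to write the gradient descent loop, with each piece labeled (theta, x, y, f, y-hat, loss function, optimizer step). The linear model, data, learning rate, and batch size are illustrative assumptions; setting batch_size to len(x) gives full-batch descent, and batch_size = 1 gives stochastic gradient descent.

```python
import numpy as np

# Toy data: x (inputs) and y (targets) from a known line plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5 + 0.1 * rng.standard_normal(100)

theta = np.zeros(2)      # theta: the parameters [slope, intercept]
lr = 0.1                 # learning rate used by the (plain SGD) optimizer step
batch_size = 16          # minibatch gradient descent

def f(theta, x):         # f: the model
    return theta[0] * x + theta[1]

for epoch in range(50):
    idx = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        b = idx[start:start + batch_size]
        y_hat = f(theta, x[b])                      # y-hat: predictions
        err = y_hat - y[b]
        loss = np.mean(err ** 2)                    # loss function: MSE
        grad = np.array([np.mean(2 * err * x[b]),   # dL/d(slope)
                         np.mean(2 * err)])         # dL/d(intercept)
        theta -= lr * grad                          # optimizer: SGD update

print(theta)   # should end up near [3.0, 0.5]
```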
General ML (Day 2, Day 8; Aggarwal: 1.4.1; fast.ai: 28-30)
- I can identify why overfitting occurs, how it can be identified, and the ways in which it can be fixed
- I can identify why underfitting occurs, how it can be identified, and the ways in which it can be fixed
- I can explain the use of training, validation, and test datasets (see the split example after this list)
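A minimal sketch of how the three datasets are typically carved out. The 70/15/15 fractions and the synthetic X and y arrays are illustrative assumptions only.

```python
import numpy as np

# Stand-in data: X (features) and y (labels) of the same length.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
y = rng.integers(0, 2, size=1000)

# Shuffle once, then split 70% train / 15% validation / 15% test.
idx = rng.permutation(len(X))
n_train = int(0.70 * len(X))
n_val = int(0.15 * len(X))

train_idx = idx[:n_train]                 # used to fit the model
val_idx = idx[n_train:n_train + n_val]    # used to tune hyperparameters / detect overfitting
test_idx = idx[n_train + n_val:]          # held out until the very end

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(len(X_train), len(X_val), len(X_test))
```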
Optimizers (Day 8; Aggarwal 3.5.1-3.5.3; fast.ai: 473-480)
- I can explain the use of and give code for the following optimizers (see the sketch after this list):
  - With weight decay
  - With momentum
  - With Nesterov momentum
  - Adam
  - AdamW
  - Adagrad
  - RMSProp
  - Lookahead
- I can explain which optimizers use learning rates and how learning rates are chosen
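A minimal PyTorch sketch of constructing these optimizers. The model, learning rates, and decay values are placeholders; Lookahead is not part of torch.optim, so it appears only as a comment.

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 1)   # any model's parameters would do here

# SGD with weight decay, momentum, and Nesterov momentum.
opt = optim.SGD(model.parameters(), lr=0.01,
                weight_decay=1e-4, momentum=0.9, nesterov=True)

# The adaptive optimizers listed above (each takes a learning rate):
# opt = optim.Adam(model.parameters(), lr=1e-3)
# opt = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
# opt = optim.Adagrad(model.parameters(), lr=1e-2)
# opt = optim.RMSprop(model.parameters(), lr=1e-3)
# (Lookahead wraps another optimizer and usually comes from a
#  third-party implementation rather than torch.optim.)

# One optimization step:
x = torch.randn(8, 10)
y = torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```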
Loss functions (fast.ai: 194-203, 226-237)
- I can provide code for the following loss functions and describe when each would be used (see the sketch after this list):
  - Cross Entropy
  - Mean Squared Error (MSE)
  - Binary Cross Entropy
  - Negative Log Likelihood (NLL)
  - L1 Error
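A minimal PyTorch sketch of the listed loss functions on made-up tensors; the shapes and values are illustrative only.

```python
import torch
from torch import nn

logits = torch.randn(4, 3)                 # raw scores for 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 1])       # class indices
preds = torch.randn(4, 1)                  # regression predictions
y = torch.randn(4, 1)                      # regression targets
probs = torch.sigmoid(torch.randn(4))      # probabilities for binary targets
binary = torch.tensor([1.0, 0.0, 1.0, 0.0])

ce = nn.CrossEntropyLoss()(logits, targets)                # multi-class classification (takes raw logits)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)  # same value as ce: CE = LogSoftmax + NLL
mse = nn.MSELoss()(preds, y)                               # regression
l1 = nn.L1Loss()(preds, y)                                 # regression, less sensitive to outliers
bce = nn.BCELoss()(probs, binary)                          # binary classification on probabilities

print(ce.item(), nll.item(), mse.item(), l1.item(), bce.item())
```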
Activation functions (Aggarwal 1.2.1.3)
- I can provide the equation for each of the following activation functions, along with the equation for the derivative, and can identify which should be used in a given situation (see the sketch after this list):
  - ReLU
  - Leaky ReLU
  - Tanh
  - Softmax
  - LogSoftmax
  - Sigmoid
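A minimal NumPy sketch of the listed activations, with the derivative of each noted in a comment (for softmax, the Jacobian). Subtracting the max before exponentiating is a standard numerical-stability trick; the exact formulation here is just one possible implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)            # d/dx = 1 if x > 0 else 0

def leaky_relu(x, a=0.01):
    return np.where(x > 0, x, a * x)   # d/dx = 1 if x > 0 else a

def sigmoid(x):
    return 1 / (1 + np.exp(-x))        # d/dx = sigmoid(x) * (1 - sigmoid(x))

def tanh(x):
    return np.tanh(x)                  # d/dx = 1 - tanh(x)**2

def softmax(x):
    e = np.exp(x - np.max(x))          # subtract max for numerical stability
    return e / e.sum()                 # Jacobian: diag(s) - s s^T

def log_softmax(x):                    # log(softmax(x)), computed stably
    m = np.max(x)
    return (x - m) - np.log(np.exp(x - m).sum())

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), sigmoid(x), softmax(x))
```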
Neural Networks (Day 10, 11; Aggarwal 1.2.1.3-1.2.3, 1.3, 1.4.2, 3.2, 3.4)
- Given a Neural Network and its input, I can calculate the output
- Given a Neural Network and its input, I can calculate the partial derivative of the loss with respect to any given parameter (see the autograd sketch after this list)
- I can show how exploding gradients can occur and methods to address them
- I can show how vanishing gradients can occur and methods to address them
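A minimal PyTorch sketch: a tiny network's output is computed by a forward pass, and autograd supplies the partial derivatives of the loss with respect to every parameter. The architecture and input values are arbitrary; the hand calculation the objectives ask for would follow the same chain-rule steps autograd performs here.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A tiny fully connected network: 3 inputs -> 4 hidden (ReLU) -> 1 output.
net = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1))

x = torch.tensor([[1.0, -2.0, 0.5]])
y = torch.tensor([[0.0]])

y_hat = net(x)                              # forward pass: calculate the output
loss = nn.functional.mse_loss(y_hat, y)
loss.backward()                             # backprop: partial derivatives of the loss

print(y_hat)                                # the network's output
print(net[0].weight.grad)                   # dLoss/dW for the first layer's weights
```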
Transfer Learning (Day 16; Aggarwal 8.4.7; fast.ai: 30-33, 207-212)
- Given a pretrained model, I can explain how to repurpose it for a new task by removing the old head and adding a new head (see the sketch after this list)
- I can describe the process of finetuning
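A minimal torchvision sketch of head replacement, assuming a ResNet-18 backbone and a hypothetical 10-class target task; the weights string matches recent torchvision versions and may need adjusting for older ones.

```python
import torch
from torch import nn
from torchvision import models

# Load a pretrained backbone ("IMAGENET1K_V1" downloads ImageNet weights;
# use weights=None for a quick structural test without the download).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the body so finetuning initially trains only the new head.
for p in model.parameters():
    p.requires_grad = False

# Remove the old 1000-class head and add a new head for a 10-class task.
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Typical finetuning: train the head first, then unfreeze some or all of
# the body and continue training with a lower learning rate.
x = torch.randn(2, 3, 224, 224)
print(model(x).shape)        # torch.Size([2, 10])
```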
Regularization (Day 6, 15; Aggarwal 1.4.1.1, 3.6, 4.4, 4.5.1.2, 4.5.4-4.5.5, 4.6)
- I can identify the use of and implement standard regularization techniques (see the sketch after this list):
  - Smaller batch sizes
  - Batch normalization
  - Dropout
  - Weight decay
  - Data augmentation
  - Early stopping
  - Multi-task learning
  - Ensembles
  - Larger learning rates
  - Label smoothing
  - Mixup
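A minimal PyTorch sketch combining a few of the techniques above: batch normalization and dropout in the model, weight decay in the optimizer, and label smoothing in the loss. The layer sizes and hyperparameter values are placeholders, and label_smoothing requires a reasonably recent PyTorch; the remaining techniques in the list are procedural rather than single-line code.

```python
import torch
from torch import nn, optim

# Dropout and batch normalization inside the model.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),      # batch normalization
    nn.ReLU(),
    nn.Dropout(p=0.5),       # dropout
    nn.Linear(64, 10),
)

# Weight decay handled by the optimizer; label smoothing by the loss function.
opt = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)

x = torch.randn(16, 20)
y = torch.randint(0, 10, (16,))
loss = loss_fn(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```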
CNNs (Day 14-17; Aggarwal 3.5.5, 8.1-8.2.6, 8.4; fast.ai: Chapters 13-14)
- I can explain how residual networks work and explain how they can address the vanishing gradient problem
- I can compute the output of a convolutional kernel
- I can use stride and padding to control the output size of a convolutional layer
- I can compute the number of weights in a convolutional layer
- I can compute the output of a max, average, or adaptive pooling layer (see the sketch after this list)
- I can describe the architectures of ResNet and Inception
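A minimal PyTorch sketch of the output-size, parameter-count, and pooling calculations; the channel counts, kernel size, stride, and padding are arbitrary examples.

```python
import torch
from torch import nn

# One convolutional layer: 3 input channels, 16 output channels,
# 3x3 kernel, stride 2, padding 1.
conv = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)

# Output spatial size: floor((H + 2*padding - kernel) / stride) + 1
# = floor((32 + 2 - 3) / 2) + 1 = 16.
x = torch.randn(1, 3, 32, 32)
print(conv(x).shape)                           # torch.Size([1, 16, 16, 16])

# Number of weights: out_channels * in_channels * kH * kW (+ out_channels biases)
# = 16 * 3 * 3 * 3 + 16 = 448.
print(sum(p.numel() for p in conv.parameters()))

# Max pooling halves the spatial size; adaptive average pooling forces 1x1.
print(nn.MaxPool2d(2)(conv(x)).shape)          # torch.Size([1, 16, 8, 8])
print(nn.AdaptiveAvgPool2d(1)(conv(x)).shape)  # torch.Size([1, 16, 1, 1])
```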
RNNs (Day 21-25; Aggarwal 7.2.1-7.2.4, 7.5-7.6; fast.ai: Chapter 12)
- I can explain how RNNs work
- I can compute output from an LSTM or GRU (see the example after this list)
- I can give the equations for an LSTM or GRU
- TBD
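A minimal PyTorch sketch of computing the output of an LSTM and a GRU; the sequence length, batch size, and feature sizes are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Batch of 2 sequences, 5 time steps, 8 features per step; hidden size 16.
x = torch.randn(2, 5, 8)

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

out_lstm, (h_n, c_n) = lstm(x)   # LSTM keeps a hidden state h and a cell state c
out_gru, h_gru = gru(x)          # GRU keeps only a hidden state

print(out_lstm.shape, h_n.shape, c_n.shape)   # (2, 5, 16), (1, 2, 16), (1, 2, 16)
print(out_gru.shape, h_gru.shape)             # (2, 5, 16), (1, 2, 16)
```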
Transformers (Days 26-27)