Gradient Descent
- Gradient descent is an algorithm that can be used to try to minimize any function; here we apply it to the cost function $J(w, b)$.
Gradient Descent outline
- Start with some initial values of $w, b$ (the model's parameters)
- Keep changing $w, b$ to reduce the cost $J(w, b)$
- Until we settle at or near a minimum
Gradient Descent algorithm
$$ w = w - \alpha \frac{\partial}{\partial w} J(w, b) $$
$$ b = b - \alpha \frac{\partial}{\partial b} J(w, b) $$


- $\alpha$ is the learning rate, a small positive number (typically between 0 and 1) that controls the size of each step.
- The derivative term $\frac{\partial}{\partial w}J(w, b)$ points in the direction of steepest increase of $J$, so subtracting it moves the parameters downhill.
- Correct implementation: update $w$ and $b$ simultaneously, i.e. compute both derivatives from the old values before assigning either new value (see the sketch after this list).
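To make the simultaneous update concrete, here is a minimal Python/NumPy sketch, assuming $J(w, b)$ is the mean squared error cost of a linear model $f(x) = wx + b$ (the function names and structure are illustrative, not from the notes):

```python
import numpy as np

def compute_gradients(x, y, w, b):
    """Gradients of the MSE cost J(w, b) for the linear model f(x) = w*x + b."""
    error = (w * x + b) - y      # prediction error on every example
    dj_dw = np.mean(error * x)   # dJ/dw
    dj_db = np.mean(error)       # dJ/db
    return dj_dw, dj_db

def gradient_descent_step(x, y, w, b, alpha):
    """One gradient descent step with a simultaneous update of w and b."""
    # Both derivatives are computed from the OLD w and b
    # before either parameter is changed.
    dj_dw, dj_db = compute_gradients(x, y, w, b)
    return w - alpha * dj_dw, b - alpha * dj_db
```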
The principle of the gradient descent algorithm
- If the slope $\frac{\partial}{\partial w}J(w, b)$ is positive, the update decreases $w$; if it is negative, the update increases $w$. Either way, $w$ moves toward a minimum.
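For instance, with the one-variable function $J(w) = w^2$ (an illustrative choice, not from the notes):

$$ \frac{\partial}{\partial w} J(w) = 2w $$
$$ w = 3 \Rightarrow w = 3 - 6\alpha \quad (\text{positive slope, so } w \text{ decreases}) $$
$$ w = -3 \Rightarrow w = -3 + 6\alpha \quad (\text{negative slope, so } w \text{ increases}) $$
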
How to choose $\alpha$ (the learning rate)
- If the learning rate is too small, gradient descent still works, but it converges very slowly.
- By contrast, if the learning rate is too large, gradient descent may overshoot the minimum and never converge; it can even diverge.
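A tiny sketch of both failure modes on the toy function $J(w) = w^2$ (an assumed example; its gradient is $2w$):

```python
def minimize_quadratic(alpha, steps=20, w=10.0):
    """Run gradient descent on J(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w = w - alpha * 2 * w
    return w

print(minimize_quadratic(alpha=0.01))  # too small: w barely moves toward 0
print(minimize_quadratic(alpha=0.5))   # well chosen: w reaches 0 quickly
print(minimize_quadratic(alpha=1.1))   # too large: |w| grows every step (diverges)
```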
Running gradient descent
“Batch” gradient descent
- Batch: each step of gradient descent uses all of the training examples (see the sketch below).
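A self-contained sketch of batch gradient descent for the linear model, with assumed toy data; note that the gradient computation touches every training example on every iteration:

```python
import numpy as np

# Toy training set (assumed for illustration): y = 2x, so the optimum is w=2, b=0.
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([2.0, 4.0, 6.0, 8.0])

w, b, alpha = 0.0, 0.0, 0.05
for step in range(1000):
    error = (w * x_train + b) - y_train   # errors on ALL m examples
    # "Batch": the gradients average over the entire training set each step.
    dj_dw = np.mean(error * x_train)
    dj_db = np.mean(error)
    w, b = w - alpha * dj_dw, b - alpha * dj_db   # simultaneous update

print(w, b)  # approaches w = 2.0, b = 0.0
```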
Feature scaling