Supplemental Resources
- Gradient descent, how neural networks learn | Chapter 2, Deep learning
- A 3Blue1Brown video covering the basics of gradient descent, from his Deep Learning series. I strongly recommend any of his excellent videos.
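For orientation, the basic update rule the video covers can be sketched in a few lines. The function name and toy objective below are my own illustrations, not from the video:

```python
def gradient_descent(grad, w, lr=0.1, steps=100):
    """Vanilla gradient descent: repeatedly step against the gradient."""
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Minimize f(w) = (w - 2)^2, whose gradient is 2 * (w - 2).
w_star = gradient_descent(lambda w: 2 * (w - 2), w=0.0, lr=0.1, steps=100)
```

Each step shrinks the distance to the minimum by a constant factor here, so `w_star` lands very close to 2.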
- Stochastic Gradient Descent, Clearly Explained!!!
- A fun StatQuest video about stochastic gradient descent.
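As a quick companion to the video, here is a minimal SGD sketch on a toy least-squares problem (all names and the toy dataset are illustrative, not from the video). The key difference from full-batch gradient descent is that each update uses the gradient at a single randomly chosen sample:

```python
import random

def sgd(grad, w, data, lr=0.01, steps=100):
    """Stochastic gradient descent: each update uses the gradient
    at one randomly sampled data point instead of the full dataset."""
    for _ in range(steps):
        x, y = random.choice(data)
        w = w - lr * grad(w, x, y)
    return w

# Fit y = w * x by least squares; gradient of 0.5*(w*x - y)^2 w.r.t. w:
grad = lambda w, x, y: (w * x - y) * x
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0]]

random.seed(0)  # for a reproducible run
w_fit = sgd(grad, 0.0, data, lr=0.05, steps=500)
```

Since the data are generated with slope 3, `w_fit` converges to roughly 3.0.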
- Adam Optimizer Explained in Detail | Deep Learning - Coding Lane
- A good, simplified breakdown of Adam optimization.
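The update rule the video breaks down can be sketched as follows: a scalar version with the standard default hyperparameters (the function name and toy problem are my own):

```python
import math

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient
    and squared gradient, bias correction, then a scaled step."""
    m = b1 * m + (1 - b1) * g          # first-moment estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment estimate
    m_hat = m / (1 - b1 ** t)          # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 (gradient 2w), starting from w = 5
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
```

Because the step is normalized by the second-moment estimate, the early updates have magnitude close to `lr` regardless of the gradient's scale.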
- Goh, "Why Momentum Really Works", Distill, 2017. http://doi.org/10.23915/distill.00006
- A really nice deep-dive post about momentum, an acceleration technique for gradient descent.
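The heavy-ball update discussed in the Distill article can be sketched as follows (a scalar toy version; the names are illustrative):

```python
def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    """Heavy-ball momentum: v accumulates an exponentially weighted
    sum of past gradients, and the step follows v rather than the
    raw gradient."""
    v = beta * v + grad(w)
    w = w - lr * v
    return w, v

# Minimize f(w) = 0.5 * w^2 (gradient w)
w, v = 10.0, 0.0
for _ in range(300):
    w, v = momentum_step(w, v, lambda w: w, lr=0.1, beta=0.9)
```

On this quadratic the iterates spiral in toward the minimum, converging faster than plain gradient descent at the same learning rate.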
- Lipschitz Functions: Intro and Simple Explanation for Usefulness in Machine Learning
- Learn about Lipschitz continuous functions and their uses in machine learning
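A function f is L-Lipschitz when |f(x) - f(y)| <= L * |x - y| for all x, y. A quick empirical check on sample points (an illustrative sketch, not a proof; the function name is my own):

```python
def is_lipschitz(f, xs, L):
    """Empirically check |f(x) - f(y)| <= L * |x - y| over sample points."""
    return all(abs(f(x) - f(y)) <= L * abs(x - y)
               for x in xs for y in xs)

xs = [i / 10 for i in range(-50, 51)]  # grid on [-5, 5]
is_lipschitz(abs, xs, L=1.0)               # |x| is 1-Lipschitz
is_lipschitz(lambda x: x * x, xs, L=1.0)   # x^2 is not on [-5, 5]
```

The second check fails because the slope of x^2 grows without bound, which is exactly why Lipschitz bounds matter for step-size choices in optimization.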
- L20.4 On the Mean Squared Error of an Estimator
- An MIT OpenCourseWare video about the mean squared error (MSE) of an estimator.
- Notes on "Big O" notation
- "Big O" is the standard notation for describing convergence rates of machine learning algorithms.