
Description
In this lecture, Professor Strang explains both momentum-based gradient descent and Nesterov’s accelerated gradient descent.

Summary
Study the zig-zag example: Minimize \(F = \frac{1}{2}(x^2 + by^2)\).
Add a momentum term / heavy ball remembers its directions.
New point \(k+1\) comes from TWO old points \(k\) and \(k-1\).
“1st order” becomes “2nd order” or “1st order system” as in ODEs.
Convergence rate improves: \(1 - b\) to \(1 - \sqrt{b}\)!

Related section in textbook: VI.4

Instructor: Prof. Gilbert Strang
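
The zig-zag example from the summary can be tried numerically. The following is a minimal sketch, not the lecture's own code: it assumes fixed step sizes instead of an exact line search, and it writes the momentum step in the two-point form \(x_{k+1} = x_k - s \nabla F(x_k) + \beta (x_k - x_{k-1})\). The values of `s` and `beta` are the usual heavy-ball choices for a quadratic whose eigenvalues are \(b\) and 1; the function names (`gradient_descent`, `heavy_ball`, `F`, `grad`) and the starting point \((b, 1)\) are illustrative assumptions.

```python
import math


def F(p, b):
    """Zig-zag test function F = 1/2 (x^2 + b y^2)."""
    x, y = p
    return 0.5 * (x * x + b * y * y)


def grad(p, b):
    """Gradient of F: (x, b y)."""
    x, y = p
    return (x, b * y)


def gradient_descent(b, start, steps):
    """Plain gradient descent with the fixed step s = 2 / (1 + b).

    Assumption: a fixed step replaces exact line search. With this step
    each coordinate shrinks by the factor (1 - b) / (1 + b) per iteration.
    """
    s = 2.0 / (1.0 + b)
    x, y = start
    for _ in range(steps):
        gx, gy = grad((x, y), b)
        x, y = x - s * gx, y - s * gy
    return (x, y)


def heavy_ball(b, start, steps):
    """Gradient descent with momentum (heavy ball).

    The new point k+1 is built from the TWO old points k and k-1.
    Assumption: s and beta are the standard choices for eigenvalues b and 1.
    """
    root = math.sqrt(b)
    s = (2.0 / (1.0 + root)) ** 2
    beta = ((1.0 - root) / (1.0 + root)) ** 2
    prev = start  # treat point "-1" as equal to point 0 (zero initial momentum)
    cur = start
    for _ in range(steps):
        gx, gy = grad(cur, b)
        nxt = (
            cur[0] - s * gx + beta * (cur[0] - prev[0]),
            cur[1] - s * gy + beta * (cur[1] - prev[1]),
        )
        prev, cur = cur, nxt
    return cur


if __name__ == "__main__":
    b = 0.01
    start = (b, 1.0)
    steps = 100
    print("F after plain descent:", F(gradient_descent(b, start, steps), b))
    print("F after heavy ball:   ", F(heavy_ball(b, start, steps), b))
```

For small \(b\), the plain iteration contracts by about \((1-b)/(1+b)\) per step, while the momentum iteration contracts asymptotically by about \((1-\sqrt{b})/(1+\sqrt{b})\). This is the \(1 - b\) to \(1 - \sqrt{b}\) improvement noted in the summary, and the gap widens as \(b\) gets smaller.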