Lecture 14.2: deflected gradient methods II - Heavy Ball
Deflection-type methods II: the heavy ball gradient, a "poor man's" deflection (but with no line search). A hint at the (rather involved) convergence theory of the heavy ball gradient and what the results say: faster than the gradient method with the right alpha and beta (and it typically is), nonmonotone at the beginning but close to monotone toward the end (and it typically is). MATLAB implementation of the heavy ball gradient. The lucky quadratic case (even if badly conditioned) where the optimal alpha and beta are known, and they really work. General nonlinear functions and the impact of the algorithmic parameters: alpha is the usual mess, but beta is also important. Take-away: better than the gradient method with the right alpha and beta, but not really fast. A quick hint at accelerated gradients and why they may matter (or not). I love the heavy ball / ACCG because it is an arrow shot between Newton's method and the subgradient method, an arrow shot across the abyss (of nondifferentiability).
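The heavy ball update and the "lucky" quadratic case can be sketched as follows. This is a minimal Python illustration (the lecture uses MATLAB), with all names (`heavy_ball`, `mu`, `L`) chosen here for illustration: the iteration is x_{k+1} = x_k - alpha * grad(x_k) + beta * (x_k - x_{k-1}), and for a strongly convex quadratic with eigenvalues in [mu, L] the classical optimal parameters are alpha = 4 / (sqrt(L) + sqrt(mu))^2 and beta = ((sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)))^2.

```python
import math

def heavy_ball(grad, x0, alpha, beta, iters=500):
    """Heavy ball iteration: x_{k+1} = x_k - alpha*grad(x_k) + beta*(x_k - x_{k-1})."""
    x_prev = list(x0)
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x_next = [x[i] - alpha * g[i] + beta * (x[i] - x_prev[i])
                  for i in range(len(x))]
        x_prev, x = x, x_next
    return x

# Badly conditioned diagonal quadratic f(x) = 0.5 * (mu*x1^2 + L*x2^2),
# so the gradient is [mu*x1, L*x2] and the condition number is L/mu = 100.
mu, L = 1.0, 100.0
a = [mu, L]
grad = lambda x: [a[i] * x[i] for i in range(len(x))]

# Optimal alpha and beta for this quadratic (eigenvalues in [mu, L]).
alpha = 4.0 / (math.sqrt(L) + math.sqrt(mu)) ** 2
beta = ((math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))) ** 2

x_final = heavy_ball(grad, [10.0, 10.0], alpha, beta)
print(max(abs(v) for v in x_final))  # near 0: the unique minimizer
```

With these parameters the linear rate is roughly (sqrt(L/mu) - 1) / (sqrt(L/mu) + 1) per step, much better than the gradient method's (L/mu - 1) / (L/mu + 1) on the same badly conditioned problem; the early iterates are typically nonmonotone, as the lecture notes.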