Time
2021-03-04 10:00
Venue
Online (Zoom)
Zoom Info
Conference ID: 66233472769
Password: 554709
Abstract
Gradient descent (GD) and its variants are optimization algorithms widely used in machine learning and other areas. This talk will describe two of our recent works on related problems. The first is on how to add momentum to gradient descent on a class of manifolds known as Lie groups. The second is on how deterministic gradient descent can escape local minima when the learning rate is large.
More precisely, the first part (joint work with Tomoki Ohsawa) generalizes a time-dependent variational principle proposed by Wibisono et al. from vector spaces to Lie groups, and obtains continuous dynamics that are guaranteed to converge to a local minimum of any differentiable function on a Lie group. These dynamics correspond to momentum versions of gradient flow on Lie groups. The particular case of SO(n) is then studied in detail, with objective functions corresponding to leading generalized eigenvalue problems: the continuous dynamics are first made explicit in coordinates and then discretized in a structure-preserving fashion, resulting in optimization algorithms that exhibit faithful energy behavior (due to conformal symplecticity) and remain exactly on the Lie group.
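As a rough illustration of the kind of update studied in this first part (and not the speakers' conformal-symplectic discretization), the sketch below runs a heavy-ball style momentum gradient descent on SO(n) for the Brockett cost f(Q) = tr(Q^T A Q N), a standard stand-in for eigenvalue problems. The choice of objective, the step size h, and the momentum factor mu are all illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                                  # symmetric matrix whose eigenvectors we seek
N = np.diag(np.arange(n, 0, -1.0))                 # Brockett weights: f(Q) = tr(Q^T A Q N)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal starting point
if np.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]                             # flip one column so Q lies in SO(n)

xi = np.zeros((n, n))      # momentum, kept in the Lie algebra so(n) (skew-symmetric)
h, mu = 0.02, 0.9          # step size and momentum factor (assumed values, not from the talk)

for _ in range(2000):
    M = Q.T @ A @ Q
    grad = M @ N - N @ M             # left-trivialized gradient of f at Q; skew-symmetric
    xi = mu * xi - h * grad          # damped momentum update in so(n)
    Q = Q @ expm(h * xi)             # exponential-map update: Q stays on SO(n)

print("objective tr(Q^T A Q N):", np.trace(Q.T @ A @ Q @ N))
print("distance from orthogonality:", np.linalg.norm(Q.T @ Q - np.eye(n)))
```

Because the momentum lives in the Lie algebra so(n) and the group element is updated through the matrix exponential, the iterates stay on SO(n) to machine precision, which mirrors the "remain exactly on the Lie group" property mentioned above.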
The second part (joint work with Lingkai Kong) reports that deterministic GD, which does not use any stochastic gradient approximation, can still exhibit stochastic behavior. In particular, if the objective function is multiscale, then in a large learning rate regime the deterministic GD dynamics can become chaotic and converge not to a local minimizer but to a statistical distribution. In this sense, deterministic GD resembles stochastic GD even though no stochasticity is injected. A sufficient condition is also established for approximating this long-time statistical limit by a rescaled Gibbs distribution, which, for example, allows escapes from local minima to be quantified.
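The toy experiment below illustrates the kind of phenomenon described in this second part: plain deterministic GD on a one-dimensional multiscale objective f(x) = F(x) + eps*cos(x/eps), run with a learning rate h that is large relative to the small scale eps. The double-well F, the values of eps and h, and the iteration count are illustrative assumptions, not the talk's exact setting.

```python
import numpy as np

eps, h = 0.01, 0.2          # small scale and a deliberately large learning rate (assumed values)

def F_grad(x):
    # gradient of the macroscopic double well F(x) = (x**2 - 1)**2 / 4
    return (x**2 - 1.0) * x

def f_grad(x):
    # gradient of the multiscale objective f(x) = F(x) + eps*cos(x/eps)
    return F_grad(x) - np.sin(x / eps)

x = 0.9                     # start near the right well of F
samples = []
for k in range(200_000):
    x = x - h * f_grad(x)   # plain deterministic gradient descent: no noise is injected
    if k >= 10_000:         # discard an initial transient before recording statistics
        samples.append(x)

samples = np.array(samples)
print("fraction of iterates in the left well (x < 0):", np.mean(samples < 0))
print("iterate mean and standard deviation:", samples.mean(), samples.std())
```

For these parameter values the iterates behave erratically and can visit both wells of F, so the long-time histogram spreads over both minima instead of collapsing onto a single minimizer; the result mentioned above approximates such a long-time limit by a rescaled Gibbs distribution.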