Abstract


"On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning"
R. Byrd, G. Chin, W. Neveitt, J. Nocedal.
SIAM Journal on Optimization, Vol. 21, No. 3, pp. 977-995 (2011)

This paper describes how to incorporate sampled curvature information in a Newton-CG method and in a limited memory quasi-Newton method for statistical learning. The motivation for this work stems from supervised machine learning applications involving a very large number of training points. We follow a batch approach, also known in the stochastic optimization literature as a sample average approximation (SAA) approach. Curvature information is incorporated in two sub-sampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration. A crucial feature of our technique is that Hessian-vector multiplications are carried out with a significantly smaller sample size than is used for the function and gradient. The efficiency of the proposed methods is illustrated using a machine learning application involving speech recognition.
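The sub-sampled Hessian idea can be sketched as follows: the gradient is computed on the full (or large) training sample, while the matrix-free CG solve inside each Newton step uses Hessian-vector products over a much smaller subsample. This is an illustrative sketch for binary logistic regression, not the paper's implementation; all names, sample sizes, and iteration counts are assumptions.

```python
# Illustrative sketch (assumed setup): sub-sampled Hessian Newton-CG for
# binary logistic regression. Gradient uses the full sample; Hessian-vector
# products inside CG use a 10x smaller subsample.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(w, idx):
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (sigmoid(Xb @ w) - yb) / len(idx)

def hess_vec(w, v, idx):
    # Matrix-free Hessian-vector product, computed on the subsample only:
    # H v = (1/m) * X^T diag(p(1-p)) X v, without ever forming H.
    Xb = X[idx]
    p = sigmoid(Xb @ w)
    return Xb.T @ ((p * (1.0 - p)) * (Xb @ v)) / len(idx)

def cg(hv, b, iters=20, tol=1e-8):
    # Plain conjugate gradient on H s = b, using only Hessian-vector products.
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hv(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

w = np.zeros(d)
full = np.arange(n)
for _ in range(10):
    g = gradient(w, full)                              # full-sample gradient
    sub = rng.choice(n, size=n // 10, replace=False)   # small Hessian sample
    step = cg(lambda v: hess_vec(w, v, sub), -g)       # inexact Newton step
    w += step
```

Because the logistic-loss Hessian is positive semidefinite on any subsample, the CG solve still produces a descent direction; the savings come from the Hessian-vector products, which dominate the per-iteration cost, being evaluated on far fewer points.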