"On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning"
R. Byrd, G. Chin, W. Neveitt, J. Nocedal.
SIAM Journal on Optimization, Vol. 21, No. 3, pp. 977-995 (2011)

This paper describes how to incorporate sampled curvature information in a Newton-CG method and in a limited memory quasi-Newton method for statistical learning. The motivation for this work stems from supervised machine learning applications involving a very large number of training points. We follow a batch approach, also known in the stochastic optimization literature as a sample average approximation (SAA) approach. Curvature information is incorporated in two sub-sampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration. A crucial feature of our technique is that Hessian-vector multiplications are carried out with a significantly smaller sample size than is used for the function and gradient. The efficiency of the proposed methods is illustrated using a machine learning application involving speech recognition.
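To make the core idea concrete, here is a minimal sketch of a sub-sampled Hessian Newton-CG step for binary logistic regression: the gradient is computed over the full batch, while the matrix-free Hessian-vector products inside the CG solve use only a much smaller subsample. This is not the paper's implementation; the loss choice, function names, and CG parameters are assumptions made for illustration.

```python
# Sketch of a sub-sampled Hessian Newton-CG step (illustrative only,
# not the authors' code). Loss: average negative log-likelihood of
# binary logistic regression with labels y in {0, 1}.
import numpy as np

def logistic_grad(w, X, y):
    # Gradient over the full batch: X^T (sigmoid(Xw) - y) / n
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

def hessian_vec(w, X, v):
    # Matrix-free Hessian-vector product on the given sample:
    # H v = X^T diag(p(1-p)) X v / n, without ever forming H
    p = 1.0 / (1.0 + np.exp(-X @ w))
    d = p * (1.0 - p)
    return X.T @ (d * (X @ v)) / len(p)

def newton_cg_step(w, X, y, sub_idx, cg_iters=10, tol=1e-4):
    # Full-batch gradient; Hessian-vector products restricted to the
    # (much smaller) subsample X[sub_idx].
    g = logistic_grad(w, X, y)
    Xs = X[sub_idx]
    # Conjugate gradient on H_s d = -g, starting from d = 0
    d = np.zeros_like(w)
    r = -g.copy()
    p_dir = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        Hp = hessian_vec(w, Xs, p_dir)
        alpha = rs / (p_dir @ Hp)
        d += alpha * p_dir
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p_dir = r + (rs_new / rs) * p_dir
        rs = rs_new
    return d

if __name__ == "__main__":
    # Toy usage: Hessian subsample of 500 points vs. 10,000 for the gradient
    rng = np.random.default_rng(0)
    X = rng.standard_normal((10_000, 50))
    y = (X @ rng.standard_normal(50) > 0).astype(float)
    w = np.zeros(50)
    sub = rng.choice(len(y), size=500, replace=False)
    w += newton_cg_step(w, X, y, sub)
```

Since each CG iteration costs one Hessian-vector product, shrinking the Hessian sample reduces the dominant per-iteration cost while the full-batch gradient keeps the step direction anchored to the true objective.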
Download (pdf)