In modern machine learning applications, including neural networks, the parameters of a model are stored as a collection of tensor variables. This tensor structure arises from inherent structure in the model and is exploited for computational efficiency. For example, in common frameworks such as TensorFlow and PyTorch, the parameters of a fully-connected layer are represented as a matrix, and the parameters of a convolutional layer are represented as a 4-dimensional tensor. Current optimization methodologies fail to exploit this structure, instead applying first-order or diagonal scaling methods (SG, Adagrad, Adam) or treating the variables as a flattened vector (L-BFGS). This removes potential opportunities to reduce the complexity of the preconditioner, such as employing separate scalings or preconditioners along each dimension. Recent work proposes to exploit the tensor structure by producing Kronecker-factorized preconditioners. We propose an alternative method, based on quasi-Newton information, that exploits the tensor structure within optimization problems. Specifically, we propose to develop optimization algorithms that retain the tensor structure of the gradient but employ separate scalings and/or preconditioning matrices for each dimension.
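To make the idea concrete, the following is a minimal NumPy sketch of Kronecker-factored preconditioning for a matrix-shaped gradient, in the spirit of the structured approach described above. The factors `A` and `B` here are illustrative random symmetric positive-definite matrices, not the specific quasi-Newton factors this project proposes; the sketch only shows how keeping the gradient as a matrix replaces one large (mn x mn) solve with two small per-dimension solves.

```python
import numpy as np

# Sketch: Kronecker-factored preconditioning of a matrix-shaped gradient G.
# With per-dimension SPD factors A (m x m) and B (n x n), the identity
#   vec(A^{-1} G B^{-1}) = (B^{-1} kron A^{-1}) vec(G)   (column-major vec)
# lets us precondition without ever flattening G or forming the mn x mn matrix.

rng = np.random.default_rng(0)
m, n = 4, 3
G = rng.standard_normal((m, n))  # gradient of a fully-connected layer's weights

def random_spd(k):
    """Random symmetric positive-definite matrix (stand-in curvature factor)."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

A, B = random_spd(m), random_spd(n)  # hypothetical per-dimension factors

# Structured update: two small solves, keeping the tensor (matrix) shape.
P_structured = np.linalg.solve(A, G) @ np.linalg.inv(B)

# Equivalent flattened update, for verification only: O((mn)^3) cost.
full = np.kron(np.linalg.inv(B), np.linalg.inv(A))
P_flat = (full @ G.reshape(-1, order="F")).reshape(m, n, order="F")

assert np.allclose(P_structured, P_flat)
```

The two updates agree, but the structured form only ever factors m x m and n x n matrices, which is the complexity reduction the tensor structure makes possible.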
Effective start/end date: 12/15/20 → 12/14/23
- University of Texas at Austin (UTA20-001225 //FA9550-21-1-0084)
- Air Force Office of Scientific Research (UTA20-001225 //FA9550-21-1-0084)