Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization

Xingguo Li, Junwei Lu, Raman Arora, Jarvis Haupt, Han Liu, Zhaoran Wang, Tuo Zhao

Research output: Contribution to journalArticle

Abstract

We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks). In particular, we characterize the locations of stationary points and the null space of Hessian matrices of the objective function via the lens of invariant groups. As a major motivating example, we apply the proposed general theory to characterize the global landscape of the nonconvex optimization in low-rank matrix factorization problem. We illustrate how the rotational symmetry group gives rise to infinitely many nonisolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: (R1) the region containing the neighborhoods of all strict saddle points where the objective has negative curvature; (R2) the region containing neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and (R3) the complement of the above regions, where the gradient has sufficiently large magnitude. We further extend our result to the matrix sensing problem. Such global landscape implies that strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions.

Original languageEnglish (US)
Article number8675509
Pages (from-to)3489-3514
Number of pages26
JournalIEEE Transactions on Information Theory
Volume65
Issue number6
DOIs
StatePublished - Jun 1 2019

Fingerprint

Global optimization
Factorization
neural network
Learning systems
guarantee
Lenses
Group
Neural networks
learning

Keywords

  • Strict saddle problem
  • global landscape
  • invariant group
  • matrix sensing
  • nonconvex matrix factorization

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences

Cite this

Li, Xingguo ; Lu, Junwei ; Arora, Raman ; Haupt, Jarvis ; Liu, Han ; Wang, Zhaoran ; Zhao, Tuo. / Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization. In: IEEE Transactions on Information Theory. 2019 ; Vol. 65, No. 6. pp. 3489-3514.
@article{8aad12f964464f52b6234e0a115a69ce,
title = "Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization",
abstract = "We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks). In particular, we characterize the locations of stationary points and the null space of Hessian matrices of the objective function via the lens of invariant groups. As a major motivating example, we apply the proposed general theory to characterize the global landscape of the nonconvex optimization in low-rank matrix factorization problem. We illustrate how the rotational symmetry group gives rise to infinitely many nonisolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: (R1) the region containing the neighborhoods of all strict saddle points where the objective has negative curvature; (R2) the region containing neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and (R3) the complement of the above regions, where the gradient has sufficiently large magnitude. We further extend our result to the matrix sensing problem. Such global landscape implies that strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions.",
keywords = "Strict saddle problem, global landscape, invariant group, matrix sensing, nonconvex matrix factorization",
author = "Xingguo Li and Junwei Lu and Raman Arora and Jarvis Haupt and Han Liu and Zhaoran Wang and Tuo Zhao",
year = "2019",
month = "6",
day = "1",
doi = "10.1109/TIT.2019.2898663",
language = "English (US)",
volume = "65",
pages = "3489--3514",
journal = "IEEE Transactions on Information Theory",
issn = "0018-9448",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6",

}

Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization. / Li, Xingguo; Lu, Junwei; Arora, Raman; Haupt, Jarvis; Liu, Han; Wang, Zhaoran; Zhao, Tuo.

In: IEEE Transactions on Information Theory, Vol. 65, No. 6, 8675509, 01.06.2019, p. 3489-3514.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization

AU - Li, Xingguo

AU - Lu, Junwei

AU - Arora, Raman

AU - Haupt, Jarvis

AU - Liu, Han

AU - Wang, Zhaoran

AU - Zhao, Tuo

PY - 2019/6/1

Y1 - 2019/6/1

N2 - We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks). In particular, we characterize the locations of stationary points and the null space of Hessian matrices of the objective function via the lens of invariant groups. As a major motivating example, we apply the proposed general theory to characterize the global landscape of the nonconvex optimization in low-rank matrix factorization problem. We illustrate how the rotational symmetry group gives rise to infinitely many nonisolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: (R1) the region containing the neighborhoods of all strict saddle points where the objective has negative curvature; (R2) the region containing neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and (R3) the complement of the above regions, where the gradient has sufficiently large magnitude. We further extend our result to the matrix sensing problem. Such global landscape implies that strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions.

AB - We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks). In particular, we characterize the locations of stationary points and the null space of Hessian matrices of the objective function via the lens of invariant groups. As a major motivating example, we apply the proposed general theory to characterize the global landscape of the nonconvex optimization in low-rank matrix factorization problem. We illustrate how the rotational symmetry group gives rise to infinitely many nonisolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: (R1) the region containing the neighborhoods of all strict saddle points where the objective has negative curvature; (R2) the region containing neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and (R3) the complement of the above regions, where the gradient has sufficiently large magnitude. We further extend our result to the matrix sensing problem. Such global landscape implies that strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions.

KW - Strict saddle problem

KW - global landscape

KW - invariant group

KW - matrix sensing

KW - nonconvex matrix factorization

UR - http://www.scopus.com/inward/record.url?scp=85065984137&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065984137&partnerID=8YFLogxK

U2 - 10.1109/TIT.2019.2898663

DO - 10.1109/TIT.2019.2898663

M3 - Article

VL - 65

SP - 3489

EP - 3514

JO - IEEE Transactions on Information Theory

JF - IEEE Transactions on Information Theory

SN - 0018-9448

IS - 6

M1 - 8675509

ER -