BIGDATA: Collaborative Research: F: Stochastic Approximation for Subspace and Multiview Representation Learning


Description

Unsupervised learning of useful representations is one of the most basic challenges of machine
learning, and is particularly important in big-data situations in which nearly infinite
amounts of unlabeled data are available. Consequently, unsupervised subspace learning methods,
such as Principal Component Analysis (PCA), Partial Least Squares (PLS) and Canonical Correlation
Analysis (CCA), are ubiquitous tools in many data analysis, machine learning and information
retrieval applications. In this proposal, we address these subspace methods, with an emphasis
on multi-view methods (methods such as CCA that take advantage of expected correlations
between useful representations of different data varieties), as well as multi-view methods
that go beyond subspaces and seek deep models for correlated representations. We argue that in
a big-data setting, all these models should be studied as stochastic optimization problems,
with the goal of optimizing a population objective based on samples, rather than focusing
on an empirical objective over finite data. This view suggests using Stochastic Approximation (SA)
approaches, such as Stochastic Gradient Descent (SGD) and Stochastic Mirror Descent, and enables
a rigorous analysis of their benefits over traditional finite-data methods. In this proposal
we develop SA approaches to PCA, PLS, CCA and related problems and extensions, including robust,
deep, and sparse variants, and analyze these problems in the data-laden regime.
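To make the stochastic-approximation view concrete, the following is a minimal sketch (not the proposal's specific algorithm) of estimating the top principal component from a stream of samples using Oja's rule, the classic SA method for PCA. The learning rate, seed, and synthetic data are illustrative assumptions.

```python
import numpy as np

def oja_top_component(stream, dim, lr=0.1):
    """Estimate the top principal component from a sample stream via
    Oja's rule: a stochastic gradient step followed by projection back
    onto the unit sphere. One pass, one sample at a time."""
    w = np.random.default_rng(0).normal(size=dim)
    w /= np.linalg.norm(w)
    for t, x in enumerate(stream, start=1):
        step = lr / np.sqrt(t)       # decaying step size
        w += step * x * (x @ w)      # stochastic gradient of the Rayleigh quotient
        w /= np.linalg.norm(w)       # stay on the unit sphere
    return w

# Usage: zero-mean Gaussian samples whose covariance has its top
# eigenvector along the first coordinate axis (eigenvalue 5 vs. 1, 0.5).
rng = np.random.default_rng(1)
cov = np.diag([5.0, 1.0, 0.5])
samples = (rng.multivariate_normal(np.zeros(3), cov) for _ in range(5000))
w = oja_top_component(samples, dim=3)
```

The point of the population view is visible here: the update touches each sample exactly once, so its cost is independent of any fixed dataset size, and its convergence is naturally analyzed against the population covariance rather than an empirical one.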
Intellectual Merit:
In recent years, stochastic methods have been shown to be greatly beneficial, and in a sense
optimal, for many supervised learning problems, and are now often the methods of choice, especially
in large-scale learning. We carry this development over to the rich world of unsupervised
learning, and subspace learning in particular. Moving from typical supervised learning problems
to unsupervised subspace learning brings with it interesting challenges,
such as coping with the apparently non-convex nature of PCA, and optimizing over subspaces
or manifolds rather than vectors. Furthermore, we are particularly interested in multi-view
subspace learning, as in CCA and PLS. Multi-view learning plays an important role in many
applications, and brings with it another layer of challenges. We show how many of the techniques
for PCA can also be extended to PLS. We also confront the unique difficulty of phrasing
CCA as a stochastic optimization problem: unlike PCA, PLS and essentially all supervised
learning problems, the CCA objective cannot be expressed as an expectation over samples. We
also show that a population view of CCA allows for a fully non-parametric representation of
non-linear CCA generalizations. We use this as a basis for developing non-linear deep-network
representation learning methods based on correlations between views rather than merely on
reconstruction, as in auto-encoders. In addition, this research aims to develop stochastic
variants of the more classical sparse subspace and representation learning problems, building
a bridge that can carry the rich research results of the sparse learning community into
the Big Data regime.
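The claim that the CCA objective is not an expectation over samples can be made concrete. Assuming zero-mean views $x$ and $y$, the single-component population CCA problem is

\[
\max_{u,\,v}\;\; \frac{\mathbb{E}\!\left[(u^\top x)(v^\top y)\right]}
{\sqrt{\mathbb{E}\!\left[(u^\top x)^2\right]\,\mathbb{E}\!\left[(v^\top y)^2\right]}}.
\]

This is a ratio involving three separate expectations, not a single expectation $\mathbb{E}[\ell(u, v; x, y)]$ of a per-sample loss. Plain SGD, whose updates require an unbiased per-sample gradient of the objective, therefore does not apply directly, which distinguishes CCA from PCA and PLS, whose objectives (e.g. $\mathbb{E}[(u^\top x)(v^\top y)]$ for PLS under norm constraints) are expectations.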
Broader Impacts:
Representation learning methods, including subspace methods such as PCA, PLS and CCA, are
ubiquitous procedures in many scientific, engineering and data analysis applications. Especially
in this age of "big data", where better methods are required in order to handle increasingly
large and complex data sets, and when essentially infinite amount
Status: Active
Effective start/end date: 9/1/17 – 8/31/20

Funding

  • National Science Foundation (IIS-1840866)
