Abstract
We introduce a variant of (sparse) PCA in which the set of feasible support sets is determined by a graph. In particular, we consider the following setting: given a directed acyclic graph G on p vertices corresponding to variables, the non-zero entries of the extracted principal component must coincide with vertices lying along a path in G. From a statistical perspective, information on the underlying network may potentially reduce the number of observations required to recover the population principal component. We consider the canonical estimator which optimally exploits the prior knowledge by solving a non-convex quadratic maximization on the empirical covariance. We introduce a simple network and analyze the estimator under the spiked covariance model. We show that side information potentially improves the statistical complexity. We propose two algorithms to approximate the solution of the constrained quadratic maximization, and recover a component with the desired properties. We empirically evaluate our schemes on synthetic and real datasets.
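To make the path-constrained quadratic maximization concrete, here is a minimal brute-force sketch. It is not one of the two algorithms the abstract refers to: it enumerates every path and is only practical for toy graphs. The function names (`all_paths`, `path_constrained_pc`), the adjacency-dict representation, and the restriction to paths between a designated source `s` and sink `t` are assumptions made for illustration. The returned vector is supported on the vertices of the chosen path; individual entries may still be zero.

```python
import numpy as np

def all_paths(adj, s, t):
    """Yield every directed path from s to t in a DAG given as an adjacency dict."""
    stack = [(s, [s])]
    while stack:
        node, path = stack.pop()
        if node == t:
            yield path
            continue
        for nxt in adj.get(node, []):
            stack.append((nxt, path + [nxt]))

def path_constrained_pc(sigma, adj, s, t):
    """Exhaustively maximize x^T sigma x over unit-norm x supported on an s-t path."""
    p = sigma.shape[0]
    best_val, best_x, best_path = -np.inf, None, None
    for path in all_paths(adj, s, t):
        idx = np.array(path)
        sub = sigma[np.ix_(idx, idx)]        # principal submatrix on the path
        vals, vecs = np.linalg.eigh(sub)     # eigenvalues in ascending order
        if vals[-1] > best_val:
            best_val, best_path = vals[-1], path
            best_x = np.zeros(p)
            best_x[idx] = vecs[:, -1]        # embed the leading eigenvector
    return best_val, best_x, best_path

# Toy example (hypothetical): a layered DAG on 6 vertices and a
# spiked covariance I + beta * x x^T whose spike lies on path 0 -> 2 -> 3 -> 5.
adj = {0: [1, 2], 1: [3, 4], 2: [3, 4], 3: [5], 4: [5]}
x_true = np.zeros(6)
x_true[[0, 2, 3, 5]] = 0.5
sigma = np.eye(6) + 2.0 * np.outer(x_true, x_true)

val, x_hat, path = path_constrained_pc(sigma, adj, s=0, t=5)
print(path, np.flatnonzero(np.abs(x_hat) > 1e-8))
```

In this toy example the population covariance is used directly; with an empirical covariance computed from samples, the same search applies but the recovered path may differ from the true one when the sample size is small.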
Original language | English (US) |
---|---|
Title of host publication | 32nd International Conference on Machine Learning, ICML 2015 |
Editors | Francis Bach, David Blei |
Publisher | International Machine Learning Society (IMLS) |
Pages | 1728-1736 |
Number of pages | 9 |
ISBN (Electronic) | 9781510810587 |
State | Published - 2015 |
Event | 32nd International Conference on Machine Learning, ICML 2015 - Lille, France. Duration: Jul 6 2015 → Jul 11 2015 |
Publication series
Name | 32nd International Conference on Machine Learning, ICML 2015 |
---|---|
Volume | 3 |
Conference
Conference | 32nd International Conference on Machine Learning, ICML 2015 |
---|---|
Country/Territory | France |
City | Lille |
Period | 7/6/15 → 7/11/15 |
Funding
The authors would like to acknowledge support from grants: NSF CCF 1422549, 1344364, 1344179 and an ARO YIP award.
ASJC Scopus subject areas
- Human-Computer Interaction
- Computer Science Applications