RI: Medium: Collaborative Research: Next-Generation Statistical Optimization Methods for Big Data Computing

Project: Research project

Description

Overview:
The objective of this proposal is to develop a new generation of optimization methods to address
data mining and discovery science challenges in large-scale scientific data analysis.
To attack several crucial bottlenecks of traditional optimization for modern big data applications,
this project will develop (i) a new research area named statistical optimization, which incorporates
sophisticated statistical thinking into modern optimization, and will effectively bridge
researchers in machine learning, statistics, optimization, and stochastic analysis; (ii) a new
theoretical framework and computational methods for nonconvex and infinite-dimensional optimization,
which will motivate effective optimization methods with theoretical guarantees that are applicable
to a wide variety of prominent statistical models; (iii) new scalable optimization methods,
which aim at fully harnessing the horsepower of modern large-scale distributed computing infrastructure.
Key words: statistics, optimization, big data, distributed computing.
Intellectual Merit:
This proposal builds upon earlier work by the two PIs, which employed statistical thinking
to address modern optimization problems in machine learning. In recent years, optimization
has become increasingly important in machine learning due to the availability of large-scale
datasets. The proposed research attempts to extend traditional theory to open up new possibilities
for nontraditional optimization problems, such as nonconvex and infinite-dimensional optimization
problems. Moreover, the proposed research not only tries to develop a deeper theoretical understanding
of several challenging issues in optimization, such as nonconvexity, but also promises new
algorithmic developments that can lead to better practical methods in the big data era. Therefore,
the proposed research can potentially lead to breakthroughs not only in theory, but also in
practical applications.
Broader Impacts:
In the Big Data era, we see an urgent need for powerful optimization methods to handle the
increasing complexity of modern datasets. However, we still lack adequate methodology, theory,
and computational techniques. By simultaneously addressing all these aspects, this project
is expected to deliver useful statistical optimization methods that benefit all relevant scientific
areas. The deliverables of this project include easy-to-use software. Such software directly
helps scientists to explore and analyze complex datasets. In particular, one of the PIs (Liu)
is collaborating with biologists and neuroscientists from Johns Hopkins University School
of Medicine and Harvard University School of Public Health. These close collaborations will
ensure a more direct impact of this project on the targeted scientific communities. Moreover,
the PIs will offer classes to teach modern techniques in handling big data optimization problems.
We also plan to write tutorial papers and disseminate the results of this research through
the internet, academic conferences, workshops, and journals. We will hold tutorials and workshops
at conferences to educate students and the machine learning community about this new research
topic.
Status: Active
Effective start/end date: 9/1/17 to 12/31/19

Funding

  • National Science Foundation (IIS-1840857)

Fingerprint

Learning systems
Big data
Distributed computer systems
Statistics
Public health
Computational methods
Medicine
Availability
Internet
Students