RI: Medium: Collaborative Research: Next-Generation Statistical Optimization Methods for Big Data Computing

Project: Research project

Project Details


Overview: The objective of this proposal is to develop a new generation of optimization methods to address data mining and discovery science challenges in large-scale scientific data analysis. To attack several crucial bottlenecks of traditional optimization for modern big data applications, this project will develop (i) a new research area named statistical optimization, which incorporates sophisticated statistical thinking into modern optimization and will effectively bridge researchers in machine learning, statistics, optimization, and stochastic analysis; (ii) a new theoretical framework and computational methods for nonconvex and infinite-dimensional optimization, which will motivate effective optimization methods with theoretical guarantees that are applicable to a wide variety of prominent statistical models; and (iii) new scalable optimization methods, which aim to fully harness the horsepower of modern large-scale distributed computing infrastructure.
Key words: statistics, optimization, big data, distributed computing.
Intellectual Merit: This proposal builds upon earlier work of the two PIs, which employed statistical thinking to address modern optimization problems in machine learning. In recent years, optimization has become increasingly important in machine learning due to the availability of large-scale datasets. The proposed research attempts to extend traditional theory to open up new possibilities for nontraditional optimization problems, such as nonconvex and infinite-dimensional optimization problems. Moreover, the proposed research not only seeks a deeper theoretical understanding of several challenging issues in optimization, such as nonconvexity, but also promises new algorithmic developments that can lead to better practical methods in the big data era. The proposed research can therefore potentially lead to breakthroughs not only in theory but also in practical applications.
Broader Impacts: In the Big Data era, we see an urgent need for powerful optimization methods to handle the increasing complexity of modern datasets. However, we still lack adequate methodology, theory, and computational techniques. By simultaneously addressing all these aspects, this project is expected to deliver useful statistical optimization methods that benefit all relevant scientific areas. The deliverables of this project include easy-to-use software that directly helps scientists explore and analyze complex datasets. In particular, one of the PIs (Liu) is collaborating with biologists and neuroscientists from Johns Hopkins University School of Medicine and Harvard University School of Public Health. These close collaborations will ensure a more direct impact of this project on the targeted scientific communities. Moreover, the PIs will offer classes to teach modern techniques for handling big data optimization problems. We also plan to write tutorial papers and disseminate the results of this research through the internet, academic conferences, workshops, and journals. We will hold tutorials and workshops at conferences to educate students and the machine learning community about this new research topic.
Effective start/end date: 9/1/17 to 12/31/20


  • National Science Foundation (IIS-1840857)

