CAREER: An Integrated Inferential Framework for Big Data Research and Education

Project: Research project

Project Details


1 Overview

This proposal aims to develop novel inferential methods for assessing the uncertainty (e.g., constructing confidence intervals or testing hypotheses) of modern statistical procedures unique to Big Data analysis. It will develop innovative inferential tools for a variety of machine learning methods that have not yet been equipped with inferential power. It will also train the next generation of academic leaders in the inferential skills needed to be competitive in the modern sciences.

This proposal is constructed in the context of a new generation of statistical methods developed to handle the increasing complexity of modern data. However, most of these methods give only point estimates of parameters, while practitioners typically require more sophisticated inferential statements to assess uncertainty. For instance, in genomics, the p-value of a significance test of a biomarker is scientifically more informative than simply reporting whether the marker is selected or not. Therefore, a substantial gap exists between the newly developed methods and their scientific applications.

Classical inferential theory has lagged behind the rapid development of these new methods due to several unique challenges of Big Data. The first is high-dimensional data, which motivates the development of estimators for simultaneous model selection and parameter estimation; most classical inferential methods do not take model selection uncertainty into consideration. The second is massive data, which motivates the development of heterogeneous modeling and divide-and-conquer estimators; in contrast, classical inferential theory generally assumes the data are homogeneous and stored in a central database. The third is complex data (e.g., heavy-tailed and missing data), which motivates the development of highly robust estimators; the inferential theory for these estimators is much less developed.

The proposed research puts forward new methods for inferential analysis that handle the above challenges in a general, abstract fashion. The theory and methods developed in this career development plan will serve as a foundation for modern Big Data research and education.

2 Intellectual Merit

This proposal addresses several fundamental challenges in modern inferential analysis and will lead to the creation of a new research area named Big Data Inference. The current literature on Big Data research mainly focuses on developing new estimators for complex data. However, most of these estimators still lack systematic inferential methods for uncertainty assessment. The proposed research bridges this gap by developing a new generation of inferential methods for modern estimators unique to Big Data analysis. In addition, this proposal will push the frontiers of modern statistical science by developing new technical tools, ranging from nonasymptotic concentration inequalities to asymptotic limit theorems, for many complex estimators.

The current literature on Big Data education focuses more on teaching ‘formal’ statistical inference, which consists of estimating population parameters with confidence intervals and testing conjectures about parameters with hypothesis tests. The novelty of the proposed education plan lies in its introduction of the practice of ‘informal’ inferential reasoning to complement the formal one. Such a hybrid approach allows an easier integration of research and education under a unified framework.

3 Broader Impact

This career proposal will push the integration of Stat
Effective start/end date: 9/1/17 to 6/30/23


  • National Science Foundation (DMS‐1841569-001)

