Combinatorial Statistics and Brain Science

Project: Research project

Project Details


My research focuses on developing a new field named Combinatorial Statistics, along with its application in Brain Science. Combinatorial Statistics studies both the sampling complexity (amount of data) and the computational complexity (running time) of inference for high dimensional distributions parameterized by discrete structures such as graphs, partitions, permutations or multicomplexes. Such distributions play an important role in a number of scientific domains, including computational neuroscience, genomics, social networks, coding theory, and molecular and evolutionary biology. Success in this research will lead to a new mathematical theory of finding complex structures from data, one that integrates ideas from computation, combinatorics, statistics and high dimensional probability in a unified paradigm.

To achieve this goal, I devoted my early career to establishing a new combinatorial inference theory for graphical models. Graphical models parameterize high dimensional distributions by graphs, which encode complex conditional independence relationships among many random variables. Combinatorial inference aims to test, or assess the uncertainty of, global structural properties of the underlying graph based on data (e.g., Is the graph disconnected? What is the maximum degree of the graph? Is the graph triangle-free?). It holds a lot of promise for modern scientific data analysis since such global structural properties carry important scientific meanings (e.g., in studying brain networks, the maximum degree of a node reflects the activity level of the corresponding brain region). However, until recently, no systematic combinatorial inference theory existed in the literature. To bridge this gap, my research addresses two basic questions: (i) What are the fundamental limits of a combinatorial inference algorithm under a computational budget? (ii) Does an understanding of these limits direct us towards the construction of better combinatorial inference methods?

My first contribution is a systematic framework that characterizes the information-theoretic lower bound of a combinatorial inference problem under a computational budget. Unlike classical complexity theory, which mainly uses the Turing machine to quantify computation, my approach quantifies computation by the number of times an algorithm accesses the data through a computational oracle (e.g., a gradient calculation or a stochastic sampling step), yielding an oracle complexity. Compared to the Turing machine approach, this new framework is more systematic (not case-by-case), general (it covers a wide class of problems) and rigorous (it does not rely on unproven hardness conjectures). It also reveals fundamental computational barriers for many combinatorial inference problems (e.g., detecting a clique in the graph): computationally tractable algorithms cannot achieve the minimal statistical error rates. My effort in this direction has led to a new research area named ‘oracle computational lower bounds’.

To match the obtained lower bounds, my second contribution is to prove that all the combinatorial inference problems can be reduced to a multiple testing problem in which the number of hypotheses is usually larger than the sample size. In this setting, the most widely used statistical tool, the Central Limit Theorem, is no longer suitable. To make progress, we need a fundamentally new approach that leverages the combinatorial structure of the problem. For this, I have developed a unified methodol
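
For readers unfamiliar with the global structural properties named in the description (connectivity, maximum degree, triangle-freeness), the following minimal Python sketch computes them on a graph that is already known. This is only an illustration of what the properties mean: the project's inference problem is the harder task of testing such properties from data, which this sketch does not attempt. The function names (`from_edges`, `is_connected`, `max_degree`, `is_triangle_free`) and the adjacency-set representation are assumptions made for this example and are not taken from the project.

```python
from itertools import combinations


def from_edges(nodes, edges):
    """Build an undirected adjacency map {node: set of neighbours} (hypothetical helper)."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_connected(adj):
    """Return True if every node is reachable from an arbitrary start node."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:  # iterative depth-first search
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)


def max_degree(adj):
    """Return the largest number of neighbours of any node (0 for an empty graph)."""
    return max((len(nbrs) for nbrs in adj.values()), default=0)


def is_triangle_free(adj):
    """Return True if no two neighbours of any node are themselves adjacent."""
    for nbrs in adj.values():
        for v, w in combinations(nbrs, 2):
            if w in adj[v]:
                return False
    return True


# Example: a path 0-1-2 plus an isolated node 3.
path_plus_isolated = from_edges(range(4), [(0, 1), (1, 2)])
# Example: a triangle on nodes 0, 1, 2.
triangle = from_edges(range(3), [(0, 1), (1, 2), (0, 2)])
```

In the first example the graph is disconnected and triangle-free with maximum degree 2; in the second it is connected with maximum degree 2 but contains a triangle.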
Effective start/end date: 9/15/17 to 9/14/22


  • Alfred P. Sloan Foundation (FG-2017-9862)

