Combinatorial Statistics and Brain Science

Project: Research project

Description

My research focuses on developing a new field, Combinatorial Statistics, along with its application to Brain
Science. Combinatorial Statistics studies both the sample complexity (amount of data) and the computational complexity
(running time) of inference on high-dimensional distributions parameterized by discrete structures such as graphs,
partitions, permutations, or multicomplexes. Such distributions play an important role in a number of scientific
domains, including computational neuroscience, genomics, social networks, coding theory, and molecular and evolutionary
biology. Success in this research will lead to a new mathematical theory of finding complex structures in data, one that
integrates ideas from computation, combinatorics, statistics, and high-dimensional probability in a unified paradigm.
To achieve this goal, I have devoted my early career to establishing a new combinatorial inference theory for graphical
models. Graphical models parameterize high-dimensional distributions by graphs, which encode complex conditional
independence relationships among many random variables. Combinatorial inference aims to test, or assess the uncertainty of,
global structural properties of the underlying graph based on data (e.g., Is the graph disconnected? What
is its maximum degree? Is it triangle-free?). It holds a lot of promise for modern scientific
data analysis, since such global structural properties carry important scientific meaning (e.g., in studying brain networks,
the degree of a node reflects the activity level of the corresponding brain region). However, until recently, no
systematic combinatorial inference theory existed in the literature. To bridge this gap, my research addresses two basic
questions: (i) What are the fundamental limits of a combinatorial inference algorithm under a computational budget? (ii)
Does an understanding of these limits direct us toward the construction of better combinatorial inference methods?
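To make the structural queries above concrete, here is a minimal sketch, assuming Gaussian data, that estimates a graph by thresholding partial correlations and then checks connectivity, maximum degree, and triangle-freeness. The data model, the fixed threshold, and the helper name `estimate_graph` are illustrative assumptions, not this project's actual procedure.

```python
# A minimal, illustrative sketch (not the project's method): estimate the
# graph of a Gaussian graphical model by thresholding partial correlations,
# then query the global structural properties discussed above.
import numpy as np
import networkx as nx

def estimate_graph(X, threshold=0.1):
    """Estimate an undirected graph from an n x d data matrix X.

    Edges correspond to large partial correlations, computed from the
    inverse sample covariance. The fixed threshold is an assumption made
    for illustration; a real procedure would calibrate it from the data.
    """
    n, d = X.shape
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)  # ridge for stability
    prec = np.linalg.inv(cov)
    # Partial correlation between variables i and j: -Omega_ij / sqrt(Omega_ii Omega_jj).
    diag = np.sqrt(np.diag(prec))
    partial_corr = -prec / np.outer(diag, diag)
    G = nx.Graph()
    G.add_nodes_from(range(d))
    for i in range(d):
        for j in range(i + 1, d):
            if abs(partial_corr[i, j]) > threshold:
                G.add_edge(i, j)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))          # placeholder data
G = estimate_graph(X)

# The global structural queries from the text:
print("connected?     ", nx.is_connected(G))
print("maximum degree:", max(dict(G.degree()).values()))
print("triangle-free? ", sum(nx.triangles(G).values()) == 0)
```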
My first contribution is a systematic framework that characterizes the information-theoretic lower bound
of a combinatorial inference problem under a computational budget. Unlike classical complexity theory, which mainly
uses the Turing machine to quantify computation, my approach quantifies computation by the number of interactions
an algorithm has with the data through a computational oracle (e.g., a gradient calculation or a stochastic sampling step),
yielding an oracle complexity. Compared to the Turing machine approach, this new framework is more systematic (not
case-by-case), more general (it analyzes a wide class of problems), and more rigorous (it does not rely on unproven hardness conjectures).
It has also revealed fundamental computational barriers for many combinatorial inference problems (e.g., detecting a
clique in a graph): computationally tractable algorithms cannot achieve the optimal statistical error rates. My
work in this direction has led to a new research area named 'oracle computational lower bounds'.
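One concrete instance of such a computational oracle is the statistical query model, in which an algorithm never touches raw samples and only receives approximate expectations of query functions, with the number of calls serving as the oracle complexity. The sketch below illustrates that access pattern; the class name, Gaussian-noise tolerance, and query functions are assumed for illustration and are not the lower-bound construction itself.

```python
# Illustrative statistical-query oracle (an assumed design, not the project's):
# the algorithm may only access the data through `query`, which returns a
# noisy empirical mean of a query function; `calls` tracks the number of
# oracle interactions, i.e., the oracle complexity discussed above.
import numpy as np

class StatisticalQueryOracle:
    def __init__(self, data, tol=0.01, seed=0):
        self.data = data                      # hidden from the algorithm
        self.tol = tol                        # query tolerance
        self.calls = 0                        # oracle-complexity counter
        self.rng = np.random.default_rng(seed)

    def query(self, q):
        """Return E[q(X)] up to tolerance (here modeled as Gaussian noise)."""
        self.calls += 1
        values = np.apply_along_axis(q, 1, self.data)
        return values.mean() + self.tol * self.rng.standard_normal()

# Example: estimate each coordinate's mean using d oracle calls.
rng = np.random.default_rng(1)
data = rng.standard_normal((1000, 5)) + 0.5   # hidden mean 0.5 per coordinate
oracle = StatisticalQueryOracle(data)
means = [oracle.query(lambda x, i=i: x[i]) for i in range(5)]
print("estimated means:  ", np.round(means, 2))
print("oracle calls used:", oracle.calls)     # = 5, the oracle complexity
```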
To match the obtained lower bounds, my second contribution is to prove that these combinatorial inference problems
can be reduced to a multiple testing problem in which the number of hypotheses is usually larger than the sample size. In
this setting, the most widely used statistical tool, the Central Limit Theorem, is no longer suitable. To make progress, we
need a fundamentally new approach that leverages the combinatorial structure of the problem. For this, I have developed
a unified methodology.
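To illustrate the regime where hypotheses outnumber samples, the sketch below calibrates a simultaneous test of p mean-zero hypotheses with the Gaussian multiplier bootstrap, a standard device for approximating the distribution of a maximum of many statistics when the CLT fails. It is offered as background for this regime, not as the unified methodology referred to above, and all names and parameters are assumptions.

```python
# Sketch of multiple testing with p >> n via the Gaussian multiplier
# bootstrap (a standard tool for this regime; shown as an illustration,
# not as this project's methodology).
import numpy as np

def max_statistic_critical_value(X, alpha=0.05, n_boot=2000, seed=0):
    """Bootstrap the (1 - alpha) quantile of max_j |sqrt(n) * mean_j|.

    X is an n x p data matrix; each column j carries the null hypothesis
    H_j: E[X_j] = 0. The CLT is unreliable when p >> n, so the maximum is
    calibrated by reweighting centered rows with i.i.d. N(0,1) multipliers.
    """
    n, p = X.shape
    centered = X - X.mean(axis=0)
    rng = np.random.default_rng(seed)
    boot_max = np.empty(n_boot)
    for b in range(n_boot):
        e = rng.standard_normal(n)            # multiplier weights
        boot_max[b] = np.abs(centered.T @ e).max() / np.sqrt(n)
    return np.quantile(boot_max, 1 - alpha)

rng = np.random.default_rng(2)
n, p = 50, 500                                # far more hypotheses than samples
X = rng.standard_normal((n, p))
X[:, 0] += 0.8                                # one true signal, for illustration
T = np.sqrt(n) * np.abs(X.mean(axis=0))       # per-hypothesis statistics
c = max_statistic_critical_value(X)
print("rejected hypotheses:", np.flatnonzero(T > c))
```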
Status: Active
Effective start/end date: 9/15/17 – 9/14/21

Funding

  • Alfred P. Sloan Foundation (FG-2017-9862)
