Batch mode active learning with hierarchical-structured embedded variance

Yu Cheng, Zhengzhang Chen, Hongliang Fei, Fei Wang, Alok Choudhary

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations


We consider the problem of active learning when the categories are represented as a tree, with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple granularities. Recent work has improved on traditional techniques by moving beyond a "flat" label structure and incorporating the label hierarchy into the uncertainty measure. However, these methods have two major limitations. First, they use the information in the label structure only coarsely and do not take the training samples into account, which may lead to sampling bias because the class relations are only crudely approximated. Second, none of them can work in batch mode to reduce the computational time of training. We propose a batch mode active learning scheme that exploits both the hierarchical structure of the labels and the characteristics of the training data to select the most informative data for human labeling. We first use a graph-embedding approach to embed the relationships between the labels and data points in a transformed low-dimensional space. We then compute uncertainty by calculating the variance among the points and the labels in the embedding space. Finally, the selection criterion is designed to construct batches and incorporate a diversity measure. Experimental results indicate that our technique achieves a notable improvement in performance over state-of-the-art approaches.
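
The selection pipeline described in the abstract (embed, score uncertainty, build a diverse batch) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so everything below is an assumption made for illustration. The graph-embedding step is skipped entirely, and `points` and `labels` are assumed to be already embedded in a shared low-dimensional space. The uncertainty score (negative variance of a point's distances to the label embeddings, so that a point roughly equidistant from all labels counts as most uncertain), the greedy selection rule, and the names `uncertainty_scores`, `select_batch`, and `diversity_weight` are all hypothetical stand-ins.

```python
import numpy as np


def uncertainty_scores(points, labels):
    """Illustrative stand-in for a variance-based uncertainty measure.

    Assumption (not from the paper): a point whose distances to the label
    embeddings have low variance is roughly equidistant from all labels,
    so it is treated as more uncertain. Higher return value = more uncertain.
    """
    # Pairwise Euclidean distances, shape (n_points, n_labels).
    dists = np.linalg.norm(points[:, None, :] - labels[None, :, :], axis=2)
    return -dists.var(axis=1)


def select_batch(points, labels, batch_size, diversity_weight=1.0):
    """Greedily pick a batch that trades off uncertainty against diversity.

    Assumption (not from the paper): diversity is measured as the distance
    from a candidate to its nearest already-selected point.
    """
    scores = uncertainty_scores(points, labels)
    # Start with the single most uncertain point.
    selected = [int(np.argmax(scores))]
    while len(selected) < min(batch_size, len(points)):
        # Distance from every candidate to its nearest selected point.
        gaps = np.linalg.norm(
            points[:, None, :] - points[selected][None, :, :], axis=2
        ).min(axis=1)
        combined = scores + diversity_weight * gaps
        # Never pick the same point twice.
        combined[selected] = -np.inf
        selected.append(int(np.argmax(combined)))
    return selected
```

Setting `diversity_weight=0.0` reduces the sketch to plain top-k uncertainty sampling, which tends to pick near-duplicate points; a positive weight pushes later picks away from points already in the batch.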

Original language: English (US)
Title of host publication: SIAM International Conference on Data Mining 2014, SDM 2014
Editors: Mohammed J. Zaki, Arindam Banerjee, Srinivasan Parthasarathy, Pang-Ning Tan, Zoran Obradovic, Chandrika Kamath
Publisher: Society for Industrial and Applied Mathematics Publications
Number of pages: 9
ISBN (Electronic): 9781510811515
State: Published - 2014
Event: 14th SIAM International Conference on Data Mining, SDM 2014 - Philadelphia, United States
Duration: Apr 24 2014 to Apr 26 2014

Publication series

Name: SIAM International Conference on Data Mining 2014, SDM 2014


Other: 14th SIAM International Conference on Data Mining, SDM 2014
Country: United States

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
