Learning hierarchically decomposable concepts with active over-labeling

Yuji Mo, Stephen D. Scott, Douglas C Downey

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

Many classification tasks target high-level concepts that can be decomposed into a hierarchy of finer-grained sub-concepts. For example, some string entities that are Locations are also Attractions, some Attractions are Museums, etc. Such hierarchies are common in named entity recognition (NER), document classification, and biological sequence analysis. We present a new approach for learning hierarchically decomposable concepts. The approach learns a high-level classifier (e.g., location vs. non-location) by seperately learning multiple finer-grained classifiers (e.g., museum vs. non-museum), and then combining the results. Soliciting labels at a finer level of granularity than that of the target concept is a new approach to active learning, which we term active over-labeling. In experiments in NER and document classification tasks, we show that active over-labeling substantially improves area under the precision-recall curve when compared with standard passive or active learning. Finally, because finer-grained labels may be more expensive to obtain, we also present a cost-sensitive active learner that uses a multi-Armed bandit approach to dynamically choose the label granularity to target, and show that the bandit-based learner is robust to differences in label cost and labeling budget.

Original languageEnglish (US)
Title of host publicationProceedings - 16th IEEE International Conference on Data Mining, ICDM 2016
EditorsFrancesco Bonchi, Xindong Wu, Ricardo Baeza-Yates, Josep Domingo-Ferrer, Zhi-Hua Zhou
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages340-349
Number of pages10
ISBN (Electronic)9781509054725
DOIs
StatePublished - Jan 31 2017
Event16th IEEE International Conference on Data Mining, ICDM 2016 - Barcelona, Catalonia, Spain
Duration: Dec 12 2016Dec 15 2016

Publication series

NameProceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print)1550-4786

Other

Other16th IEEE International Conference on Data Mining, ICDM 2016
CountrySpain
CityBarcelona, Catalonia
Period12/12/1612/15/16

Keywords

  • Active learning
  • Cost analysis
  • Hierarchical labeling
  • Semi-supervised learning
  • Text classification

ASJC Scopus subject areas

  • Engineering(all)

Fingerprint Dive into the research topics of 'Learning hierarchically decomposable concepts with active over-labeling'. Together they form a unique fingerprint.

Cite this