Semi-supervised text classification: Partial training from unlabeled data

Yingtao Bi*, Daniel R. Jeske, Regina Y. Liu

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

1 Scopus citation

Abstract

We illustrate by a case study how a semi-supervised approach can improve the performance of text classification. We begin with a naïve Bayes classifier trained exclusively from labeled text documents, and apply it to a set of unlabeled text documents to derive their pseudo-labels. The pseudo-labels are then combined with the true labels in the original training sample, and a naïve Bayes classifier is built based on the enlarged training sample. We consider different proportions of pseudo-labels in the enlarged training sample, and examine the effect of the semi-supervised approach on the misclassification rate by using cross validation comparisons.
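The procedure described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a multinomial naïve Bayes model with Laplace smoothing, documents that are already tokenized, and a `proportion` parameter interpreted as the fraction of unlabeled documents that receive pseudo-labels. The function names (`train_nb`, `predict_nb`, `self_train`, `misclassification_rate`) and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha, "n": len(labels)}


def predict_nb(model, tokens):
    """Return the class with the highest log posterior; unseen words are skipped."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(count / model["n"])  # log prior
        for word in tokens:
            if word in model["vocab"]:
                # Laplace-smoothed log likelihood of the word given the class
                score += math.log((counts[word] + alpha) / (total + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def self_train(labeled_docs, labels, unlabeled_docs, proportion):
    """Pseudo-label a share of the unlabeled docs, then retrain on the enlarged sample.

    Assumption: `proportion` is the fraction of unlabeled documents used.
    """
    base = train_nb(labeled_docs, labels)
    k = int(proportion * len(unlabeled_docs))
    chosen = unlabeled_docs[:k]
    pseudo_labels = [predict_nb(base, doc) for doc in chosen]
    return train_nb(labeled_docs + chosen, labels + pseudo_labels)


def misclassification_rate(model, docs, labels):
    """Fraction of documents whose predicted class differs from the true label."""
    wrong = sum(predict_nb(model, doc) != y for doc, y in zip(docs, labels))
    return wrong / len(labels)


# Hypothetical toy data, for illustration only.
labeled = [["goal", "match", "team"], ["vote", "election", "party"]]
labels = ["sport", "politics"]
unlabeled = [["team", "coach", "goal"], ["party", "senate", "vote"]]

model = self_train(labeled, labels, unlabeled, proportion=1.0)
print(predict_nb(model, ["coach"]))
```

In this toy example the word "coach" never appears in the labeled documents, so only the enlarged model has any evidence about it. Comparing `misclassification_rate` on held-out folds for several values of `proportion` would mirror the cross validation comparison the abstract mentions.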

Original language: English (US)
State: Published - Dec 1 2006
Event: 2006 IIE Annual Conference and Exposition - Orlando, FL, United States
Duration: May 20 2006 to May 24 2006

Other

Other: 2006 IIE Annual Conference and Exposition
Country/Territory: United States
City: Orlando, FL
Period: 5/20/06 to 5/24/06

Keywords

  • Cross validation
  • Naïve Bayes classification
  • Semi-supervised learning
  • Text classification
  • Text mining

ASJC Scopus subject areas

  • Industrial and Manufacturing Engineering
