Semi-supervised text classification: Partial training from unlabeled data

Yingtao Bi*, Daniel R. Jeske, Regina Y. Liu

*Corresponding author for this work

    Research output: Contribution to conference (Paper)

    1 Scopus citation

    Abstract

    We illustrate by a case study how a semi-supervised approach can improve the performance of text classification. We begin with a naïve Bayes classifier trained exclusively from labeled text documents, and apply it to a set of unlabeled text documents to derive their pseudo-labels. The pseudo-labels are then combined with the true labels in the original training sample, and a naïve Bayes classifier is built based on the enlarged training sample. We consider different proportions of pseudo-labels in the enlarged training sample, and examine the effect of the semi-supervised approach on the misclassification rate by using cross validation comparisons.
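
The procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the multinomial model with add-one smoothing, the whitespace tokenizer, the function names (`train_nb`, `predict_nb`, `semi_supervised_nb`, `misclassification_rate`), the `n_pseudo` parameter, and the toy documents are all assumptions made for the example. The record does not say how the proportion of pseudo-labels was varied or how the cross validation folds were formed; here `n_pseudo` simply controls how many unlabeled documents are added.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model (counts only; smoothing is applied at prediction)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # sorted order makes tie-breaking deterministic
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)
        for word in doc.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def semi_supervised_nb(labeled_docs, labels, unlabeled_docs, n_pseudo):
    """Pseudo-label the first n_pseudo unlabeled documents, then retrain on the enlarged sample."""
    base = train_nb(labeled_docs, labels)
    chosen = unlabeled_docs[:n_pseudo]
    pseudo_labels = [predict_nb(base, d) for d in chosen]
    return train_nb(labeled_docs + chosen, labels + pseudo_labels)


def misclassification_rate(model, docs, true_labels):
    """Fraction of documents whose predicted label differs from the true label."""
    errors = sum(predict_nb(model, d) != y for d, y in zip(docs, true_labels))
    return errors / len(docs)


# Toy data (hypothetical, for illustration only).
labeled_docs = [
    "cheap pills buy now",
    "meeting agenda for monday",
    "buy cheap watches",
    "project meeting notes",
]
labels = ["spam", "ham", "spam", "ham"]
unlabeled_docs = [
    "cheap watches discount",
    "monday project review",
    "discount pills offer",
]

enlarged = semi_supervised_nb(labeled_docs, labels, unlabeled_docs, n_pseudo=2)
held_out_docs = ["discount offer", "project review"]
held_out_labels = ["spam", "ham"]
print(misclassification_rate(enlarged, held_out_docs, held_out_labels))
```

To mimic the comparison described in the abstract, one would evaluate `misclassification_rate` on held-out labeled folds for several values of `n_pseudo`, keeping the held-out documents out of both the labeled and the pseudo-labeled training sets.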

    Original language: English (US)
    State: Published - Dec 1 2006
    Event: 2006 IIE Annual Conference and Exposition - Orlando, FL, United States
    Duration: May 20 2006 to May 24 2006

    Other

    Other: 2006 IIE Annual Conference and Exposition
    Country: United States
    City: Orlando, FL
    Period: 5/20/06 to 5/24/06

    Keywords

    • Cross validation
    • Naïve Bayes classification
    • Semi-supervised learning
    • Text classification
    • Text mining

    ASJC Scopus subject areas

    • Industrial and Manufacturing Engineering

    Cite this

    Bi, Y., Jeske, D. R., & Liu, R. Y. (2006). Semi-supervised text classification: Partial training from unlabeled data. Paper presented at 2006 IIE Annual Conference and Exposition, Orlando, FL, United States.