Semi-supervised text classification: Partial training from unlabeled data

Yingtao Bi*, Daniel R. Jeske, Regina Y. Liu

*Corresponding author for this work

    Research output: Contribution to conference › Paper › peer-review



    We illustrate by a case study how a semi-supervised approach can improve the performance of text classification. We begin with a naïve Bayes classifier trained exclusively from labeled text documents, and apply it to a set of unlabeled text documents to derive their pseudo-labels. The pseudo-labels are then combined with the true labels in the original training sample, and a naïve Bayes classifier is built based on the enlarged training sample. We consider different proportions of pseudo-labels in the enlarged training sample, and examine the effect of the semi-supervised approach on the misclassification rate by using cross validation comparisons.
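The procedure described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a multinomial naïve Bayes model with Laplace smoothing and whitespace tokenization, and it reads "proportion" as the fraction of the unlabeled pool that is pseudo-labeled and added to the training sample. The function names (`train_nb`, `predict_nb`, `self_train`) and the toy documents are hypothetical, and the cross-validation comparison is omitted.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model (assumed variant, Laplace smoothing)."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d.split()}
    counts = {c: Counter() for c in classes}
    for doc, y in zip(docs, labels):
        counts[y].update(doc.split())
    n_docs = Counter(labels)
    log_prior = {c: math.log(n_docs[c] / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
    return classes, log_prior, log_lik


def predict_nb(model, doc):
    """Return the class with the highest log posterior score for one document."""
    classes, log_prior, log_lik = model

    def score(c):
        # Words outside the training vocabulary are skipped.
        return log_prior[c] + sum(
            log_lik[c][w] for w in doc.split() if w in log_lik[c]
        )

    return max(classes, key=score)


def self_train(labeled_docs, labels, unlabeled_docs, proportion):
    """Pseudo-label part of the unlabeled pool, then refit on the enlarged sample."""
    base = train_nb(labeled_docs, labels)
    k = int(proportion * len(unlabeled_docs))  # assumed meaning of "proportion"
    chosen = unlabeled_docs[:k]
    pseudo = [predict_nb(base, d) for d in chosen]
    return train_nb(labeled_docs + chosen, labels + pseudo)


# Hypothetical toy data, for illustration only.
labeled_docs = ["cheap pills buy now", "meeting agenda for monday"]
labels = ["spam", "ham"]
unlabeled_docs = ["buy cheap watches now", "monday meeting notes agenda"]

supervised_only = self_train(labeled_docs, labels, unlabeled_docs, proportion=0.0)
semi_supervised = self_train(labeled_docs, labels, unlabeled_docs, proportion=1.0)
print(predict_nb(supervised_only, "watches"), predict_nb(semi_supervised, "watches"))
```

In this toy setup the word "watches" appears only in the unlabeled pool, so the classifier trained on labeled documents alone has no evidence for it, while the classifier refit on the enlarged sample does. Varying `proportion` between 0 and 1 mirrors the idea of comparing different shares of pseudo-labels, under the assumption stated above.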

    Original language: English (US)
    State: Published - Dec 1 2006
    Event: 2006 IIE Annual Conference and Exposition - Orlando, FL, United States
    Duration: May 20 2006 to May 24 2006


    Other: 2006 IIE Annual Conference and Exposition
    Country/Territory: United States
    City: Orlando, FL


    • Cross validation
    • Naïve Bayes classification
    • Semi-supervised learning
    • Text classification
    • Text mining

    ASJC Scopus subject areas

    • Industrial and Manufacturing Engineering

