Abstract
We illustrate through a case study how a semi-supervised approach can improve the performance of text classification. We begin with a naïve Bayes classifier trained exclusively on labeled text documents and apply it to a set of unlabeled text documents to derive their pseudo-labels. The pseudo-labels are then combined with the true labels in the original training sample, and a new naïve Bayes classifier is built from the enlarged training sample. We consider different proportions of pseudo-labels in the enlarged training sample and use cross-validation comparisons to examine the effect of the semi-supervised approach on the misclassification rate.
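The pseudo-labeling procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the tokenization, the smoothing, or how unlabeled documents are selected, so the whitespace tokenizer, the add-one smoothing, the choice of the first `n_pseudo` unlabeled documents, and the helper names `train_nb`, `predict_nb`, and `self_train` are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and words per class (multinomial naive Bayes)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()  # assumed tokenizer: lowercase, split on whitespace
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for token in doc.lower().split():
            if token in vocab:  # words never seen in training are skipped
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def self_train(labeled_docs, labels, unlabeled_docs, n_pseudo):
    """Train on labeled data, pseudo-label n_pseudo unlabeled docs, retrain on both."""
    base_model = train_nb(labeled_docs, labels)
    pseudo_docs = unlabeled_docs[:n_pseudo]  # assumed selection rule: first n_pseudo docs
    pseudo_labels = [predict_nb(base_model, d) for d in pseudo_docs]
    enlarged_model = train_nb(labeled_docs + pseudo_docs, labels + pseudo_labels)
    return enlarged_model, pseudo_labels


# Toy data, invented for illustration only.
labeled_docs = ["cheap pills buy now", "meeting agenda for monday"]
labels = ["spam", "ham"]
unlabeled_docs = ["buy cheap pills", "monday meeting notes"]

model, pseudo_labels = self_train(labeled_docs, labels, unlabeled_docs, n_pseudo=2)
print(pseudo_labels)
print(predict_nb(model, "pills now"))
```

Varying `n_pseudo` changes the proportion of pseudo-labels in the enlarged training sample. To compare misclassification rates as the abstract describes, this routine would be wrapped in a cross-validation loop over the labeled documents, which is omitted here.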
Original language | English (US) |
---|---|
State | Published - Dec 1 2006 |
Event | 2006 IIE Annual Conference and Exposition, Orlando, FL, United States; Duration: May 20 2006 → May 24 2006 |
Other
Other | 2006 IIE Annual Conference and Exposition |
---|---|
Country/Territory | United States |
City | Orlando, FL |
Period | 5/20/06 → 5/24/06 |
Keywords
- Cross validation
- Naïve Bayes classification
- Semi-supervised learning
- Text classification
- Text mining
ASJC Scopus subject areas
- Industrial and Manufacturing Engineering