Statistical issues with labeled sample size analysis for semi-supervised linear discriminant analysis

Han Liu*, Xiaolin Yang, Di Wu, Xiaobin Yuan, Ji Zhang, Rafal Kustra

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Semi-supervised classification has recently drawn increasing attention, and many practical semi-supervised learning methods have been proposed. However, the current literature overlooks an important question: how to estimate the exact labeled sample size given a large number of unlabeled samples. This question matters because labeled examples are scarce and expensive, and it is also crucial for exploring the relative value of labeled and unlabeled samples under a specific model. Assuming that the domain follows a latent Gaussian distribution, we describe a reasonable labeled sample size estimation method for semi-supervised linear discriminant analysis (Transductive LDA). We give a detailed mathematical derivation and a computationally tractable approach. Our technique extends naturally to handle two difficult problems: learning from Gaussian distributions with different covariances, and learning for multiple classes.

Original language: English (US)
Title of host publication: Proceedings of the International Conference on Artificial Intelligence, IC-AI'04 and Proceedings of the International Conference on Machine Learning; Models, Technologies and Applications, MLMTA'04
Editors: H.R. Arabnia, M. Youngsong
Pages: 1007-1012
Number of pages: 6
State: Published - 2004
Event: Proceedings of the International Conference on Artificial Intelligence, IC-AI'04 - Las Vegas, NV, United States
Duration: Jun 21 2004 → Jun 24 2004

Publication series

Name: Proceedings of the International Conference on Artificial Intelligence, IC-AI'04
Volume: 2

Other

Other: Proceedings of the International Conference on Artificial Intelligence, IC-AI'04
Country/Territory: United States
City: Las Vegas, NV
Period: 6/21/04 → 6/24/04

Keywords

  • Bayes risk
  • Sample size estimation
  • Semi-supervised classification
  • Transductive LDA
  • Unlabeled data

ASJC Scopus subject areas

  • Artificial Intelligence
