Semi-supervised learning of attribute-value pairs from product descriptions

Katharina Probst, Rayid Ghani, Marko Krema, Andrew Fano, Yan Liu

Research output: Contribution to journal › Conference article › peer-review

58 Scopus citations


We describe an approach to extract attribute-value pairs from product descriptions. This allows us to represent products as sets of such attribute-value pairs to augment product databases. Such a representation is useful for a variety of tasks where treating a product as a set of attribute-value pairs is more useful than treating it as an atomic entity. Examples of such applications include product recommendations, product comparison, and demand forecasting. We formulate the extraction as a classification problem and use a semi-supervised algorithm (co-EM) along with a supervised one (Naïve Bayes). The extraction system requires very little initial user supervision: using unlabeled data, we automatically extract an initial seed list that serves as training data for the supervised and semi-supervised classification algorithms. Finally, the extracted attributes and values are linked to form pairs using dependency information and co-location scores. We present promising results on product descriptions in two categories of sporting goods.
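To make the classification framing concrete, here is a minimal sketch of the supervised part only: a Naïve Bayes classifier trained on tokens labeled by a small seed list, then applied to an unlabeled token. It is not the authors' system and it does not implement co-EM. The feature set (neighbouring words), the label names, the function names, and the toy descriptions are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, i):
    """Hypothetical feature set: the words immediately left and right of token i."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [f"prev={prev_tok}", f"next={next_tok}"]


def train_naive_bayes(descriptions, seeds):
    """Count labels and context features for every token that appears in the seed list."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for tokens in descriptions:
        for i, tok in enumerate(tokens):
            label = seeds.get(tok)
            if label is None:
                continue  # unlabeled token: ignored by the supervised learner
            label_counts[label] += 1
            for feat in context_features(tokens, i):
                feature_counts[label][feat] += 1
                vocab.add(feat)
    return label_counts, feature_counts, vocab


def classify(model, tokens, i):
    """Return the most probable label for token i, using add-one smoothing."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(feature_counts[label].values()) + len(vocab)
        for feat in context_features(tokens, i):
            score += math.log((feature_counts[label][feat] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (invented for this sketch, not taken from the paper).
descriptions = [
    ["steel", "shaft", "with", "rubber", "grip"],
    ["graphite", "shaft", "with", "leather", "grip"],
]
seeds = {
    "steel": "value", "graphite": "value", "rubber": "value",
    "shaft": "attribute", "grip": "attribute",
    "with": "neither",
}

model = train_naive_bayes(descriptions, seeds)
# "leather" is not in the seed list; its context resembles that of "rubber".
print(classify(model, descriptions[1], 3))  # value
```

A semi-supervised variant such as co-EM would go further by also using the unlabeled tokens, iteratively passing probabilistic labels between classifiers built on different views of the data; that step, and the later pairing of attributes with values, is left out here.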

Original language: English (US)
Pages (from-to): 2838-2843
Number of pages: 6
Journal: IJCAI International Joint Conference on Artificial Intelligence
State: Published - 2007
Externally published: Yes
Event: 20th International Joint Conference on Artificial Intelligence, IJCAI 2007 - Hyderabad, India
Duration: Jan 6 2007 → Jan 12 2007

ASJC Scopus subject areas

  • Artificial Intelligence

