Abstract
We describe an approach to extracting attribute-value pairs from product descriptions. Representing products as sets of such pairs lets us augment product databases, and it is useful for tasks in which a product is better treated as a set of attribute-value pairs than as an atomic entity. Examples of such applications include product recommendation, product comparison, and demand forecasting. We formulate the extraction as a classification problem and use a semi-supervised algorithm (co-EM) along with a supervised one (Naïve Bayes). The extraction system requires very little initial user supervision: using unlabeled data, we automatically extract an initial seed list that serves as training data for the supervised and semi-supervised classification algorithms. Finally, the extracted attributes and values are linked to form pairs using dependency information and co-location scores. We present promising results on product descriptions in two categories of sporting goods.
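To make the final linking step more concrete, the following is a minimal Python sketch of pairing values with attributes by a co-location score. It is an illustration under stated assumptions, not the method from the paper: the score here is a raw count of how often an attribute and a value occur in the same phrase, the dependency information mentioned in the abstract is left out, and the function names (`colocation_scores`, `link_pairs`) and the toy phrases are hypothetical.

```python
from collections import Counter
from itertools import product


def colocation_scores(phrases, attributes, values):
    """Count how often each (attribute, value) pair occurs in the same phrase.

    Assumption: a raw co-occurrence count stands in for the co-location
    score; the abstract does not define the score.
    """
    pair_counts = Counter()
    for phrase in phrases:
        tokens = set(phrase.lower().split())
        # Every attribute/value combination found in this phrase counts once.
        for attr, val in product(attributes & tokens, values & tokens):
            pair_counts[(attr, val)] += 1
    return pair_counts


def link_pairs(phrases, attributes, values):
    """Attach each value to the attribute it co-occurs with most often."""
    scores = colocation_scores(phrases, attributes, values)
    best = {}
    for (attr, val), count in scores.items():
        if val not in best or count > best[val][1]:
            best[val] = (attr, count)
    return {val: attr for val, (attr, _) in best.items()}


# Hypothetical toy data: words already labeled as attributes or values.
phrases = [
    "graphite frame",
    "lightweight graphite frame",
    "leather grip",
    "graphite grip",
]
attributes = {"frame", "grip"}
values = {"graphite", "leather", "lightweight"}

pairs = link_pairs(phrases, attributes, values)
```

In this toy input, "graphite" appears twice with "frame" and once with "grip", so it is linked to "frame". A fuller treatment would also need a tie-breaking rule and some way to combine the score with dependency information.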
Original language | English (US) |
---|---|
Pages (from-to) | 2838-2843 |
Number of pages | 6 |
Journal | IJCAI International Joint Conference on Artificial Intelligence |
State | Published - 2007 |
Externally published | Yes |
Event | 20th International Joint Conference on Artificial Intelligence, IJCAI 2007 - Hyderabad, India; Duration: Jan 6 2007 → Jan 12 2007 |
ASJC Scopus subject areas
- Artificial Intelligence