Mining and Organizing User-Generated Content to Identify Attributes and Attribute Levels

Artem Timoshenko, John R. Hauser

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


We investigate User-Generated Content (UGC) as a source of customer needs from which to identify attributes and attribute levels for a high-craft conjoint analysis study. In a large corpus of UGC, non-informative and repetitive content crowds out information about customer needs. We design a machine-learning hybrid approach to enhance customer-need extraction, making it more effective and efficient. We use a convolutional neural network (CNN) to identify informative content. Using pre-trained word embeddings, we create numerical sentence representations to capture the semantic meaning of UGC sentences. We cluster sentence representations and sample sentences from different clusters to enhance the diversity of the content selected for manual review. The final extraction of customer needs from informative, diverse sentences relies on human effort. In a proof-of-concept application to oral care, we compare customer needs identified from UGC to customer needs identified from experiential interviews. First, our analyses suggest that, for comparable human effort, UGC allows analysts to identify a comparable set of customer needs. Second, machine learning enables analysts to identify the same number of customer needs with less effort.
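The diversity-sampling step described in the abstract (embed sentences, cluster them, then draw from different clusters) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the toy two-dimensional `EMBEDDINGS` table, averaging word vectors to form a sentence vector, plain k-means as the clustering method, and round-robin selection across clusters. The CNN filter for informative content is not shown.

```python
import random
from itertools import zip_longest

# Toy 2-d "word embeddings" (hypothetical values, for illustration only).
EMBEDDINGS = {
    "floss": (1.0, 0.1), "gums": (0.9, 0.2), "bleed": (0.8, 0.0),
    "shipping": (0.0, 1.0), "late": (0.1, 0.9), "box": (0.2, 0.8),
}

def sentence_vector(sentence):
    """Average the embeddings of known words (an assumed pooling scheme)."""
    vecs = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels

def diverse_sample(sentences, k, n):
    """Pick n sentences for manual review by cycling through the clusters."""
    labels = kmeans([sentence_vector(s) for s in sentences], k)
    clusters = [[s for s, lab in zip(sentences, labels) if lab == c]
                for c in range(k)]
    interleaved = [s for group in zip_longest(*clusters)
                   for s in group if s is not None]
    return interleaved[:n]
```

Calling `diverse_sample(sentences, k=2, n=2)` on a mix of oral-care and shipping sentences would return one sentence from each cluster, so a reviewer sees both topics before seeing a second sentence about either.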
Original language: English (US)
Title of host publication: Proceedings of the Sawtooth Software Conference
Number of pages: 20
State: Published - 2016

