PinterNet: A thematic label curation tool for large image datasets

Ruoqian Liu, Diana Palsetia, Arindam Paul, Reda Al-Bahrani, Dipendra Jha, Wei Keng Liao, Ankit Agrawal, Alok Choudhary

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Recent progress in big data and in computer vision with deep learning models has attracted considerable attention. Deep learning has been applied to tasks such as image classification, object detection, image segmentation, image captioning, and visual question answering, using large collections of annotated images. This calls for more curated large image datasets with clearer descriptions, cleaner contents, and more diverse usability. However, curating and labeling such datasets can be labor-intensive. In this paper, we present PinterNet, an algorithm for automatic curation and label generation from noisy textual descriptions, and we also publish a large image dataset containing over 110K images automatically labeled with their themes. The dataset is hierarchical in nature: it has high-level category information, which we refer to as verticals, with fine-grained thematic labels at the lower level. This motivates a new type of hierarchical theme classification problem that is closer to human cognition and has business value. We provide benchmark performance for this novel task and new data using deep learning models based on the AlexNet architecture with different pre-training schemes.
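The abstract frames the task as hierarchical: high-level verticals, each with fine-grained thematic labels beneath it. As a minimal sketch of how such a two-level label hierarchy could be represented and decoded, assuming each theme belongs to exactly one vertical, the code below sums theme-level probabilities into vertical-level scores and then picks a theme only from within the winning vertical. The vertical and theme names, and the functions `vertical_scores` and `predict`, are hypothetical illustrations; this is not the PinterNet curation algorithm or the paper's AlexNet benchmark.

```python
from collections import defaultdict

# Hypothetical two-level hierarchy: vertical -> fine-grained themes.
# These names are illustrative placeholders, not labels from the PinterNet dataset.
HIERARCHY = {
    "food": ["baking", "vegan recipes"],
    "travel": ["beaches", "road trips"],
}

# Invert the hierarchy so each fine-grained theme points to its single vertical.
THEME_TO_VERTICAL = {
    theme: vertical for vertical, themes in HIERARCHY.items() for theme in themes
}


def vertical_scores(theme_probs: dict[str, float]) -> dict[str, float]:
    """Marginalize theme-level probabilities up to the vertical level.

    Assumes each theme belongs to exactly one vertical, so a vertical's
    score is the sum of its themes' probabilities.
    """
    scores: dict[str, float] = defaultdict(float)
    for theme, p in theme_probs.items():
        scores[THEME_TO_VERTICAL[theme]] += p
    return dict(scores)


def predict(theme_probs: dict[str, float]) -> tuple[str, str]:
    """Return a (vertical, theme) pair that is consistent with the hierarchy."""
    verticals = vertical_scores(theme_probs)
    best_vertical = max(verticals, key=verticals.get)
    # Restrict the theme choice to themes that sit under the chosen vertical.
    best_theme = max(HIERARCHY[best_vertical], key=lambda t: theme_probs.get(t, 0.0))
    return best_vertical, best_theme


if __name__ == "__main__":
    probs = {"baking": 0.30, "vegan recipes": 0.25, "beaches": 0.35, "road trips": 0.10}
    print(predict(probs))  # prints ('food', 'baking')
```

With these sample probabilities a flat argmax over themes would pick "beaches", whereas the hierarchy-aware decoding selects the "food" vertical (0.55 vs. 0.45) and then "baking" within it. This shows one reason a hierarchical formulation can behave differently from flat classification.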

Original language: English (US)
Title of host publication: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
Editors: Ronay Ak, George Karypis, Yinglong Xia, Xiaohua Tony Hu, Philip S. Yu, James Joshi, Lyle Ungar, Ling Liu, Aki-Hiro Sato, Toyotaro Suzumura, Sudarsan Rachuri, Rama Govindaraju, Weijia Xu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2353-2362
Number of pages: 10
ISBN (Electronic): 9781467390040
State: Published - 2016
Event: 4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
Duration: Dec 5 2016 to Dec 8 2016

Publication series

Name: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

Other

Other: 4th IEEE International Conference on Big Data, Big Data 2016
Country/Territory: United States
City: Washington
Period: 12/5/16 to 12/8/16

Keywords

  • Computer vision
  • Dataset
  • Image classification
  • Label curation
  • Theme classification

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Hardware and Architecture
