SILVERBACK: Scalable association mining for temporal data in columnar probabilistic databases

Yusheng Xie, Diana Palsetia, Goce Trajcevski, Ankit Agrawal, Alok Choudhary

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

6 Scopus citations

Abstract

We address the problem of large-scale probabilistic association rule mining and consider the trade-offs between the accuracy of the mining results and the pursuit of scalability on modest hardware infrastructure. We demonstrate how extensions and adaptations of research findings can be integrated into an industrial application, and we present the commercially deployed Silverback framework, developed at Voxsup Inc. Silverback tackles the storage-efficiency problem by proposing a probabilistic columnar infrastructure that uses Bloom filters and reservoir sampling techniques. In addition, we introduce a probabilistic pruning technique, based on Apriori, for mining frequent item-sets. The proposed target-driven technique yields a significant reduction in the number of frequent item-set candidates. We present extensive experimental evaluations that demonstrate the benefits of incorporating infrastructure limitations into the corresponding research techniques in a context-aware manner. The experiments indicate that, compared to the traditional Hadoop-based approach of improving scalability by adding more hosts, Silverback, which has been developed and commercially deployed at Voxsup Inc. since May 2011, achieves much better run-time performance with negligible sacrifices in accuracy.
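The abstract names two standard building blocks of the storage layer: Bloom filters and reservoir sampling. The sketch below shows textbook versions of each in Python, purely as an illustration under assumptions; it is not Silverback's implementation, and it does not reproduce the paper's probabilistic columnar layout. The names `BloomFilter`, `might_contain`, and `reservoir_sample`, the SHA-256-based hashing scheme, and the parameters `num_bits`, `num_hashes`, and `k` are all hypothetical choices made for this example.

```python
import hashlib
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


class BloomFilter:
    """Hypothetical fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array

    def _positions(self, item: str) -> List[int]:
        # Derive k bit positions by salting SHA-256 with the hash index (an assumption).
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: a uniform sample of k items from a stream of unknown length."""
    reservoir: List[T] = []
    for n, item in enumerate(stream):
        if n < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, n)  # inclusive bounds, so P(j < k) = k / (n + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical usage: a membership filter and a bounded sample for one column.
liked_by = BloomFilter(num_bits=8192, num_hashes=4)
for user_id in ["u1", "u7", "u42"]:
    liked_by.add(user_id)
print(liked_by.might_contain("u7"))  # True (inserted items are never missed)

sample = reservoir_sample(range(1_000_000), k=5, rng=random.Random(0))
print(len(sample))  # 5
```

Both structures trade exactness for bounded memory: the Bloom filter answers membership queries in a fixed number of bits with one-sided error, and the reservoir keeps exactly `k` items no matter how long the stream is. That is the general kind of accuracy-for-scalability trade-off the abstract describes, though the paper's specific parameter choices are not shown here.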

Original language: English (US)
Title of host publication: 2014 IEEE 30th International Conference on Data Engineering, ICDE 2014
Publisher: IEEE Computer Society
Pages: 1072-1083
Number of pages: 12
ISBN (Print): 9781479925544
State: Published - 2014
Event: 30th IEEE International Conference on Data Engineering, ICDE 2014 - Chicago, IL, United States
Duration: Mar 31 2014 to Apr 4 2014

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Other

Other: 30th IEEE International Conference on Data Engineering, ICDE 2014
Country: United States
City: Chicago, IL
Period: 3/31/14 to 4/4/14

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

