Tupleware: “Big” data, big analytics, small clusters

Andrew Crotty, Alex Galakatos, Kayhan Dursun, Tim Kraska, Ugur Cetintemel, Stan Zdonik

Research output: Contribution to conference › Paper › peer-review

43 Scopus citations

Abstract

There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the challenges of the Googles and Facebooks of the world—processing petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users analyze relatively small datasets of up to several terabytes in size, perform primarily compute-intensive operations, and operate clusters ranging from only a few to a few dozen nodes. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes our vision for the design of TUPLEWARE, a new system specifically aimed at complex analytics on small clusters. TUPLEWARE’s architecture brings together ideas from the database and compiler communities to create a powerful end-to-end solution for data analysis that compiles workflows of user-defined functions into distributed programs. Our preliminary results show performance improvements of up to three orders of magnitude over alternative systems.

Original language: English (US)
State: Published - 2015
Event: 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015 - Asilomar, United States
Duration: Jan 4 2015 → Jan 7 2015

ASJC Scopus subject areas

  • Information Systems and Management
  • Hardware and Architecture
  • Artificial Intelligence
  • Information Systems
