Abstract
There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the challenges of the Googles and Facebooks of the world—processing petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users analyze relatively small datasets of up to several terabytes in size, perform primarily compute-intensive operations, and operate clusters ranging from only a few to a few dozen nodes. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes our vision for the design of TUPLEWARE, a new system specifically aimed at complex analytics on small clusters. TUPLEWARE’s architecture brings together ideas from the database and compiler communities to create a powerful end-to-end solution for data analysis that compiles workflows of user-defined functions into distributed programs. Our preliminary results show performance improvements of up to three orders of magnitude over alternative systems.
Original language | English (US) |
---|---|
State | Published - 2015 |
Event | 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015 - Asilomar, United States; Duration: Jan 4 2015 → Jan 7 2015 |
Conference
Conference | 7th Biennial Conference on Innovative Data Systems Research, CIDR 2015 |
---|---|
Country/Territory | United States |
City | Asilomar |
Period | Jan 4 2015 → Jan 7 2015 |
ASJC Scopus subject areas
- Information Systems and Management
- Hardware and Architecture
- Artificial Intelligence
- Information Systems