An architecture for compiling UDF-centric workflows

Andrew Crotty, Alex Galakatos, Kayhan Dursun, Tim Kraska, Carsten Binnig, Ugur Cetintemel, Stan Zdonik

Research output: Chapter in Book/Report/Conference proceedingChapter

66 Scopus citations

Abstract

Data analytics has recently grown to include increasingly sophisticated techniques, such as machine learning and advanced statistics. Users frequently express these complex analytics tasks as workflows of user-defined functions (UDFs) that specify each algorithmic step. However, given typical hardware configurations and dataset sizes, the core challenge of complex analytics is no longer sheer data volume but rather the computation itself, and the next generation of analytics frameworks must focus on optimizing for this computation bottleneck. While query compilation has gained widespread popularity as a way to tackle the computation bottleneck for traditional SQL workloads, relatively little work addresses UDF-centric workflows in the domain of complex analytics. In this paper, we describe a novel architecture for automatically compiling workflows of UDFs. We also propose several optimizations that consider properties of the data, UDFs, and hardware together in order to generate different code on a case-by-case basis. To evaluate our approach, we implemented these techniques in TUPLEWARE, a new high-performance distributed analytics system, and our benchmarks show performance improvements of up to three orders of magnitude compared to alternative systems.

Original languageEnglish (US)
Title of host publicationProceedings of the VLDB Endowment
EditorsKi-Joune Li, Simonas Saltenis, Christophe Claramunt
PublisherAssociation for Computing Machinery
Pages1466-1477
Number of pages12
Volume8
Edition12 12
DOIs
StatePublished - 2015
Event3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006 - Seoul, Korea, Republic of
Duration: Sep 11 2006Sep 11 2006

Other

Other3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006
Country/TerritoryKorea, Republic of
CitySeoul
Period9/11/069/11/06

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science

Fingerprint

Dive into the research topics of 'An architecture for compiling UDF-centric workflows'. Together they form a unique fingerprint.

Cite this