Network-aware scheduling for data-parallel jobs: Plan when you can

Virajith Jalaparti, Peter Bodik, Ishai Menache, Sriram Rao, Konstantin Makarychev, Matthew Caesar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

105 Scopus citations

Abstract

To reduce the impact of network congestion on big data jobs, cluster management frameworks use various heuristics to schedule compute tasks and/or network flows. Most of these schedulers consider the job input data fixed and greedily schedule the tasks and flows that are ready to run. However, a large fraction of production jobs are recurring with predictable characteristics, which allows us to plan ahead for them. Coordinating the placement of data and tasks of these jobs allows for significantly improving their network locality and freeing up bandwidth, which can be used by other jobs running on the cluster. With this intuition, we develop Corral, a scheduling framework that uses characteristics of future workloads to determine an offline schedule which (i) jointly places data and compute to achieve better data locality, and (ii) isolates jobs both spatially (by scheduling them in different parts of the cluster) and temporally, improving their performance. We implement Corral on Apache Yarn, and evaluate it on a 210-machine cluster using production workloads. Compared to Yarn's capacity scheduler, Corral reduces the makespan of these workloads by up to 33% and the median completion time by up to 56%, with a 20-90% reduction in data transferred across racks.

Original language: English (US)
Title of host publication: SIGCOMM 2015 - Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication
Publisher: Association for Computing Machinery, Inc
Pages: 407-420
Number of pages: 14
ISBN (Electronic): 9781450335423
State: Published - Aug 17 2015
Event: ACM Conference on Special Interest Group on Data Communication, SIGCOMM 2015 - London, United Kingdom
Duration: Aug 17 2015 - Aug 21 2015

Keywords

  • Cluster schedulers
  • Cross-layer optimization
  • Data-intensive applications
  • Joint data and compute placement

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Signal Processing
  • Electrical and Electronic Engineering
  • Communication
