TY - GEN
T1 - Rake
T2 - 2011 IEEE 19th International Workshop on Quality of Service, IWQoS 2011
AU - Zhao, Yao
AU - Cao, Yinzhi
AU - Chen, Yan
AU - Zhang, Ming
AU - Goyal, Anup
N1 - Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2011
Y1 - 2011
AB - The ability to trace request execution paths is critical for diagnosing performance faults in large-scale distributed systems. Previous black-box and white-box approaches are either inaccurate or invasive. We present a novel semantics-assisted gray-box tracing approach, called Rake, which can accurately trace individual requests by observing network traffic. Rake infers the causality between messages by identifying polymorphic IDs in messages according to application semantics. To make Rake universally applicable, we design a Rake language so that users can easily describe the necessary semantics of their applications while reusing the core Rake component. We evaluate Rake using a few popular distributed applications, including web search, a distributed computing cluster, a content provider network, and online chatting. Our results demonstrate that Rake is much more accurate than the black-box approaches while requiring no modification to the OS or applications. In the CoralCDN (a content distribution network) experiments, Rake links messages with much higher accuracy than WAP5, a state-of-the-art black-box approach. In the Hadoop (a distributed computing cluster platform) experiments, Rake helps reveal several previously unknown issues that may lead to performance degradation, including an RPC (Remote Procedure Call) abuse problem.
UR - http://www.scopus.com/inward/record.url?scp=79960684792&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79960684792&partnerID=8YFLogxK
U2 - 10.1109/IWQOS.2011.5931314
DO - 10.1109/IWQOS.2011.5931314
M3 - Conference contribution
AN - SCOPUS:79960684792
SN - 9781457701030
T3 - IEEE International Workshop on Quality of Service, IWQoS
BT - 2011 IEEE 19th International Workshop on Quality of Service, IWQoS 2011
Y2 - 6 June 2011 through 7 June 2011
ER -