TY - GEN
T1 - A case for tracking and exploiting inter-node and intra-node memory content sharing in virtualized large-scale parallel systems
AU - Xia, Lei
AU - Dinda, Peter A
PY - 2012
Y1 - 2012
N2 - In virtualized large-scale parallel systems, scientific workloads consist of numerous processes running across many virtual nodes. Their memory footprint is massive, and this has consequences for services that enhance performance, reliability, or power. We argue that a service that dynamically tracks the sharing of memory content, both within individual nodes and across nodes, can simplify and enhance the implementation of such services. For example, leveraging content sharing could significantly reduce the size of a checkpoint of a group of nodes. As another example, it could speed up VM migration by allowing the reconstruction of a VM's memory from multiple source VMs. Finally, a service that improves reliability by introducing memory redundancy could leverage existing content sharing to minimize the memory costs of any particular level of redundancy. We argue that both intra- and inter-node memory content sharing are common in parallel applications, supporting this claim with a detailed study of both kinds of sharing, at different scales, different granularities, and different times, for a range of applications and application benchmarks. We then describe the high-level approach we are taking to design and implement a distributed, VMM-based system that can efficiently and scalably identify and track such sharing with low overhead.
AB - In virtualized large-scale parallel systems, scientific workloads consist of numerous processes running across many virtual nodes. Their memory footprint is massive, and this has consequences for services that enhance performance, reliability, or power. We argue that a service that dynamically tracks the sharing of memory content, both within individual nodes and across nodes, can simplify and enhance the implementation of such services. For example, leveraging content sharing could significantly reduce the size of a checkpoint of a group of nodes. As another example, it could speed up VM migration by allowing the reconstruction of a VM's memory from multiple source VMs. Finally, a service that improves reliability by introducing memory redundancy could leverage existing content sharing to minimize the memory costs of any particular level of redundancy. We argue that both intra- and inter-node memory content sharing are common in parallel applications, supporting this claim with a detailed study of both kinds of sharing, at different scales, different granularities, and different times, for a range of applications and application benchmarks. We then describe the high-level approach we are taking to design and implement a distributed, VMM-based system that can efficiently and scalably identify and track such sharing with low overhead.
KW - Content Sharing
KW - Virtualization
UR - http://www.scopus.com/inward/record.url?scp=84863952062&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863952062&partnerID=8YFLogxK
U2 - 10.1145/2287056.2287061
DO - 10.1145/2287056.2287061
M3 - Conference contribution
AN - SCOPUS:84863952062
SN - 9781450313445
T3 - VTDC '12 - 6th International Workshop on Virtualization Technologies in Distributed Computing
SP - 11
EP - 18
BT - VTDC '12 - 6th International Workshop on Virtualization Technologies in Distributed Computing
T2 - 6th International Workshop on Virtualization Technologies in Distributed Computing, VTDC '12
Y2 - 18 June 2012
ER -