TY - JOUR
T1 - Cashmere-2L
T2 - Software Coherent Shared Memory on a Clustered Remote-Write Network
AU - Stets, Robert
AU - Dwarkadas, Sandhya
AU - Hardavellas, Nikolaos
AU - Hunt, Galen
AU - Kontothanassis, Leonidas
AU - Parthasarathy, Srinivasan
AU - Scott, Michael
PY - 1997/12
Y1 - 1997/12
N2 - Low-latency remote-write networks, such as DEC's Memory Channel, provide the possibility of transparent, inexpensive, large-scale shared-memory parallel computing on clusters of shared-memory multiprocessors (SMPs). The challenge is to take advantage of hardware shared memory for sharing within an SMP, and to ensure that software overhead is incurred only when actively sharing data across SMPs in the cluster. In this paper, we describe a "two-level" software coherent shared memory system, Cashmere-2L, that meets this challenge. Cashmere-2L uses hardware to share memory within a node, while exploiting the Memory Channel's remote-write capabilities to implement "moderately lazy" release consistency with multiple concurrent writers, directories, home nodes, and page-size coherence blocks across nodes. Cashmere-2L employs a novel coherence protocol that allows a high level of asynchrony by eliminating global directory locks and the need for TLB shootdown. Remote interrupts are minimized by exploiting the remote-write capabilities of the Memory Channel network. Cashmere-2L currently runs on an 8-node, 32-processor DEC AlphaServer system. Speedups range from 8 to 31 on 32 processors for our benchmark suite, depending on the application's characteristics. We quantify the importance of our protocol optimizations by comparing performance to that of several alternative protocols that do not share memory in hardware within an SMP and require more synchronization. In comparison to a one-level protocol that does not share memory in hardware within an SMP, Cashmere-2L improves performance by up to 46%.
AB - Low-latency remote-write networks, such as DEC's Memory Channel, provide the possibility of transparent, inexpensive, large-scale shared-memory parallel computing on clusters of shared-memory multiprocessors (SMPs). The challenge is to take advantage of hardware shared memory for sharing within an SMP, and to ensure that software overhead is incurred only when actively sharing data across SMPs in the cluster. In this paper, we describe a "two-level" software coherent shared memory system, Cashmere-2L, that meets this challenge. Cashmere-2L uses hardware to share memory within a node, while exploiting the Memory Channel's remote-write capabilities to implement "moderately lazy" release consistency with multiple concurrent writers, directories, home nodes, and page-size coherence blocks across nodes. Cashmere-2L employs a novel coherence protocol that allows a high level of asynchrony by eliminating global directory locks and the need for TLB shootdown. Remote interrupts are minimized by exploiting the remote-write capabilities of the Memory Channel network. Cashmere-2L currently runs on an 8-node, 32-processor DEC AlphaServer system. Speedups range from 8 to 31 on 32 processors for our benchmark suite, depending on the application's characteristics. We quantify the importance of our protocol optimizations by comparing performance to that of several alternative protocols that do not share memory in hardware within an SMP and require more synchronization. In comparison to a one-level protocol that does not share memory in hardware within an SMP, Cashmere-2L improves performance by up to 46%.
UR - http://www.scopus.com/inward/record.url?scp=0031542279&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031542279&partnerID=8YFLogxK
U2 - 10.1145/269005.266675
DO - 10.1145/269005.266675
M3 - Article
AN - SCOPUS:0031542279
SN - 0163-5980
VL - 31
SP - 170
EP - 183
JO - Operating Systems Review (ACM)
JF - Operating Systems Review (ACM)
IS - 5
ER -