TY - GEN
T1 - Store-ordered streaming of shared memory
AU - Wenisch, Thomas F.
AU - Somogyi, Stephen
AU - Hardavellas, Nikolaos
AU - Kim, Jangwoo
AU - Gniady, Chris
AU - Ailamaki, Anastassia
AU - Falsafi, Babak
PY - 2005
Y1 - 2005
AB - Coherence misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. Memory streaming provides a promising solution to the coherence miss bottleneck because it improves memory level parallelism and lookahead while using on-chip resources efficiently. We observe that the order in which shared data are consumed by one processor is correlated to the order in which they were produced by another. We investigate this phenomenon and demonstrate that it can be exploited to send Store-ORDered Streams (SORDS) of shared data from producers to consumers, thereby eliminating coherent read misses. Using a trace-driven analysis of all user and OS memory references in a cache-coherent distributed shared-memory multiprocessor, we show that SORDS-based memory streaming can eliminate between 36% and 100% of all coherent read misses in scientific workloads and between 23% and 48% in online transaction processing workloads.
UR - http://www.scopus.com/inward/record.url?scp=33746741675&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33746741675&partnerID=8YFLogxK
U2 - 10.1109/PACT.2005.37
DO - 10.1109/PACT.2005.37
M3 - Conference contribution
AN - SCOPUS:33746741675
SN - 076952429X
SN - 9780769524290
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 75
EP - 84
BT - 14th International Conference on Parallel Architectures and Compilation Techniques, PACT 2005
T2 - 14th International Conference on Parallel Architectures and Compilation Techniques, PACT 2005
Y2 - 17 September 2005 through 21 September 2005
ER -