Store-ordered streaming of shared memory

Thomas F. Wenisch*, Stephen Somogyi, Nikolaos Hardavellas, Jangwoo Kim, Chris Gniady, Anastassia Ailamaki, Babak Falsafi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

9 Citations (Scopus)

Abstract

Coherence misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. Memory streaming provides a promising solution to the coherence miss bottleneck because it improves memory level parallelism and lookahead while using on-chip resources efficiently. We observe that the order in which shared data are consumed by one processor is correlated to the order in which they were produced by another. We investigate this phenomenon and demonstrate that it can be exploited to send Store-ORDered Streams (SORDS) of shared data from producers to consumers, thereby eliminating coherent read misses. Using a trace-driven analysis of all user and OS memory references in a cache-coherent distributed shared-memory multiprocessor, we show that SORDS-based memory streaming can eliminate between 36% and 100% of all coherent read misses in scientific workloads and between 23% and 48% in online transaction processing workloads.
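The correlation between production order and consumption order described in the abstract can be illustrated with a toy model. The sketch below is not the paper's trace-driven methodology or the SORDS design; the function name `covered_misses`, the `depth` parameter, and the restart-on-miss policy are all assumptions made for illustration. It replays a consumer's read sequence against a producer's store sequence and counts how many reads would already be waiting in a small stream buffer if the producer's data were forwarded in store order.

```python
def covered_misses(store_order, read_order, depth=4):
    """Count consumer reads satisfied by a store-ordered stream (toy model).

    store_order: block addresses in the order the producer wrote them.
    read_order:  block addresses in the order the consumer reads them.
    depth:       hypothetical number of blocks forwarded ahead of demand.
    """
    # Position of each address in the producer's store sequence.
    position = {addr: i for i, addr in enumerate(store_order)}
    buffered = []    # blocks forwarded to the consumer ahead of demand
    next_index = 0   # next store_order position to forward
    covered = 0

    for addr in read_order:
        if addr in buffered:
            # The block arrived before it was needed: miss eliminated.
            covered += 1
            buffered.remove(addr)
        elif addr in position:
            # Demand miss: restart the stream just after this block.
            buffered = []
            next_index = position[addr] + 1
        else:
            # Block was not written by this producer; outside the model.
            continue
        # Top up the buffer with the next blocks in store order.
        while len(buffered) < depth and next_index < len(store_order):
            buffered.append(store_order[next_index])
            next_index += 1
    return covered


stores = [10, 20, 30, 40, 50]
in_order = covered_misses(stores, [10, 20, 30, 40, 50])   # reads follow store order
reversed_ = covered_misses(stores, [50, 40, 30, 20, 10])  # reads oppose store order
print(in_order, reversed_)
```

In this model, a consumer that reads in the producer's store order takes one demand miss and finds the remaining four blocks already buffered, while a consumer that reads in reverse order gets no benefit. Real workloads fall between these extremes, which is consistent with the range of coverage the abstract reports.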

Original language: English (US)
Title of host publication: 14th International Conference on Parallel Architectures and Compilation Techniques, PACT 2005
Pages: 75-84
Number of pages: 10
DOI: https://doi.org/10.1109/PACT.2005.37
State: Published - Dec 1 2005
Event: 14th International Conference on Parallel Architectures and Compilation Techniques, PACT 2005 - St. Louis, MO, United States
Duration: Sep 17 2005 to Sep 21 2005

Publication series

Name: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
Volume: 2005
ISSN (Print): 1089-795X

Other

Other: 14th International Conference on Parallel Architectures and Compilation Techniques, PACT 2005
Country: United States
City: St. Louis, MO
Period: 9/17/05 to 9/21/05

Fingerprint

Shared Memory
Streaming
Workload
Data storage equipment
Shared-memory multiprocessors
Transaction Processing
Distributed Shared Memory
Look-ahead
Cache
Execution Time
Trace analysis
Parallelism
Chip
Eliminate
Trace
Resources
Demonstrate
Processing

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

Cite this

Wenisch, T. F., Somogyi, S., Hardavellas, N., Kim, J., Gniady, C., Ailamaki, A., & Falsafi, B. (2005). Store-ordered streaming of shared memory. In 14th International Conference on Parallel Architectures and Compilation Techniques, PACT 2005 (pp. 75-84). [1515582] (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT; Vol. 2005). https://doi.org/10.1109/PACT.2005.37
@inproceedings{0b3fcb769c2b4a28b2134d4466f03cc8,
title = "Store-ordered streaming of shared memory",
author = "Wenisch, {Thomas F.} and Stephen Somogyi and Nikolaos Hardavellas and Jangwoo Kim and Chris Gniady and Anastassia Ailamaki and Babak Falsafi",
year = "2005",
month = "12",
day = "1",
doi = "10.1109/PACT.2005.37",
language = "English (US)",
isbn = "076952429X",
series = "Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT",
pages = "75--84",
booktitle = "14th International Conference on Parallel Architectures and Compilation Techniques, PACT 2005",

}

TY - GEN

T1 - Store-ordered streaming of shared memory

AU - Wenisch, Thomas F.

AU - Somogyi, Stephen

AU - Hardavellas, Nikolaos

AU - Kim, Jangwoo

AU - Gniady, Chris

AU - Ailamaki, Anastassia

AU - Falsafi, Babak

PY - 2005/12/1

Y1 - 2005/12/1

UR - http://www.scopus.com/inward/record.url?scp=33746741675&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33746741675&partnerID=8YFLogxK

U2 - 10.1109/PACT.2005.37

DO - 10.1109/PACT.2005.37

M3 - Conference contribution

AN - SCOPUS:33746741675

SN - 076952429X

SN - 9780769524290

T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

SP - 75

EP - 84

BT - 14th International Conference on Parallel Architectures and Compilation Techniques, PACT 2005

ER -
