TY - GEN
T1 - Temporal streaming of shared memory
AU - Wenisch, Thomas F.
AU - Somogyi, Stephen
AU - Hardavellas, Nikolaos
AU - Kim, Jangwoo
AU - Ailamaki, Anastassia
AU - Falsafi, Babak
PY - 2005
Y1 - 2005
AB - Coherent read misses in shared-memory multiprocessors account for a substantial fraction of execution time in many important scientific and commercial workloads. We propose Temporal Streaming, to eliminate coherent read misses by streaming data to a processor in advance of the corresponding memory accesses. Temporal streaming dynamically identifies address sequences to be streamed by exploiting two common phenomena in shared-memory access patterns: (1) temporal address correlation - groups of shared addresses tend to be accessed together and in the same order, and (2) temporal stream locality - recently-accessed address streams are likely to recur. We present a practical design for temporal streaming. We evaluate our design using a combination of trace-driven and cycle-accurate full-system simulation of a cache-coherent distributed shared-memory system. We show that temporal streaming can eliminate 98% of coherent read misses in scientific applications, and between 43% and 60% in database and web server workloads. Our design yields speedups of 1.07 to 3.29 in scientific applications, and 1.06 to 1.21 in commercial workloads.
UR - http://www.scopus.com/inward/record.url?scp=27544508955&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=27544508955&partnerID=8YFLogxK
U2 - 10.1109/ISCA.2005.50
DO - 10.1109/ISCA.2005.50
M3 - Conference contribution
AN - SCOPUS:27544508955
SN - 076952270X
T3 - Proceedings - International Symposium on Computer Architecture
SP - 222
EP - 233
BT - Proceedings - 32nd International Symposium on Computer Architecture, ISCA 2005
T2 - 32nd International Symposium on Computer Architecture, ISCA 2005
Y2 - 4 June 2005 through 8 June 2005
ER -