Abstract
Exploiting locality of references has become extremely important in realizing the potential performance of modern machines with deep memory hierarchies. The data access patterns of programs and the memory layouts of the accessed data sets play a critical role in determining the performance of applications running on these machines. This paper presents a cache locality optimization technique that can optimize a loop nest even if the arrays referenced have different layouts in memory. Such a capability is required for a global locality optimization framework that applies both loop and data transformations to a sequence of loop nests for optimizing locality. Our method uses a single linear algebra framework to represent both data layouts and loop transformations. It computes a nonsingular loop transformation matrix such that, in a given loop nest, data locality is exploited in the innermost loops, where it is most useful. The inverse of a nonsingular transformation matrix is built column-by-column, starting from the rightmost column. In addition, our approach can work in those cases where the data layouts of a subset of the referenced arrays is unknown; this is a key step in optimizing a sequence of loop nests and whole programs for locality. Experimental results on an SGI/Cray Origin 2000 nonuniform memory access multiprocessor machine show that our technique reduces execution times by as much as 70 percent.
Original language | English (US) |
---|---|
Pages (from-to) | 1321-1336 |
Number of pages | 16 |
Journal | IEEE Transactions on Computers |
Volume | 50 |
Issue number | 12 |
DOIs | |
State | Published - Dec 2001 |
Funding
Mahmut Kandemir and Alok Choudhary were supported in part by US National Science Foundation (NSF) Young Investigator Award CCR-9357840 and NSF grant CCR-9509143. The work of J. Ramanujam is supported in part by NSF grant CCR-0073800 and by NSF Young Investigator Award CCR-9457768. Prithviraj Banerjee is supported in part by the NSF under grant CCR-9526325 and in part by the US Defense Advanced Research Projects Agency under contract F30602-98-2-0144. A preliminary version of this paper was presented at the Workshop on Languages and Compilers for Parallel Computing, Chapel Hill, North Carolina, August 1998.
Keywords
- Cache locality
- Data reuse
- Loop transformations
- Memory layouts
- Program optimization
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics