Improving cache locality by a combination of loop and data transformations

Mahmut Kandemir*, J. Ramanujam, Alok Choudhary

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

90 Scopus citations


Exploiting locality of reference is key to realizing high levels of performance on modern processors. This paper describes a compiler algorithm for optimizing cache locality in scientific codes on uniprocessor and multiprocessor machines. A distinctive characteristic of our algorithm is that it considers loop and data layout transformations in a unified framework. Our approach is very effective at reducing cache misses and can optimize some nests for which optimization techniques based on loop transformations alone are not successful. An important special case is one in which data layouts of some arrays are fixed and cannot be changed. We show how our algorithm can accommodate this case and demonstrate how it can be used to optimize multiple loop nests. Experiments on several benchmarks show that the techniques presented in this paper result in substantial improvement in cache performance.
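The following toy Python sketch illustrates the abstract's central claim. It is not the paper's algorithm, which treats loop and data layout transformations in a unified framework. The sketch assumes a single hypothetical 2-D nest `for i: for j: A[i][j] = B[j][i]` over N×N arrays. Each reference is modeled by its access matrix, and a configuration is scored by how many references have unit stride in the innermost loop. An exhaustive search over loop order and array layouts shows two things. Loop interchange alone always leaves one reference strided. Also switching `B` to column-major makes both references unit-stride. Pinning `A` to row-major models the case where some layouts are fixed. All names (`REFS`, `combined_search`, and so on) are hypothetical.

```python
from itertools import product

N = 1000  # hypothetical extent of the N x N arrays

# Access matrices for the toy nest  for i: for j: A[i][j] = B[j][i]
# Row r gives subscript r as a linear combination of the loop indices (i, j).
REFS = {
    "A": ((1, 0), (0, 1)),  # A[i][j]
    "B": ((0, 1), (1, 0)),  # B[j][i]
}
LAYOUTS = ("row", "col")  # row-major or column-major storage


def offset(subs, layout):
    """Linear element offset of 2-D subscripts (r, c) under a layout."""
    r, c = subs
    return r * N + c if layout == "row" else r + c * N


def innermost_stride(access, layout, innermost):
    """Element distance between consecutive iterations of the innermost loop.

    Advancing loop index `innermost` by 1 changes the subscripts by column
    `innermost` of the access matrix; offset() is linear, so map that delta.
    """
    delta = tuple(row[innermost] for row in access)
    return abs(offset(delta, layout))


def unit_stride_refs(layouts, innermost):
    """Names of references that walk memory with stride 1."""
    return sorted(name for name, acc in REFS.items()
                  if innermost_stride(acc, layouts[name], innermost) == 1)


def combined_search(fixed):
    """Try every loop order and every layout not pinned in `fixed`.

    Returns (number of unit-stride refs, innermost loop index, layouts),
    where innermost loop index 0 means i and 1 means j.
    """
    best = None
    free = [name for name in REFS if name not in fixed]
    for choice in product(LAYOUTS, repeat=len(free)):
        layouts = {**fixed, **dict(zip(free, choice))}
        for k in (0, 1):
            score = len(unit_stride_refs(layouts, k))
            if best is None or score > best[0]:
                best = (score, k, layouts)
    return best


if __name__ == "__main__":
    # Loop transformations only: both layouts fixed to row-major.
    print(combined_search({"A": "row", "B": "row"}))
    # Combined: A's layout is fixed, B's layout may change.
    print(combined_search({"A": "row"}))
```

Running the script prints `(1, 0, {'A': 'row', 'B': 'row'})` and then `(2, 1, {'A': 'row', 'B': 'col'})`. The search is exhaustive, so it is only practical for toy nests like this one.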

Original language: English (US)
Pages (from-to): 159-167
Number of pages: 9
Journal: IEEE Transactions on Computers
Issue number: 2
State: Published - 1999


Keywords

  • Caches
  • Data reuse
  • Locality
  • Loop and data transformations
  • Optimizing compilers

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics

