Pipeline depths in high-performance, dynamically scheduled microprocessors are increasing steadily. In addition, level 1 caches are shrinking to meet latency constraints, while more levels of cache are being added to mitigate the resulting performance impact. Moreover, the growing schedule-to-execute window of deeply pipelined processors has required the use of speculative scheduling techniques. When these effects are combined, we are faced with performance degradation and increased power consumption due to load misscheduling, particularly when considering instructions dependent on in-flight loads. In this paper, we propose a scheduler for such processors. Instead of speculating non-selectively, the scheduler predicts the execution delay of each instruction and issues it accordingly. This, in turn, can eliminate the issuing of some operations that would otherwise be squashed. Load operations are an important obstacle in predicting instruction latency, because their latencies are not known until the cache access stage, which happens later in the pipeline. Our proposed techniques can estimate the arrival of cache blocks in various locations of the cache hierarchy, thereby enabling more precise scheduling of instructions dependent on these loads. Our scheduler makes use of two structures: a Previously-Accessed Table that stores the source addresses of in-flight load operations, and a Cache Miss Detection Engine that detects the location of the block to be accessed in the memory hierarchy. Using the SPEC 2000 CPU suite, we show that the number of instructions issued can be reduced by as much as 52.5% (16.9% on average) while increasing performance by as much as 42.1% (14.3% on average) over an aggressive baseline processor.
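As a rough illustration of the idea of predicting load latency before issue, the following toy model may help. It is a minimal sketch under stated assumptions, not the design evaluated in the paper: the class `LatencyPredictor`, its fields, the 64-byte block size, and the cycle counts in `LATENCY` are all hypothetical, and the two dictionaries are only loose stand-ins for the Previously-Accessed Table and the Cache Miss Detection Engine.

```python
from dataclasses import dataclass, field

# Hypothetical access latencies in cycles; these values are NOT from the paper.
LATENCY = {"L1": 2, "L2": 12, "MEM": 150}


@dataclass
class LatencyPredictor:
    # Loose stand-in for the Cache Miss Detection Engine:
    # maps a block number to the level believed to hold that block.
    block_level: dict = field(default_factory=dict)
    # Loose stand-in for the Previously-Accessed Table:
    # maps a block number to the cycle at which an in-flight load's data arrives.
    in_flight: dict = field(default_factory=dict)
    block_bits: int = 6  # assumed 64-byte cache blocks

    def block_of(self, addr: int) -> int:
        return addr >> self.block_bits

    def predict_ready_cycle(self, addr: int, issue_cycle: int) -> int:
        """Predict the cycle at which the load's data becomes available."""
        block = self.block_of(addr)
        arrival = self.in_flight.get(block)
        if arrival is not None and arrival > issue_cycle:
            # An earlier load to the same block is still in flight, so this
            # load cannot complete before that block arrives.
            return max(arrival, issue_cycle + LATENCY["L1"])
        # Otherwise, charge the latency of the level believed to hold the block.
        level = self.block_level.get(block, "MEM")
        ready = issue_cycle + LATENCY[level]
        self.in_flight[block] = ready
        self.block_level[block] = "L1"  # the block resides in L1 after the fill
        return ready


predictor = LatencyPredictor()
first = predictor.predict_ready_cycle(0x1000, issue_cycle=0)    # unknown block
second = predictor.predict_ready_cycle(0x1008, issue_cycle=10)  # same block, still in flight
third = predictor.predict_ready_cycle(0x1010, issue_cycle=200)  # same block, now in L1
```

In this sketch, a scheduler would wake an instruction that depends on a load at the predicted ready cycle instead of speculatively assuming a level 1 hit, which is the kind of squashed issue the abstract aims to avoid. The model ignores evictions and limited table capacity, which a real design would have to handle.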
Journal of Instruction-Level Parallelism
Published - Apr 2005
ASJC Scopus subject areas
- Information Systems
- Hardware and Architecture