
Data Prefetching

NAVEED AHMED
MUHAMMAD HASEEB-UL-HASSAN ZAHID
Why …
Expanding gap between microprocessor and memory (DRAM) performance
Common for scientific programs to spend more than half their run times stalled
on memory requests
Prefetching has the potential to significantly improve overall program execution
time by overlapping computation with memory accesses
Processor and Memory Performance since 1980 (figure: the widening processor-DRAM gap)
Previous Techniques
Cache memory hierarchies were implemented to reduce latency
Cache speeds have managed to keep pace with processor memory request rates
But high-speed caches are too expensive to use as the main storage technology
Caches have proven effective at reducing the average memory access penalty for
programs that show a high degree of locality in their addressing patterns
Caching strategies fail for data-intensive programs, such as large matrix
operations, whose addressing patterns lack this locality; the sketch below
illustrates why
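
As an illustration of the locality point, here is a minimal C sketch (the
dimension N is an assumed value, chosen only so the matrix is far larger than a
typical cache): the row-major traversal uses every word of each cache line it
fetches, while a column-major traversal of the same row-major array touches a
different line on nearly every access, so the cache provides little benefit.

#include <stddef.h>

#define N 4096  /* assumed dimension: 128 MiB of doubles, far larger than cache */

/* Row-major traversal: consecutive accesses touch consecutive addresses,
 * so each fetched cache line is fully used before it is evicted. */
double sum_rows(const double a[N][N]) {
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Column-major traversal of the same array: successive accesses are
 * N * sizeof(double) bytes apart, so nearly every access misses once
 * the matrix exceeds the cache size. */
double sum_cols(const double a[N][N]) {
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}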
On Demand Memory Fetch Policy
This policy fetches data into the cache from main memory only after the
processor has requested a word and found it absent from the cache
Compulsory Miss (Cold Start)
An on-demand memory fetch policy will always result in a cache miss on the first
access to a cache block, since only previously accessed data can possibly be
available in the cache.
Such cache misses are known as cold start or compulsory misses
Capacity Miss
If the referenced data is part of a large array operation, it is likely that the
data will be replaced after its use to make room for new array elements being
streamed into the cache.
When the same data block is needed later, the processor must again bring it in
from main memory, incurring the full main memory access latency.
This is called a capacity miss; a sketch follows below
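
A minimal C sketch of how capacity misses arise (the cache and array sizes are
assumed values for illustration): because the array is four times larger than
the cache, the blocks read early in the first pass are evicted before the second
pass revisits them, so the second pass pays the full main memory latency again.

#include <stddef.h>

#define CACHE_BYTES (1u << 20)                       /* assumed 1 MiB cache   */
#define ELEMS ((4 * CACHE_BYTES) / sizeof(double))   /* array is 4x the cache */

double two_passes(const double *a) {   /* a has ELEMS elements */
    double s = 0.0;
    /* First pass: the first access to each block misses (compulsory). */
    for (size_t i = 0; i < ELEMS; i++)
        s += a[i];
    /* Second pass: earlier blocks were evicted to make room for later
     * ones, so the same data is re-fetched from memory (capacity misses). */
    for (size_t i = 0; i < ELEMS; i++)
        s += a[i] * a[i];
    return s;
}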
Data Prefetch Operation
Many of these cache misses can be avoided if we add a data prefetch operation.
Data prefetching anticipates such misses and issues a fetch to the memory
system in advance of the actual memory reference.
Prefetch proceeds in parallel with processor computation, allowing the memory
system time to transfer the desired data from main memory to the cache.
Ideally, prefetch will complete just in time for the processor to access the
needed data in the cache without stalling the processor.
Continued…
The latency of main memory access is hidden by overlapping computation with
memory accesses, resulting in a reduction in overall run time, as shown in Figure 2b.
Figure 2b represents the ideal case when the prefetched data arrives just as it is
requested by the processor.
A less optimistic situation is depicted in Figure 2c.
Figure 2c
Prefetches for references r1 and r2 are issued too late to avoid processor stalls
Some benefit is still obtained from the prefetch of r2, since part of its latency is hidden
The prefetch for r3 arrives early enough to hide all of the memory latency, but the
block must be held in the processor cache for some period of time before it is used
During this time, the prefetched data is exposed to the cache replacement policy
and may be evicted from the cache before use.
When this occurs, the prefetch is said to be useless because no performance benefit
is derived from fetching the block early.
Cache Pollution
Prematurely prefetched data may also displace data in the cache that is
currently in use by the processor, resulting in cache pollution.

If, however, a prefetched block displaces a block that is not needed again until
after the prefetched block is referenced, the resulting miss is counted as an
ordinary replacement miss, not cache pollution.
Prefetching continued
By removing processor stall cycles, prefetching increases the frequency of
memory requests issued by the processor.
Memory bandwidth can become saturated, nullifying the forecasted benefits.
Memory systems must be designed to match the higher bandwidth needs.
Prefetching continued
Prefetching techniques fall into two broad categories:
Software Prefetching
Hardware Prefetching
Software Prefetching
A prefetch instruction must be inserted a sufficient number of clock cycles
before the data is required, so that the fetch completes before the actual reference.
Software Prefetching continued
Adds instructions to the execution stream
Actually increases the amount of work done by the processor, as the sketch below shows
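
As a concrete sketch, the loop below uses the __builtin_prefetch intrinsic
available in GCC and Clang; the prefetch distance DIST is an assumed value that
would in practice be tuned to roughly the memory latency divided by the time for
one loop iteration.

#include <stddef.h>

/* Assumed prefetch distance: issue the fetch ~DIST iterations ahead,
 * roughly ceil(memory latency / time per iteration). */
#define DIST 16

void scale(double *x, double k, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (i + DIST < n)                            /* stay in bounds  */
            __builtin_prefetch(&x[i + DIST], 1, 0);  /* write, no reuse */
        x[i] *= k;    /* computation overlaps the in-flight prefetch */
    }
}

Note the overhead the slide describes: the prefetch and its bounds check are
extra instructions executed on every iteration, so the prefetching loop does
strictly more work than the plain one.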
Hardware Prefetching
Many hardware-based prefetching techniques have been proposed that do not
require the use of explicit fetch instructions.
Employs special hardware that monitors the processor's memory accesses in an
attempt to infer prefetching opportunities.
Incurs no instruction overhead.
Often generates more unnecessary prefetches than software prefetching, because
the hardware speculates on future memory accesses without the benefit of
compile-time information.
Unnecessary loading of blocks can result in cache pollution; one common scheme
is sketched below
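
For illustration, here is a toy C model of one well-known hardware scheme, a
stride-based reference prediction table (RPT); the table size and entry fields
are assumptions for this sketch, not any particular processor's design.

#include <stdint.h>
#include <stdbool.h>

#define RPT_SIZE 64   /* assumed table size */

struct rpt_entry {
    uint64_t pc;         /* load instruction that owns this entry */
    uint64_t last_addr;  /* address of its previous access        */
    int64_t  stride;     /* last observed address delta           */
    bool     confirmed;  /* same stride seen twice in a row       */
};

static struct rpt_entry rpt[RPT_SIZE];

/* Conceptually invoked on every load: returns an address to prefetch,
 * or 0 when there is no confident prediction. */
uint64_t rpt_access(uint64_t pc, uint64_t addr) {
    struct rpt_entry *e = &rpt[pc % RPT_SIZE];
    if (e->pc != pc) {                       /* new load: allocate entry */
        *e = (struct rpt_entry){ pc, addr, 0, false };
        return 0;
    }
    int64_t stride = (int64_t)(addr - e->last_addr);
    e->confirmed = (stride != 0 && stride == e->stride);
    e->stride    = stride;
    e->last_addr = addr;
    /* Two matching strides in a row: speculate the pattern continues. */
    return e->confirmed ? addr + stride : 0;
}

Because the table speculates purely from observed addresses, it also issues
prefetches for patterns that do not continue, which is exactly the
unnecessary-prefetch behavior noted above.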
Prefetching
Must be implemented in such a way that prefetches are timely, useful, and
introduce little or no overhead.
Prefetching strategies are diverse, and no single strategy has been proposed
that provides optimal performance.
References
Steven P. Vanderwiel and David J. Lilja, "Data prefetch mechanisms," ACM
Computing Surveys, Vol. 32, Issue 2, June 2000.
