Study on the Performance Impact and Implementation of Hardware Prefetch Technology in Memory Subsystems

The “memory wall” has long been a major bottleneck limiting further improvement of processor performance. To mitigate it, hardware prefetching has been introduced into processors as an important complement to multi-level caches. Stride prefetching and page prefetching, the two most commonly used data prefetch techniques, differ in both their implementation characteristics and their performance impact. In this paper, several prefetch designs are implemented on an FPGA platform. We boot an operating system, run a real benchmark suite (SPEC2006), and analyze how the different hardware prefetch mechanisms affect different programs by examining the memory access characteristics and performance changes of the relevant benchmarks. Experiments show that combining the stride prefetcher with the page prefetcher yields the most pronounced gains, improving the L1D and L2 cache miss rates by 22.18% and 33.42%, respectively. At the same time, because programs differ in their memory access characteristics, the prefetch mechanism adds significant execution delay to some programs and degrades their performance. This shows that data prefetching can fully hide memory latency, and thus maximize processor performance, only when the data prefetch function is truly matched to the overall system function.