Performance aspects of high-bandwidth multi-lateral cache organizations.
Rivers, Jude A.
1998
Abstract
As the issue widths of processors continue to increase, efficient data supply will become ever more critical. Unfortunately, with processor speeds increasing faster than memory speeds, supplying data efficiently will continue to be more and more difficult. Attempts to address this issue have included reducing the effective latency of memory accesses and increasing the available access bandwidth to the primary data cache. However, each of these two techniques is often proposed and evaluated in the absence of the other. This dissertation proposes and evaluates solutions for the latency and the bandwidth aspects of data supply, and a cache structure that incorporates both solutions. To solve the latency problem, we use the multi-lateral caching paradigm for active data placement and management in the L1 cache. The multi-lateral paradigm, which advocates the partitioning of the L1 cache into multiple subcaches, emerges from the fundamental belief that several classes of data elements have distinct reference or behavior characteristics that should be exploited differentially. However, some criterion must be found upon which to partition the reference stream into appropriate classes for selective caching. We demonstrate the value of temporality-based caching as an appropriate criterion with the introduction of the Non-Temporal Streaming (NTS) Cache, which performs better than larger single-structure caches. The NTS Cache utilizes data reuse information for intelligent data placement and active management of its 2-unit multi-lateral structure. To solve the bandwidth problem, we analyze the scalability of current multiporting approaches as data access parallelism increases. Currently, multibanking and multiporting through replication are the two popular and implementable approaches to providing multiple ports. However, neither approach appears to be scalable with increasing data access parallelism. 
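To make the multibanking limitation concrete, the following minimal sketch classifies the references issued in one cycle by the bank they select. It is not the dissertation's simulator. The line size, the bank count, the line-interleaved mapping, and the names `line_of`, `bank_of`, and `classify_cycle` are all assumptions chosen for illustration.

```python
from collections import defaultdict

LINE_BYTES = 32  # assumed cache line size in bytes (illustrative only)
NUM_BANKS = 4    # assumed number of banks (illustrative only)


def line_of(addr):
    """Return the cache line index that a byte address falls in."""
    return addr // LINE_BYTES


def bank_of(addr):
    """Return the bank for an address, assuming consecutive lines rotate across banks."""
    return line_of(addr) % NUM_BANKS


def classify_cycle(addrs):
    """Count bank conflicts among references issued in the same cycle.

    The first reference to reach each bank is treated as the winner.
    Every later reference to that bank is a conflict, counted as
    same-line if it touches the winner's cache line and as
    different-line otherwise.
    Returns (same_line_conflicts, different_line_conflicts).
    """
    by_bank = defaultdict(list)
    for addr in addrs:
        by_bank[bank_of(addr)].append(addr)

    same_line = 0
    different_line = 0
    for refs in by_bank.values():
        winner_line = line_of(refs[0])
        for addr in refs[1:]:
            if line_of(addr) == winner_line:
                same_line += 1
            else:
                different_line += 1
    return same_line, different_line
```

Under these assumptions, four references to four consecutive lines spread across all four banks and proceed without conflict, while two references to neighboring words of one line collide in a single bank even though they want the same line.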
Whereas the multibanking technique suffers performance degradation because of bank conflicts, the replication technique is limited by die area and degraded by the need to broadcast stores for coherence. Multibanking is economical in terms of die area requirements and design complexity. Analysis of the SPEC95 memory reference streams reveals that the majority of all bank conflicts are due to nearby references that map into the same cache line of the same cache bank. Our solution to the bank conflict problem is the Locality-Based Interleaved Cache (LBIC), which is built on traditional multibanking but exploits same-line spatial locality to obtain performance similar to true multiporting at far less cost. Finally, this dissertation explores the effectiveness of reducing the average data access time via active management of a cache space in conjunction with high-bandwidth techniques. Experimental results show that adding multi-lateral caching to an LBIC results in a cache structure capable of performing as well as or better than traditionally managed single-structure LBIC caches of nearly twice the size.
Subjects
Aspects Bandwidth Cache High Lateral Management Multi Organizations Performance
Types
Thesis