Title:
A HyperTransport-Enabled Global Memory Model For Improved Memory Efficiency
Author(s)
Young, Jeffrey
Yalamanchili, Sudhakar
Silla, Federico
Duato, José
Abstract
Modern and emerging data centers are presenting unprecedented demands in terms of cost and energy consumption, far outpacing architectural advances related to economies of scale. Consequently, blade designs exhibit significant cost and power inefficiencies, particularly in the memory system. For example, we observe that modern blades are often overprovisioned to accommodate peak memory demand, which rarely occurs concurrently across blades. With memory often accounting for 20% to 40% of total system power [1], this approach is not sustainable. At the same time, HyperTransport, in concert with new high-bandwidth commodity interconnects, can provide low-latency sharing of memory across blades. This paper provides a HyperTransport-enabled solution for seamless, efficient sharing of memory across blades in a data center, leading to significant power and cost savings.
Specifically, we propose a new global address space model called the Dynamic Partitioned Global Address Space (DPGAS) model, which extends previous concepts from Non-Uniform Memory Access (NUMA) architectures and partitioned global address space (PGAS) models. The DPGAS model relies on HyperTransport's low-latency characteristics to enable new techniques for efficient sharing of memory across data center blades. This paper presents the DPGAS model, describes HyperTransport-based hardware support for the model, and assesses the model's power and cost impact on memory-intensive applications. Overall, we find that cost savings can range from 4% to 26%, with power reductions ranging from 2% to 25%, across a variety of fixed application configurations using server consolidation and memory throttling. The HyperTransport implementation enables these savings with an additional latency cost of 1,690 ns per remote 64-byte cache line access across the blade-to-blade interconnect.
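To make the partitioned-address-space idea concrete, the following minimal C sketch shows one way a flat global address could be split into an owning blade and a local offset, with remote accesses forwarded over the blade-to-blade interconnect. The field widths, names, and the 36-bit offset split are illustrative assumptions for exposition only, not the encoding used by the DPGAS hardware described in the report.

```c
/*
 * Illustrative sketch only: the field widths, names, and the 36-bit
 * offset split below are assumptions for exposition, not the encoding
 * used by the DPGAS hardware described in the report.
 */
#include <stdint.h>
#include <stdbool.h>

#define OFFSET_BITS 36u                                 /* assumed per-blade window: 64 GB */
#define OFFSET_MASK ((UINT64_C(1) << OFFSET_BITS) - 1)

typedef struct {
    uint16_t node;    /* blade that owns this partition of the global address space */
    uint64_t offset;  /* physical offset within that blade's local memory           */
} dpgas_addr;

/* Split a flat global address into (owning node, local offset). */
static dpgas_addr dpgas_decode(uint64_t global_addr)
{
    dpgas_addr a;
    a.node   = (uint16_t)(global_addr >> OFFSET_BITS);
    a.offset = global_addr & OFFSET_MASK;
    return a;
}

/* A local access goes straight to the blade's own memory controller;
 * a remote access would be forwarded over the blade-to-blade
 * interconnect, paying the extra per-cache-line latency reported
 * in the abstract. */
static bool dpgas_is_local(dpgas_addr a, uint16_t my_node)
{
    return a.node == my_node;
}
```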
Date Issued
2008
Resource Type
Text
Resource Subtype
Technical Report