Various vendors have made claims regarding their memory systems,
but the topic of memory bandwidth lends itself to some confusion
and must be treated with care. This page provides an introduction.
What is memory bandwidth?
A memory bandwidth statistic describes the amount of traffic, measured
in megabytes per second, that a computer system can move from one
level of memory to another. There are, however, many different ways
to report such measurements. For example:
- bandwidth from main memory to the board-level cache
- bandwidth from cache to CPU
- bandwidth all the way from main memory to the CPU
- bandwidth in the reverse direction
- bandwidth for round trips
- bandwidth for a single processor or for an SMP system
- "raw" bandwidth, including the share consumed by bus protocols
- "usable" bandwidth, after discounting protocol overhead
Why would you want to have more memory bandwidth?
Simply put, CPUs have improved at a more rapid pace than memory
systems, so a growing share of workloads have their performance
limited by the speed of the memory system. Intel CEO Andrew S.
Grove has described bandwidth bottlenecks as "valleys of death."
Various application areas are affected:
- Business Week ("Supermath for the Real World," September
5, 1994) discusses nonlinear mathematics, which generally
stresses a memory system, as applied to car crashes, stock portfolios,
steel furnaces, truck schedules, and ATM networks.
- IBM suggests ("Power2 Performance on Engineering/Scientific
Applications," IBM RISC System/6000 Technology Volume
II) that bandwidth is important to computational structural
mechanics, heat transfer, car crashes, weather prediction, computational
fluid dynamics, and seismic applications.
- AltaVista finds 3,535
web pages that reference "memory bandwidth." Clearly,
the topic is of widespread interest.
What's wrong with using memory bandwidth to position systems?
There are four problems:
- It is difficult to know when an application needs more memory
bandwidth. Simple commands such as the UNIX csh "time" command
report CPU consumption, I/O, and paging, but no simple command
will tell you when an application needs more memory bandwidth.
- Vendor claims about memory bandwidth may not be comparable,
because there are so many different ways of measuring it (see
"What is memory bandwidth?" above).
- Memory bandwidth is sometimes a contributing factor to overall
balanced system performance, but it is far from the only factor.
- Often a vendor will state a "raw" statistic that real
applications are unlikely to achieve.
Can the confusion around memory bandwidth be reduced?
For over seven years, John McCalpin, formerly of the University of
Delaware (now at SGI), has kept records of the memory performance
of various computer systems, as measured by FORTRAN or C programs
running four simple loop kernels:
| Kernel | Loop body |
|--------|-----------|
| Copy | c(j) = a(j) |
| Scale | b(j) = 3.0 * c(j) |
| Sum | c(j) = a(j) + b(j) |
| Triad | a(j) = b(j) + 3.0 * c(j) |
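For readers more at home in C, the four loops in the table can be written as ordinary functions. This is a minimal sketch, not McCalpin's benchmark source: the function names (stream_copy and so on) are hypothetical, indexing is 0-based rather than FORTRAN's 1-based, and the timing and repetition that a real measurement needs are omitted.

```c
#include <stddef.h>

/* The four kernels from the table, one function each (hypothetical names).
   Each array holds n doubles; the scalar 3.0 matches the table. */

void stream_copy(double *c, const double *a, size_t n)
{
    for (size_t j = 0; j < n; j++)
        c[j] = a[j];                /* Copy:  c(j) = a(j) */
}

void stream_scale(double *b, const double *c, size_t n)
{
    for (size_t j = 0; j < n; j++)
        b[j] = 3.0 * c[j];          /* Scale: b(j) = 3.0 * c(j) */
}

void stream_sum(double *c, const double *a, const double *b, size_t n)
{
    for (size_t j = 0; j < n; j++)
        c[j] = a[j] + b[j];         /* Sum:   c(j) = a(j) + b(j) */
}

void stream_triad(double *a, const double *b, const double *c, size_t n)
{
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + 3.0 * c[j];   /* Triad: a(j) = b(j) + 3.0 * c(j) */
}
```

In an actual measurement the arrays would be made far larger than any cache, and each kernel would be timed over several passes.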
The benchmark, known as STREAM (referred to here as "Streams"),
credits the system under test with 8 bytes read and 8 bytes written
on each iteration of the first or second loop, for a total of 16 bytes
each, and with 24 bytes (16 read, 8 written) on each iteration of the
third and fourth loops. Because the arrays are much too large to fit
in caches, these four simple kernels measure traffic all the way
between main memory and registers, in both directions, with a mixture
of read and write traffic. Because the kernels are written in FORTRAN
or C, the benchmark measures "programmer-perceived" bandwidth rather
than raw bandwidth.
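The byte accounting described above turns into a bandwidth figure with a little arithmetic: bytes credited per iteration, times the number of iterations, divided by elapsed time. The sketch below assumes 8-byte double-precision elements and 1 MB = 10^6 bytes; the names bytes_per_iteration and credited_mb_per_s are hypothetical, and the real benchmark's timing and reporting logic is not reproduced here.

```c
#include <stddef.h>

/* Element size assumed by the accounting in the text: 8-byte doubles. */
#define WORD_BYTES 8u

enum kernel { COPY, SCALE, SUM, TRIAD };

/* Bytes credited for one loop iteration of each kernel. */
static size_t bytes_per_iteration(enum kernel k)
{
    switch (k) {
    case COPY:   /* c(j) = a(j): one read, one write */
    case SCALE:  /* b(j) = 3.0 * c(j): one read, one write */
        return 2 * WORD_BYTES;  /* 16 bytes */
    case SUM:    /* c(j) = a(j) + b(j): two reads, one write */
    case TRIAD:  /* a(j) = b(j) + 3.0 * c(j): two reads, one write */
        return 3 * WORD_BYTES;  /* 24 bytes */
    }
    return 0;  /* not reached for valid kernels */
}

/* Credited bandwidth in MB/s (1 MB = 10^6 bytes, an assumption) for one
   pass over arrays of n elements that took `seconds` to run. */
double credited_mb_per_s(enum kernel k, size_t n, double seconds)
{
    double bytes = (double)bytes_per_iteration(k) * (double)n;
    return bytes / seconds / 1.0e6;
}
```

For instance, a Triad pass over 1,000,000 elements that took 0.024 seconds (hypothetical figures) would be credited with 24,000,000 bytes, or 1,000 MB/s.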
Because so many different systems have been measured using this
benchmark, it provides a clear competitive comparison.
John McCalpin's archives of Stream data are hosted at the University
of Virginia: http://www.cs.virginia.edu/stream/.
Rule of thumb
What if a vendor publishes "raw" bandwidth and does not
publish programmer-available (Streams) bandwidth? Is there a conversion
factor or rule of thumb?
Yes. Take the vendor's claimed bandwidth and multiply by 0.5.
If the vendor protests that this is unfair, challenge the vendor
to prove it by running the Streams benchmark and by submitting the
results.
Note: A factor of 0.10 may be more appropriate if the vendor's
marketing literature lacks precise information.
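The rule of thumb is simple enough to state as a function. This sketch only restates the 0.5 and 0.10 factors given above; the function name and the boolean flag are hypothetical.

```c
#include <stdbool.h>

/* Estimate programmer-available (Streams-style) bandwidth from a vendor's
   "raw" figure, using the rule-of-thumb factors from the text:
   0.5 normally, 0.10 when the vendor literature lacks precise information.
   Name and flag are hypothetical, for illustration only. */
double estimate_usable_mb_per_s(double raw_mb_per_s, bool literature_is_precise)
{
    const double factor = literature_is_precise ? 0.5 : 0.10;
    return raw_mb_per_s * factor;
}
```

With a hypothetical vendor claim of 1,600 MB/s, the sketch returns 800 MB/s, or about 160 MB/s when the literature is vague.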
Memory bandwidth hints and tips
The best configuration for memory bandwidth is system dependent.
Typically, though, if you see a slot where memory could be plugged
in, you want to fill it. Filling all slots with memory units of the
same size usually helps too. Interleaving is your friend if your
application needs bandwidth, and many systems interleave most
effectively when all slots are filled with like-sized units.
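Why interleaving helps can be illustrated with a deliberately simplified model: consecutive blocks of memory are assigned to consecutive banks, so a sequential sweep like the Streams loops spreads its traffic evenly over all banks, which can then work in parallel. The 64-byte block size, the modulo mapping, and the names bank_of and sweep_counts are assumptions for illustration; real memory controllers use system-specific schemes.

```c
#include <stddef.h>

/* Hypothetical block size for this simplified low-order interleaving model. */
#define BLOCK_BYTES 64u

/* Bank that holds the block containing byte address `addr`. */
unsigned bank_of(size_t addr, unsigned nbanks)
{
    return (unsigned)((addr / BLOCK_BYTES) % nbanks);
}

/* Count how many blocks of a sequential sweep over `bytes` bytes land in
   each bank. `counts` must have room for nbanks entries. */
void sweep_counts(size_t bytes, unsigned nbanks, size_t *counts)
{
    for (unsigned i = 0; i < nbanks; i++)
        counts[i] = 0;
    for (size_t addr = 0; addr < bytes; addr += BLOCK_BYTES)
        counts[bank_of(addr, nbanks)]++;
}
```

In this model, sweeping 16 blocks across 4 banks puts exactly 4 blocks in each bank; with only one bank populated, every access would queue behind the previous one.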
John Henning
High Performance Systems Benchmark Performance Engineering
Compaq Computer Corporation
Revised 9 January 1999