[Figure: normalized ratio of total instruction bytes consumed (vertical axis 0.00 to 2.00) for gcc1.35, espresso, li and eqntott]
Fig. 2. Normalized Instruction Volume (Total number of instructions × Av. instruction length).

Fig. 2 shows the total instruction volume in bytes consumed in the execution of a program for the i386 architecture and the SPARC or MIPS architectures. This metric shows more disparity between the i386 architecture and the other architectures because the average i386 instruction length is only 71% of that for the other CPU architectures. The total instruction volume is obtained by multiplying the average instruction length by the total number of instructions executed. The number is normalized with respect to the i386 architecture.

2.2 Memory references per Instruction

The i386 architecture allows memory accesses by several instructions, so the number of memory reads/writes cannot be determined simply by counting load/store instructions, as it can for processors with Load/Store architectures. The breakdown of the average memory references per instruction shows the memory bandwidth requirements of each program. Excessive memory traffic also degrades the prefetch hit-rate, since the Intel486 CPU has a unified first level cache.

Fig. 3. Occurrence of Memory Operations

Fig. 4 shows the relative occurrence of memory accesses per instruction in other architectures. The metric presented is the total number of memory bytes accessed for read and write, normalized w.r.t. the total accessed (read+write) for the i386 architecture. The data for the SPARC and MIPS architectures is obtained from [2]. It is seen that the i386 architecture has greater data memory traffic than the SPARC or MIPS architectures. This tends to balance out the lower demand the i386 architecture imposes on instruction memory traffic (see Fig. 2). Similar data is shown by Hennessy and Patterson [9], comparing the VAX architecture and a reference Load/Store architecture.

[Figure: Normalized Total Memory traffic (w.r.t. i386 total memory traffic) for i386, SPARC and MIPS]
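The instruction-volume metric of Fig. 2 (instructions executed times average instruction length, normalized to the i386) can be sketched as follows. This is only an illustration: the instruction counts are hypothetical placeholders, not the paper's measurements. The 4-byte fixed instruction length for SPARC and MIPS and the 71% ratio quoted in the text are the only non-invented inputs.

```python
def instruction_volume(instr_count, avg_len_bytes):
    """Total instruction bytes consumed = instructions executed x average length."""
    return instr_count * avg_len_bytes


def normalize_to(values, baseline):
    """Divide every entry by the baseline architecture's value."""
    base = values[baseline]
    return {arch: v / base for arch, v in values.items()}


RISC_LEN = 4.0  # SPARC and MIPS use fixed 4-byte instructions

# (instructions executed, average instruction length in bytes)
# The instruction counts below are HYPOTHETICAL, chosen only for illustration.
measurements = {
    "i386": (1_000_000, 0.71 * RISC_LEN),  # 71% of the RISC length, per the text
    "SPARC": (1_200_000, RISC_LEN),
    "MIPS": (1_250_000, RISC_LEN),
}

volumes = {arch: instruction_volume(n, length)
           for arch, (n, length) in measurements.items()}
normalized = normalize_to(volumes, "i386")

for arch, ratio in normalized.items():
    print(f"{arch}: {ratio:.2f}")
```

The same `normalize_to` helper would apply to the memory-traffic metric of Fig. 4 if the dictionary held total bytes read and written instead of instruction bytes.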
has better than average cache hit-rates, instruction
selection is the key factor in determining CPI.
GCC:
Fig. 5 shows the CPI during the execution of one sample in Gcc increasing dramatically to 3.7. Fig. 6 shows the Data Hit-rate in the unified 8K cache during the same set of samples. It can be seen from the circled area of Fig. 5 that the CPI rises during the execution of sample 40, which at the same time experiences a dramatic drop in data hit-rates, as seen in the circled area in Fig. 6. Examination of the detailed simulation logs for that sample reveals the following facts: (i) the Write Hit-rate during that sample is around 37%, (ii) writes outnumber reads by a ratio of 2 to 1, and (iii) 40% of all memory accesses are writes. Since the Intel486 DX CPU executes in Write-through mode, the presence of several back to back writes will overflow the write buffers and cause the processor to stall. The average CPI for Gcc is 2.27.

[Figure: Gcc Intel486 Data Hit-rate]

[Figure: XLISP Intel486 CPI]
Fig. 7. CPI profile of Xlisp
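The stall mechanism described for Gcc (back to back writes overflowing the write buffers of a write-through cache) can be illustrated with a toy model. This is a sketch under stated assumptions, not the simulator used in the study: the buffer depth of 4, the 3-cycle drain time, the one-access-per-cycle issue rate and the rule that reads never stall are all hypothetical simplifications.

```python
class WriteBuffer:
    """Toy FIFO write buffer; the entry at the head drains to memory over time."""

    def __init__(self, depth, drain_cycles):
        self.depth = depth                # number of buffer entries (assumed)
        self.drain_cycles = drain_cycles  # cycles to retire one write (assumed)
        self.occupied = 0
        self.timer = 0                    # cycles left for the entry being drained

    def full(self):
        return self.occupied == self.depth

    def push(self):
        if self.occupied == 0:
            self.timer = self.drain_cycles  # start draining the new head entry
        self.occupied += 1

    def tick(self):
        """Advance one cycle of draining."""
        if self.occupied:
            self.timer -= 1
            if self.timer == 0:
                self.occupied -= 1
                self.timer = self.drain_cycles if self.occupied else 0


def count_stall_cycles(ops, depth=4, drain_cycles=3):
    """ops is a string of 'R'/'W' accesses, one issued per cycle when not stalled."""
    buf = WriteBuffer(depth, drain_cycles)
    stalls = 0
    for op in ops:
        if op == "W":
            while buf.full():      # processor waits for a free entry
                stalls += 1
                buf.tick()
            buf.push()
        buf.tick()                 # the cycle in which this access issues
    return stalls


spread_out = count_stall_cycles("RRW" * 8)  # writes separated by reads
burst = count_stall_cycles("W" * 12)        # back to back writes
print(spread_out, burst)
```

In this model the spread-out pattern never fills the buffer, while the burst of consecutive writes does and accumulates stall cycles, which is the qualitative effect the text attributes to sample 40.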
Espresso/Bca

The circled area in Fig. 9 shows an increased CPI (2.45) due to a low cache hit-rate: the data hit-rate is 73%. The cache hit-rate for reads in that sample is even lower, only 67.5%, which explains the higher CPI. The average CPI is 1.91, with a low of 1.4 and a high of 2.5. This is a case where the average CPI is quite misleading, as it varies from sample to sample.

[Figure: Espresso/Bca Intel486 CPI]
Fig. 9. CPI Profile for Espresso / Bca

[Figure: Bca/Espresso Intel486 Data Hit-rate]

compare), causes the CPI to rise. In comparison, the SPARC has no mul/div instructions and coincidentally has a greater path length. This shows the inappropriateness of using CPI as a comparison metric across architectures.

[Figure: Eqntott Intel486 CPI]

[Figure: Eqntott Intel486 Data Hit-rate]
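The observation that an average CPI can hide large sample-to-sample variation can be made concrete with a small summary routine. The per-sample values below are hypothetical, chosen only to span the 1.4 to 2.5 range reported for Espresso. The plain mean also assumes every sample contains the same number of instructions, and the 1.25 flagging factor is an arbitrary illustrative choice.

```python
from statistics import mean


def cpi_summary(sample_cpis):
    """Average, lowest and highest CPI over a run's samples."""
    return {
        "mean": mean(sample_cpis),
        "min": min(sample_cpis),
        "max": max(sample_cpis),
    }


def flag_samples(sample_cpis, factor=1.25):
    """Indices of samples whose CPI exceeds factor x the run average."""
    avg = mean(sample_cpis)
    return [i for i, cpi in enumerate(sample_cpis) if cpi > factor * avg]


# HYPOTHETICAL per-sample CPI values, not taken from the paper's profiles.
samples = [1.4, 1.6, 1.8, 2.45, 2.5, 1.7]

summary = cpi_summary(samples)
hot = flag_samples(samples)
print(summary, hot)
```

Flagged samples are the ones whose cache hit-rates and instruction mix would be worth inspecting in the detailed logs, as was done for the circled areas in the CPI profiles.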
and thus improving the overall performance. The CPI for the four integer SPEC benchmarks varies from a low of 1.4 to a high of 3.7 for the Intel486 DX microprocessor. This indicates the danger of comparing average CPI, which may have fluctuated over a wide range during the run time of the job. Also, even within the i386 architecture, CPI does not correlate well with performance across various applications. The CPI variations are being studied to improve the performance of the overall job.

4. REFERENCES
[1] Crawford, J. and Gelsinger, P., "Programming the 80386", Sybex, San Francisco, CA, 1987.
[2] Cmelik, Robert F., Kong, Shing I., et al., "An Analysis of MIPS and SPARC Instruction Set Utilization on the SPEC Benchmarks", Proceedings of ASPLOS-IV, Santa Clara, California, 1991, pp. 290-302.
[3] Stephens, Chriss, Cogswell, Bryce, et al., "Instruction Level Profiling and Evaluation of the IBM RS/6000", Proceedings of the 18th Annual

APPENDIX

[Figure: SPECint for Intel486 DX and others]
Fig. A1. Comparison of SPECint for several systems

The results for the IBM RS6000 and SparcStation 2 are from the SPEC newsletter dated September 1991 [4]. The MIPS RC3360 result is from the Winter 1991 publication of the SPEC newsletter. The Intel486 DX microprocessor result is from the 50- Intel486 DX Microprocessor performance brief published by Intel Corporation [6].