
Turbo Boost and Overclocking © Intel Corp.

Architecture and Early Performance Results of Turbo Boost Technology on Intel® Core™ i7 Processor and Intel Xeon® Processor 5500 Series (2009)
Markus Mattwandel, Todd Baird, Jorge Garcia, Seongwoo Kim, Herbert Mayer *

Abstract We survey Turbo Boost Technology on the new Intel® Core™ i7 multi-core, multi-threaded microprocessor. Turbo Boost Technology dynamically increases the frequency of processor cores for higher performance, while operating under thermal design limits and maintaining safe conditions on the physical chip. This paper outlines by how much the core frequency can be raised as a function of the number of currently active cores and of other electrical and temperature parameters. We explain the conditions under which such boosts are possible, depending on the instantaneous current, on overall power consumption with the resulting heat generation, and on the actual temperature of the core(s) being boosted. We contrast Turbo with overclocking, another method of boosting frequency and improving performance, and discuss the pros and cons of Turbo versus thermal throttling. Since Turbo Boost Technology has been implemented in silicon on the Core i7, on both single-socket desktops and dual-socket servers, we include actual performance data from average to ideal cases. Core i7 is implemented in 45 nm High-K silicon; it launched in late 2008 as a High-End Desktop platform with 1 processor, and in 2009 as a server with 2 processors, each having 4 cores and 2 hardware threads per core. We conclude with conjectures about the future and a list of references.

Keywords: Multi-Core; Turbo Mode; Overclocking; Simultaneous Multi-Threading (SMT); Parallel System; Logical Core; Green Computing

1. Introduction

Turbo Boost Technology (Turbo, for short) dynamically enables a temporary performance boost on the new Intel® Core™ i7 multi-core, multi-threaded microprocessor, stylized in Figure 2.1. Turbo Boost Technology increases the core clock of a processor in defined, discrete frequency steps (also known as bins) for the benefit of higher performance, while conditions on the physical chip allow this without endangering the microprocessor. This survey outlines by how much the core frequency can be raised as a function of the number of active cores and of other parameters.

Section 2 describes the design goals of Turbo Boost Technology on Core i7 and contrasts the new method with an older, legacy Turbo method implemented on earlier Intel silicon. It discusses the pros and cons of Turbo vs. overclocking, both being methods of boosting frequency to increase performance, yet with different goals and conditions. It also compares Turbo with thermal throttling. Section 3 summarizes how much Turbo boosting is theoretically possible, as set by predefined system parameters. In Section 4 we list costs, shortcomings, and dangers of Turbo. Since Turbo Boost Technology has been implemented in silicon on the Core i7, on both single-socket desktops and dual-socket servers, Section 5 includes detailed, actual performance data on client and server platforms, from average to ideal cases. Section 6 contrasts Turbo with other performance boost ideas, while Sections 7 and 8 conclude with a conjecture into the future and references.

The physical Core i7 microprocessor is realized by Intel in 45 nm High-K silicon technology, launched in late 2008 as a High-End Desktop platform with a single socket, and in 2009 as a server with 2 sockets.

* Corresponding author: herb.g.mayer@intel.com
SPEC, SPECint and SPECfp are copyright of SPEC.

2. Description of Turbo Boost Technology

Why Turbo Boost?

Intel Turbo Boost Technology, introduced on Intel's flagship Core i7 and Core i7 Extreme Edition processors in Q3'2008, allows processor cores to automatically run faster than their base operating frequency if cores are operating at the low end of a defined envelope of power, current, and temperature, the specification limits. The amount of additional frequency upside each core will actually achieve depends on the total number of active cores executing the processes (threads) that a workload has spawned, and on the thermal operating environment, which includes current (thermal design current, or TDC) and power consumption (thermal design power, or TDP), as well as temperature. Turbo Boost kicks in when the OS power scheme is set for performance and the processor package is operating below critical constraints. The core frequency is dynamically adjusted within the defined limits as the operating conditions change.

Figure 2.1 High-Level Nehalem Architecture
[Block diagram. Recoverable labels: Core 0 to Core 3; Frequency & Voltage Independent Interface; Uncore with Last Level Cache, IMC (to DDR3 DRAMs), two QPI links, Pwr, Clk.]

2.1. Turbo Boost Technology vs. Enhanced Dynamic Acceleration Technology

Prior to the introduction of Turbo Boost in Core i7, Intel's previous-generation Core 2 Duo processors introduced the 1st generation of Turbo technologies, known as Enhanced Dynamic Acceleration Technology (EDAT). This technology allows processor cores to automatically run faster than their base operating frequency if one or more cores are idle. In that event, the operating frequency of the other cores is increased. Note that this increase is influenced by the number of active hardware threads and by various electrical and thermal parameters, before taking advantage of a clock boost within the product constraints. Turbo Boost and EDAT also happen to be "Green" technologies that provide performance on demand, while keeping power consumption at a minimum when additional processor performance is not needed, as judged by the current load.

2.2. Turbo vs. Overclocking

Turbo is quite distinct from overclocking. First of all, overclocking increases clock frequency by running outside the specification of the part, while Turbo operates completely within spec. Turbo does not change the reliability or durability of a part. Overclocking occurs
when the clock rate of the processor is manually and statically increased. This results in running the processor outside its specified, and thus safe, limits. Conversely, Turbo technologies run the processor within specification, and aim to take advantage of optional thermal headroom available during under-utilized conditions. Overclocking is not a "Green" technology, since it forces increased processor power consumption continuously, without regard to actual demand.

Turbo Boost
  Starting clock:         Base operating frequency
  Heat protection:        Yes
  Protective action:      Decrease clock
  Application mechanism:  Automatic, based on system conditions

2.3. Turbo Execution vs. Thermal Throttling

Turbo Boost Technology is a conservative performance enhancement method that increases the clock rate after the microprocessor recognizes that an increase in clock speed is safe; it is understood that the processor was already operating in a safe way before boosting the clock speed. When the thermal parameters change, or when the number of active cores increases, the prior clock increase is reversed, not only saving the chip from possible damage, but also saving power.

Turbo Boost
  Starting clock:     Low, to run safely
  Heat protection:    Yes
  Protective action:  Decrease clock
  Arch. driven:       Yes

Thermal throttling comes from the other end by taking a greedy approach to performance enhancement. Thermal throttling assumes that the microprocessor is generally running in some steady state of execution, but acknowledges that temporary hot spots are possible. This happens when the typical mix of IO-bound plus compute-bound execution is replaced by compute-bound-only execution, resulting in more heat generation than is safe. Similar to the safety action taken in Turbo, the frequency is throttled in thermal throttling, resulting in less current and thus less heat being generated, and less performance being delivered.

A microprocessor architect must decide which safe technology of performance boosting should be realized in silicon: one, the other, or both. On the Core i7, Intel decided to provide both methods.

3. Ideal Performance Speedup with Turbo

A number of dynamic parameters dictate the upper limit of Turbo Boost speedup. These include the current core's temperature, the overall current and momentary power, and the number of active cores. Each frequency step of Turbo Boost is 133.33 MHz. For each SKU, fuse values are set in a small internal table during chip manufacturing, to define an upper bound on how many of these frequency steps a core can safely increase. The table parameters d-c-b-a mean: if 1 core is active, that core's frequency may increase by a bins; else if 2 cores are active, these cores can grow by b frequency steps, etc.

Applying the same encoding principle, but starting from the other end, the table entry 1-1-4-8 means that for 3 or 4 cores being active, the frequency may increase by just 1 frequency step. But if only 2 cores are busy, the speed may grow by up to 4 steps, and if only a single core is active, that core may grow by 8 frequency steps, amounting to about 1.07 GHz (8 × 133.33 MHz) of incremental clock speed.

However, this boost may decrease if, for any reason, a predefined envelope of maximally allowable current or temperature is exceeded. The decrease is designed not only to save the microprocessor from thermal stress, but also to save power and run "more green". Similarly, as the sample 1-1-4-8 bound shows, other cores may become active, forcing a currently high boost rate to decrease, again to protect the processor and save power.

4. Technology Investment for Turbo Boost

Although the goal of improving performance with Intel Turbo Boost Technology is worth pursuing, the long-term investments and shorter-term costs must be weighed against gains on the performance side for the user and on the business side for the manufacturer.

4.1. Engineering Investment

The up-front engineering costs to design and implement Turbo Boost Technology were noticeable but contained, despite the existence of past technological history at Intel, e.g. Enhanced Intel SpeedStep Technology. Design costs included a new mini-controller, called the Power Control Unit (PCU), and associated microcode. Also, the cost of validation was significant because new methods were developed to ensure that the feature was working properly, without the checks themselves interfering with the operation of the feature. The manufacturing flow was also updated to support testing of the
PCU, which added another minor development cost.

4.2. End User Costs

When Turbo Boost Technology promotes cores to a higher frequency, the processor will draw more current than it would while running at nominal frequency. The end user will incur an incremental cost for the additional electrical power consumed in this mode; however, this cost is very minor compared to the power used by the system as a whole. If necessary, users may choose to manually adjust the balance between performance and power consumption through the OS power policies.

A final theoretical cost to note is the introduction of a variable-frequency processor into an environment that has largely been able to depend on a constant processor frequency. Some applications may attempt to synchronize events in time based on the assumption that frequency does not change over time, although none has yet been found by Intel. Computer users may also become alarmed when their frequency reporting tools begin to show dynamic frequency changes.

5. Actual Performance Data with Turbo Boost

We isolated workloads known to be CPU-centric, and concentrated further on single- and multi-threaded workloads in our Turbo performance measurements. We proceeded by running three baseline frequencies without enabling Turbo. The base frequencies were 2.66 GHz, 2.8 GHz, and 2.93 GHz, to simulate the lower and upper bounds of the workload.

Initial results were mixed because the OS scheduler was allowing single-core workloads to run on multiple CPUs. By setting affinity manually, and forcing workloads to run on a single CPU, we were able to obtain maximum benefit from Turbo. Affinity here means associating a particular thread with a dedicated core or hyper-thread. The lesson of setting affinity manually was then applied to all single-threaded workloads.

5.1. Turbo Speedup on UP Client

Setting processor affinity is the process by which an application manually tells the OS scheduler where to run; in other words, it restricts the set of hardware threads on which the workload may run. For instance, setting Affinity = P3 tells the OS scheduler to run only on Processor 3. Setting Affinity = P0, P2, P3 allows an application to run on hardware thread 0, 2, or 3.

Figure 5.1 Cinebench 10

Allowing single-threaded workloads to run on any hardware thread incurs performance penalties, because each time a thread moves, the OS needs to SAVE/RESTORE state to preserve determinism.

Figure 5.2 Cinebench 9.5

Figures 5.1 and 5.2 show Cinebench obtaining the highest Turbo upside when affinity is set, as it can run on one single core for the whole test duration. Rendering software performance data show Turbo to have a positive result; rendering is conventionally measured in time units, hence smaller is better.

Figure 5.3 Rendering Workloads

Figure 5.3 shows 3DStudioMax and MainConcept H.264 reaching nearly ideal Turbo speedup because these workloads are CPU-centric, can run on specific cores, and incur no other overhead.

Figure 5.4 Arithmetic and Multi-Media Workloads
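The affinity settings described in Section 5.1 can be sketched in code. The following is a minimal illustration, not the tooling used for the measurements: the helper names `parse_affinity`, `affinity_mask`, and `pin_current_process` are hypothetical, the "P0, P2, P3" string format simply mirrors the notation in the text, and the pinning call assumes a Linux system where Python's `os.sched_setaffinity` is available.

```python
import os


def parse_affinity(spec):
    """Turn a string such as "P0, P2, P3" into the set {0, 2, 3}."""
    cpus = set()
    for token in spec.split(","):
        token = token.strip()
        if token:
            # Drop the leading "P"/"p" and keep the hardware thread number.
            cpus.add(int(token.lstrip("Pp")))
    return cpus


def affinity_mask(cpus):
    """Encode a set of hardware thread numbers as a bit mask (bit n = thread n)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def pin_current_process(cpus):
    """Restrict the calling process to `cpus` (assumption: Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available on this OS")
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return os.sched_getaffinity(0)
```

For example, `parse_affinity("P0, P2, P3")` yields the set of hardware threads 0, 2, and 3, and `affinity_mask` converts that set into the bit-mask form that many affinity interfaces expect. Pinning a single-threaded workload to one hardware thread is what keeps it on one core for the whole run.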

Figure 5.4 exhibits Sandra measurements of Arithmetic and Multimedia applications with multi-threaded workloads.

Figure 5.5 Estimated Individual SPEC CPU 2000 Score, 4-Users

Figure 5.6 Estimated Individual SPEC CPU 2000 Score, 4-Users

Figures 5.5 and 5.6 display various components of CPU2000 visibly benefiting from Turbo. These workloads represent a gamut of diverse disciplines and do not all scale linearly with core frequency. Thus, some workloads do not reach the full theoretical Turbo benefit. Estimated individual SPEC CPU 2000 scores are based on measurements on Intel internal development platforms and may differ from measurements on production platforms available later in 2009. For more information about the benchmarks see [4].

5.2 Turbo Speedup on DP Server

Table 5.1 summarizes our setup for DP Turbo experiments. We used an engineering validation board, called Green City, in an open bench-top configuration. This is certainly a different thermal system condition compared to a typical end-user environment in a standard chassis. However, we learned from a pre-Si study and other post-Si experiments that the thermal impact on Turbo performance is still second-order. As shown in Figure 5.7, each processor has an individual heatsink with active fans attached. In addition, four external fans are placed on the side to cool down the memories, voltage regulators, etc. All fans were running at a constant speed. If the workload does not hit Turbo constraints, the core frequency can increase dynamically up to 3.33 GHz, depending on the number of active cores.

Table 5.1 Experimental Setup

Figure 5.7 NHM-EP System with external fans

We first tested the SPEC CPU2000 benchmark, compiled with Intel Compiler 11.0, for multiple cases of interest. The baseline configuration had Turbo mode turned off, and its benchmark scores were compared with those of the Turbo mode cases. Since Turbo is designed to operate within predetermined TDC, TDP, and thermal constraints, it may not always run at maximum Turbo frequency. In order to assess the efficiency, we compared actual performance against maximum performance without the constraints. This unconstrained case would give the same performance as overclocking the processors to the Turbo frequency in non-Turbo mode, e.g., 3.20 GHz for a workload with multiple active cores, unless there is some overhead caused by Turbo. Note that we used the maximum score among several samples of each experiment in the comparison. The system could occasionally generate exceptionally low scores due to certain abnormal transient conditions at the beginning of tests. Based on our previous experience, we believe this water-marking approach to sampling is effective when dealing with a pre-production platform prior to fine tuning. Figure 5.8 presents the Turbo speedup for 16-user SPECintRate along with the level of run-to-run variation. The red line indicates the amount of frequency increase between P1 and P0, i.e., slightly more than 9%. On average, Turbo mode brings about 5.8% performance upside compared to non-Turbo.

This extra boost is still within the thermal design envelope. For example, bzip2 and gcc reach the ideal Turbo performance
target. On the other hand, multiple components are below the ideal level. Our analysis, using the workload profile from the power control unit, showed that these workloads hit the TDP limit. The variation is represented by the standard deviation over the mean for 9 different trials of each benchmark component. The variability is mainly due to sub-optimal memory usage by the OS under non-uniform memory access (NUMA) configurations between the two separate processors (two separate sockets on a server platform), and it is the main explanation for the case where two ideal performance bars mismatch.

Figure 5.8 Turbo Speedup for SPEC CPU 2000 Integer Rate, IC11.0, 16-user

Figure 5.9 is the observed speedup for SPECfpRate. The average performance benefit from Turbo is 3.3% out of a 3.5% goal, which is less than observed in the integer suite. One of the reasons is that some floating-point components do not rely on activity that scales with frequency, e.g. DRAM accesses. Even though some components directly take advantage of a faster core clock, e.g., sixtrack, they often hit the TDP constraint throughout the execution. Some bars look erroneous in terms of their basic relationship; run-to-run variation has to be factored in to explain them. Although we present the variation only for the Turbo case here, its level was not dramatically different in non-Turbo cases. There is no empirical evidence so far suggesting that Turbo introduces additional run-to-run variation on a given system.

Figure 5.9 Turbo Speedup for SPEC CPU 2000 Floating-point Rate, IC11.0, 16-user

Since we observed that TDP is the only limiter in our test on this platform with 95 W processors, one may wonder how much performance improvement can be obtained from a bit more power headroom via process enhancement or other system implementation factors, e.g., voltage regulator accuracy. To address this question, we experimented with additional cases by artificially adjusting the TDP to higher limits. Figure 5.10 illustrates the performance impact of 4 W and 8 W of additional TDP budget for selected benchmark components, which are core-bound and power constrained. It is clear that the performance gain is measurable for these workloads.

Figure 5.10 Performance Impact with Additional Power Headroom

We also evaluated the SPEC JBB 2005 benchmark. As shown in Figure 5.11, we compared the impact of simultaneous multi-threading (SMT) under Turbo mode. Turbo provides an upside both with and without SMT, while the best performance is achieved with SMT and Turbo combined for this workload. In this case, the benchmark rarely hit TDP, which is indicated by the unconstrained Turbo case.

Figure 5.11 Turbo Performance for SPEC JBB 2005

The results of JBB2005 are not to be interpreted as official results by Intel, and instead are being presented as we found them in 2008 on our development platform, in line with section 5.0 of the JBB run rules.
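The bin arithmetic of Section 3 and the speedup figures of Section 5.2 can be tied together with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, the base frequency of 22 bins (about 2.93 GHz) is assumed because it is consistent with the quoted 3.20 GHz and 3.33 GHz Turbo frequencies, and the ratio of observed to ideal speedup is only a simplified stand-in for the efficiency comparison described in the text.

```python
BIN_MHZ = 400.0 / 3.0  # one Turbo frequency step; the text quotes 133.33 MHz


def max_bins(table, active_cores):
    """Look up the bin limit in a fuse-table string such as "1-1-4-8".

    The entries are ordered from "all cores active" down to "1 core active",
    so the last entry applies when a single core is busy.
    """
    entries = [int(x) for x in table.split("-")]
    return entries[len(entries) - active_cores]


def turbo_ceiling_mhz(base_mhz, bins):
    """Highest frequency reachable when `bins` steps are added to the base."""
    return base_mhz + bins * BIN_MHZ


def ideal_speedup_pct(base_mhz, bins):
    """Frequency increase in percent, i.e. the ideal speedup if performance
    scaled linearly with core clock."""
    return 100.0 * bins * BIN_MHZ / base_mhz


def turbo_efficiency(observed_pct, ideal_pct):
    """Fraction of the ideal speedup that was actually observed."""
    return observed_pct / ideal_pct


# Assumption: base clock of 22 bins, about 2933 MHz.
BASE_MHZ = 22 * BIN_MHZ

single_core_gain = max_bins("1-1-4-8", 1) * BIN_MHZ   # 8 bins, about 1067 MHz
multi_core_ceiling = turbo_ceiling_mhz(BASE_MHZ, 2)    # about 3200 MHz
ideal_pct = ideal_speedup_pct(BASE_MHZ, 2)             # slightly more than 9%
int_rate_efficiency = turbo_efficiency(5.8, ideal_pct)
```

Under these assumptions, two bins over the base clock give the 3.20 GHz and "slightly more than 9%" figures quoted in Section 5.2, and the 5.8% average SPECintRate gain corresponds to roughly two thirds of that ideal, which agrees with the observation that several components hit the TDP limit.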

6. Related Work

If "EIST (Enhanced Intel SpeedStep Technology)" (see Jeff Reilly comment) is different from EDAT, explain and provide reference.

7. Conclusion and a Look Ahead

In this survey we provided a high-level explanation of the Turbo Boost mechanism on Core i7: how it can accelerate some applications, and how the degree of speedup depends on the activity factors of the cores on the same physical processor and on electrical and thermal conditions. We compared Turbo, a built-in, dynamic, automatic boosting feature, with overclocking, which is initiated by the end user at the user's own risk. We also contrasted Turbo, which speeds up the clock from its standard rate, with thermal throttling, which slows down clock rates from the standard rate for the sake of component protection.

Whether future processors, which will exceed 1 billion transistors per part, continue to provide both Turbo Boost and thermal throttling remains to be seen. But it will be a natural evolutionary step to let the number of cores grow beyond the 4 in the current Core i7. Whether such future cores shall have sibling hyper-threads, or whether the architects shall use those same transistors instead for even more cores, remains to be seen.

8. References and Referees

We wish to thank the anonymous reviewers … and our colleagues at Intel, Ronak Singhal and Jeff Reilly, who suggested crucial improvements and contributed clarifications.

[1] 2008 November, Intel White Paper, "Intel® Turbo Boost Technology in Intel Core™ Microarchitecture (Nehalem) Based Processors." http://download.intel.com/design/processor/applnots/320354.pdf?iid=tech_tb+paper

[2] 2008 November 8, POD Tech website, "Turbo Boost Technology." http://www.podtech.net/home/search/Turbo+Boost+Technology

[3] 2008 November 3, Intel website, "Intel® Core™ i7 Microprocessors, The Best Processor on The Planet." http://download.intel.com/pressroom/kits/corei7/pdf/Intel%C2%AE%20Core%E2%84%A2%20i7_Overview.pdf

[4] General SPEC website. http://www.spec.org

[5] 2006 August, SPEC website for integer component of SPEC CPU2006. http://www.spec.org/cpu2006/CINT2006/

[6] 2003 October, SPEC website for floating point component of SPEC CPU2000. http://www.spec.org/cpu2000/CFP2000/

[7] Intel® Turbo Boost technology. http://www.intel.com/technology/turboboost/
