Computer performance has increased at an amazing rate in recent years and, unfortunately, so has
power consumption. An ultimate gaming system equipped with a quad-core processor, two NVIDIA
GeForce 8800 Ultra cards, 4 sticks of DDR2 memory and a few hard drives can easily consume 200W
without doing anything! To reduce power wastage, a few industry standards have been developed to
make our computers work more efficiently.
In January 1992, Intel and Microsoft developed APM (Advanced Power Management) to manage
power when a computer system is idling. Later, in December 1996, the successor to APM, the
Advanced Configuration and Power Interface (ACPI) specification, was developed by Compaq,
Microsoft, Intel, Phoenix and Toshiba as the industry's open-standard power management interface.
What's the difference? Let's take a look :
Advanced Power Management (APM)
• Cheap implementation, but not effective.
• Application and driver send control to the APM driver directly.
• Device power is managed by its own driver.
• Other hardware like the CPU is managed by the APM BIOS.
• Power management state machine is done by the APM BIOS since it is simple.
Advanced Configuration and Power Interface (ACPI)
• Implementation is more costly, but effective.
• Application doesn't need to manage power.
• Device driver uses ACPI to interface with hardware power management.
• ACPI is abstract, thus OS and hardware can evolve separately.
• Power management state machine is complex, hence handled by the operating system.
In this article, I will not go into APM as most PCs use ACPI these days.
Sections and Topics
• ACPI Power Management States: The Big Picture
• Global System States (G-States): G0 Working State, G1 Sleeping State, G2 Soft Off, G3 Mechanical Off, Summary
• Sleeping States (S-States): S1 State, S2 State, S3 State, S4 State, Summary
• Device States (D-States): D1 State, D2 State, D3 State, Summary
• CPU Power States (C-States): C0 State (Active), C1 State (Halt), C2 State (Stop Grant), C3 State (Deep Sleep), C4 State (Deeper Sleep), C5 State, C6 State, C-States In Multi-Core Processors, Summary
• CPU / Device Performance States (P-States): Introduction, P-State Lookup Tables, P-States In Multi-Core Processors (Single Core, Dual Core, Quad Core (Intel), Quad Core (AMD)), Other P-State Features (Super Low Frequency Mode, Combining CPU C-state & P-state), CPU Thermal Monitor
• Conclusion
ACPI Power Management States
These are the power management states as defined in the ACPI specification. I’ll go
through them briefly but let's get a big picture of it first. The details of each and every
state will be described in the following sections.
Global system states
• The entire system state which is visible to user.
• Divided into 4 states – G0, G1, G2, G3
Sleeping states
• These are the sleeping states that reside within global system state G1 (except S5).
• Divided into 5 states – S1, S2, S3, S4, S5
Device Power states
• This power state is usually invisible to the user.
• While one device is in the ‘on’ state, another might be in the ‘off’ state.
• Divided into 4 states – D0, D1, D2, and D3.
CPU Power states
• Also known as CPU sleep states.
• They occur within the global system state G0.
• So far divided into 5 states – C0, C1, C2, C3, and C4.
• In the future, there will be states up to C6 (Penryn).
CPU / Device Performance states
• CPU / device power management while it is still active.
• Usually, clock speed and voltage vary depending on workload.
• The number of P-states is CPU / device specific.
• E.g. a CPU with a higher clock ratio will have more P-states than a CPU with a lower clock ratio.
CPU Thermal Monitor
• It throttles the CPU to a lower performance state when temperature exceeds the
threshold.
• In TM1, throttling is done via changing its duty cycle.
• In TM2, throttling is done via changing its clock speed and core voltage (Pstate).
Global System States (GStates)
Global system states apply to the entire system and are visible to the user.
G0 Working State
• This is the power state where computer is able to run applications.
• The computer system as a whole is working, but the peripheral devices and CPU
can change their power states dynamically. For example, a monitor can be turned
off when we are just listening to music.
• When a laptop is running in the maximum battery saving mode, the CPU can be
suspended after idling for some time.
• Power consumption is the highest among all G states.
• Example : When doing light work like surfing and chatting, the CPU may run at its
lowest clock speed and the CD-ROM drive can be turned off to save power.
G1 Sleeping State
• Applications cannot run in the G1 state. The computer appears "turned off" to the
user.
• The operating system can switch to the normal (G0) state without rebooting.
• Most of the system context will be saved in memory, either in RAM or on the hard
drive.
• The wake-up latency (transition from G1 to G0) varies, depending on the S-state
selected within the G1 state.
• Power consumption is small, and may drop to only a few watts (depending on the
S-state, of course).
• Example : Switching the computer into "Standby" or "Hibernate" mode in
Windows XP.
G2 Soft Off
• No application or operating system context is retained in the G2 state.
• Basically, the whole system is turned off, with the exception of the main switch
of the power supply unit.
• At this point, some debug and machine check registers still retain their error
codes, if there are any.
• This information is not really useful for end users, except for motherboards that
have debug LEDs.
• Power consumption is almost zero.
• The operating system needs a reboot, and the wake up latency is long.
• It is still not safe to disassemble any device from the computer system, because
some of them are still powered.
• Example : Choose "Shut Down Computer" in Windows XP, but the main power
switch is not turned off.
G3 Mechanical Off
• The computer is completely turned off by cutting the main power to the power
supply unit.
• Only the real time clock is still active, using the built-in battery.
• Power consumption is zero, if we don't take the battery into consideration.
• It takes the longest time to go back into the working state (G0).
• It is safe to disassemble devices from the computer.
Global States Summary
Sleeping States (S-States)
Sleeping states define the computer's ‘sleeping methods’ in the G1 (sleeping) state. In all
the sleeping states (except S0 and S5),
1. The CPU executes no instructions. It is having a good sleep!
2. User applications will not run (duh… the CPU is sleeping!)
3. Some devices sleep partially because they need to generate wake up events.
4. When the system is ‘awakened’, it will continue working from the point before it
slept.
S1 State
• The hardware maintains all its system context.
• The CPU input clock will be stopped, and its caches will be invalidated.
• The system memory goes into its self-refresh mode.
• All system clocks are turned off, except the real time clock.
• Power consumption is much lower than G0 working state.
• Wake up latency is low. It takes about 2 seconds to go back to the G0 working
state. The hardware will be responsible for restarting the system clocks.
• Example : “Standby” mode in Windows XP, if the S3 state isn’t supported.
S2 State
• Similar to the S1 state. The only difference is the CPU power state.
• In S2, the CPU and its caches are powered down, instead of just gating the clock
input and invalidating the caches.
• The S2 wake up latency is slightly longer than S1, but it saves slightly more
power.
S3 State
• The S3 state powers down the CPU, cache, chipset and peripherals, except RAM.
• Some devices necessary to maintain memory context will still run.
• RAM goes into a low-power, self-refresh mode.
• The power consumption is as low as the power requirement of the RAM (at idle
power), plus some necessary onboard devices only.
• The wake-up latency is about 5 to 6 seconds.
• Example : "Standby" mode in Windows XP if the S3 state is supported by
hardware.
S4 State
• All devices including system RAM are powered down.
• Only platform settings are maintained, while other settings are stored in a special
partition in the hard drive.
• When successfully switched into the S4 state, the system appears to be turned off
to the user.
• The power consumption is very low (< 3W), as almost everything has turned off.
• We need to go through the BIOS boot sequence again when the computer is
awakened.
• The OS reboot is not required. It will automatically return to where you last left it.
• Example : "Hibernate" mode in Windows XP
Sleeping States Summary
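The Windows XP examples above boil down to a simple mapping from menu choice to S-state. Here is a minimal Python sketch of that mapping. The function names are made up for illustration and are not part of any Windows or ACPI API.

```python
# Sketch only: these names are illustrative, not a Windows or ACPI API.

def standby_state(supported):
    """Return the S-state used for "Standby": S3 when available, else S1."""
    return "S3" if "S3" in supported else "S1"


def sleep_state_for(mode, supported):
    """Map a Windows XP menu choice to the S-state described in the text."""
    if mode == "Standby":
        return standby_state(supported)
    if mode == "Hibernate":
        return "S4"
    raise ValueError(f"unknown mode: {mode}")


print(sleep_state_for("Standby", {"S1", "S3", "S4"}))   # hardware with S3
print(sleep_state_for("Standby", {"S1", "S4"}))         # hardware without S3
print(sleep_state_for("Hibernate", {"S1", "S4"}))
```

With S3 in the supported set, "Standby" resolves to S3. Without it, the sketch falls back to S1, as the S1 and S3 examples describe.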
Device States (D-States)
Device states occur in the global system G0 working state. They were defined to enable
device vendors to design ACPI-compliant products, so that operating systems that support
ACPI, like Windows XP, can manage the devices. There are four D-states, but vendors
can choose not to implement all of them.
D0 State
• In this state, the device is operating at its full power and full functionality.
• Example : A DVD-ROM drive in active use.
D1 State
• The device can choose to discard its context.
• However, the bus connected to this device should not do anything to cause the
context loss in the device.
• Power consumption is lower than the D0 state, as some working units in the
device will shut down.
• Example : After idling for some time, the laser in the DVD-ROM drive will
automatically turn off, but the drive controller will still be active.
D2 State
• It is similar to D1, but the bus is free to do some power management, like lowering the
current and voltage.
• This can save more power, but it will take a longer time to wake up from the D2
state.
D3 State
• The device in this state can be completely turned off.
• Maximum power saving is achieved.
• Wakeup time is the slowest among all Dstates.
DState Examples
Example 1 Hard Disk Drive Power Management Policy
Example 2 Graphics Card Power Management Policy
* DPMS : Display Power Management Signaling, defined by the Video Electronics Standards
Association (VESA)
CPU Power States (CStates)
CPU C-states occur in the global system G0 state. Users may not notice them when they are
using the computer, unless monitoring tools like CPU-Z are used to inspect the clock speed
and voltage. C-state implementations are processor-specific. Mobile processors usually
have more C-states than desktop processors. For example, the mobile Core 2 Duo
processor (Merom) supports C0 to C4 states, whereas the desktop Core 2 Duo processor
(Conroe) only supports C0 and C1 states.
C0 State (Active)
• This is the CPU's maximum working state, where it is actively accepting
instructions and processing data.
• Power saving is virtually zero, unless the CPU has Pstate power management
enabled.
C1 State (Halt)
• It is simply done by executing the assembly instruction “HLT” (Halt).
• This will stop the instruction pipeline within the CPU from executing any
instructions.
• Wake-up time is ultra fast (only about 10 nanoseconds).
• The CPU is able to save up to 70% of its maximum power consumption.
• All modern processors must support this power state.
C2 State (Stop Grant)
• The processor core clock and platform I/O buffers are gated.
• In other words, the clock does not exist in the processor execution engines and
I/O buffers.
• The benefit over C1 is that the C2 state is able to save 70% of the CPU's
maximum power plus some platform power.
• However, the transition time from C2 to C0 is 10 times longer (~100
nanoseconds).
C3 State (Deep Sleep)
• The bus clock and PLLs are gated.
• In a multiprocessor system, the processors no longer handle FSB snoops to
maintain cache coherency. Cache contents are invalidated.
• In a singleprocessor system, memory transactions are prohibited but cache
contents are not invalidated.
• CPU still saves around 70% power, but the platform power will be reduced even
more than C2.
• Wake-up time is 500 times longer than C2 (about 50 microseconds).
C4 State (Deeper Sleep)
• It is similar to the C3 state, but with two main differences.
• First, the core voltage is reduced to a very low level (less than 1.0V) to decrease
current leakage.
• Second, data stored in the L2 cache will be reduced bit by bit over time.
• The CPU can save around 98% of its maximum power.
• Wake-up time is slower, but still much lower than 1 second (~160
microseconds).
C5 State
• The CPU enters C5 when the data in the L2 cache has been reduced to zero.
• Wake-up time is more than 200 microseconds.
C6 State
• New power management feature in Penryn.
• When the L2 cache contents are shrunk to zero, the CPU will go into an even
lower core voltage.
• CPU context is no longer preserved.
• Power consumption is currently unknown. Should be near zero.
• Wakeup time is currently unknown.
CStates In MultiCore Processors
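In a multi-core processor, each core can be in its own C-state, but only one
processor C-state is in effect at any one time. The processor C-state is equal to the highest
C-state of any processing core, where "highest" means the highest-power (lowest-numbered)
state, so a single core in C0 keeps the whole processor in C0. Let's say the processor C-state
is Cx, and the core C-states are CCx1 to CCxn. The formula for determining the processor
C-state would be :
Cx = max (CCx1, CCx2, CCx3, ..., CCxn), with the maximum taken in terms of power level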
Here are some examples :
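The rule can also be tried out in a few lines of Python. This is only a sketch. It assumes that "highest" means the highest-power state, which is the lowest-numbered one (C0 above C1), the same convention used for P-states later on. The function name is made up for illustration.

```python
def processor_c_state(core_c_states):
    """Resolve the processor C-state from per-core C-state numbers.

    Assumption: the highest-power state wins, i.e. the lowest C-state
    number among the cores (a single core in C0 keeps the package in C0).
    """
    if not core_c_states:
        raise ValueError("need at least one core")
    return min(core_c_states)


# Dual core: one core active (C0), one in deep sleep (C3)
print(f"C{processor_c_state([0, 3])}")
# Dual core: both cores halted (C1)
print(f"C{processor_c_state([1, 1])}")
# Quad core: cores in C4, C2, C3 and C4
print(f"C{processor_c_state([4, 2, 3, 4])}")
```

Under that assumption the three cases resolve to C0, C1 and C2 respectively.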
CPU Power States Summary
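The deeper the C-state, the longer the wake-up time, so whoever picks the C-state has to trade power saving against latency. The Python sketch below shows one hypothetical way to make that choice using the approximate wake-up times quoted above. It is not how any particular operating system does it, and C5 and C6 are left out because no exact figures are given for them.

```python
# Approximate wake-up times from the text, in nanoseconds.
# C5 and C6 are omitted because the text gives no exact figures for them.
WAKE_UP_NS = {
    1: 10,        # C1 (Halt)
    2: 100,       # C2 (Stop Grant)
    3: 50_000,    # C3 (Deep Sleep), about 50 microseconds
    4: 160_000,   # C4 (Deeper Sleep), about 160 microseconds
}


def deepest_c_state(max_latency_ns):
    """Return the deepest C-state whose wake-up time fits the latency budget.

    Returns 0 (stay in C0) when even C1 would take too long to wake up.
    """
    allowed = [c for c, ns in WAKE_UP_NS.items() if ns <= max_latency_ns]
    return max(allowed) if allowed else 0


print(deepest_c_state(1_000))     # 1 microsecond budget
print(deepest_c_state(200_000))   # 200 microsecond budget
```

A 1 microsecond budget allows nothing deeper than C2, while a 200 microsecond budget allows C4.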
CPU / Device Performance States (PStates)
P-states define the power management state while the CPU / device is within its
executing state, C0 for the CPU and D0 for a device.
P0 : Minimum P-state, highest power consumption.
P1, P2, P3, ... : P1 > P2 > P3 and so on, in terms of power consumption.
Pn : Maximum P-state, with the lowest power consumption.
P-state power management can be seen in modern CPUs and GPUs. It allows them to
control their active power according to the load at any particular moment. The number of
P-states is implementation-specific.
For instance, low-end GPUs like the NVIDIA GeForce 7300 GS have only one P-state,
where the 2D and 3D clocks remain constant over time. On the other hand, high-end
GPUs like the NVIDIA GeForce 7900 GT have at least two P-states: a P0 state for running
at maximum clock and voltage while playing 3D games, and a P1 state for running at the
minimum clock and voltage in 2D mode.
However, P-state power management in CPUs is far more complex. Different CPU
models have their own unique P-state lookup tables. Take, for example, the P-state lookup
tables for these CPUs.
Core 2 Extreme X6800
P-State   Clock Ratio   Clock      Voltage    Load
P0        11x           2.93 GHz   1.2875 V   81-100 %
P1        10x           2.67 GHz   1.2500 V   71-80 %
P2        9x            2.40 GHz   1.2250 V   51-70 %
P3        8x            2.13 GHz   1.2125 V   31-50 %
P4        7x            1.87 GHz   1.2000 V   11-30 %
P5        6x            1.60 GHz   1.1750 V   0-10 %
Athlon 64 X2 4800+
P-State   Clock Ratio   Clock      Voltage    Load
P0        12x           2.4 GHz    1.35 V     81-100 %
P1        11x           2.2 GHz    1.35 V     61-80 %
P2        10x           2.0 GHz    1.325 V    51-60 %
P3        9x            1.8 GHz    1.30 V     41-50 %
P4        8x            1.6 GHz    1.25 V     31-40 %
P5        7x            1.4 GHz    1.20 V     21-30 %
P6        6x            1.2 GHz    1.15 V     11-20 %
P7        5x            1.0 GHz    1.10 V     0-10 %
Core 2 Duo E6300
P-State   Clock Ratio   Clock      Voltage    Load
P0        7x            1.87 GHz   1.2500 V   31-100 %
P1        6x            1.60 GHz   1.2250 V   0-30 %
Athlon 64 X2 3600+
P-State   Clock Ratio   Clock      Voltage    Load
P0        9x            1.8 GHz    1.30 V     81-100 %
P1        8x            1.6 GHz    1.25 V     61-80 %
P2        7x            1.4 GHz    1.20 V     41-60 %
P3        6x            1.2 GHz    1.15 V     21-40 %
P4        5x            1.0 GHz    1.10 V     0-20 %
As you can see in the examples above, a CPU with a higher multiplier has more P-states
than a CPU with a lower multiplier. The smallest clock ratio supported by Intel is 6x,
while AMD has a minimum multiplier of 4x. However, AMD's Cool'n'Quiet feature uses 5x
as the minimum clock ratio.
Note :
Different processor steppings may implement a different clock ratio / voltage lookup
table. The clock ratio / voltage tables shown above are just examples to describe the
differences in P-states used in different CPU models, even of the same family.
Mapping CPU load ranges to P-states, as in the Load columns above, is one of the ways to
use P-state power management. Windows XP may transition a processor to P1 while Linux,
on the other hand, may transition it to P2 at the same CPU load. The exact mapping
therefore depends on the operating system implementation.
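To make the load-to-P-state idea concrete, here is a small Python sketch that looks up a P-state from the CPU load using the Core 2 Extreme X6800 table above. The load bands are the example values from that table, not what any real operating system uses, and the function name is made up for illustration.

```python
# Core 2 Extreme X6800 example table from the article:
# (P-state, clock ratio, clock in GHz, voltage in V, lowest load % of the band)
X6800_TABLE = [
    (0, 11, 2.93, 1.2875, 81),
    (1, 10, 2.67, 1.2500, 71),
    (2, 9, 2.40, 1.2250, 51),
    (3, 8, 2.13, 1.2125, 31),
    (4, 7, 1.87, 1.2000, 11),
    (5, 6, 1.60, 1.1750, 0),
]


def pstate_for_load(load_percent, table=X6800_TABLE):
    """Return the table row whose load band contains load_percent."""
    if not 0 <= load_percent <= 100:
        raise ValueError("load must be between 0 and 100")
    # Rows are ordered from the highest load band to the lowest, so the
    # first row whose lower bound is reached is the matching band.
    for row in table:
        if load_percent >= row[4]:
            return row
    raise ValueError("table has no band starting at 0 %")


state, ratio, ghz, volts, _ = pstate_for_load(75)
print(f"P{state}: {ratio}x, {ghz} GHz @ {volts} V")
```

A different operating system would simply plug in different band boundaries, which is why the same load can land in P1 on one system and P2 on another.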
P-States In Multi-Core Processors
Just like processor C-states, multi-core processor P-states are tricky. Each core can
request a different P-state, but the P-state each core actually ends up in may differ from
its request. It all depends on the processor's power control unit.
Single Core
The core P-state is always the same as the processor P-state.
Dual Core
The processor P-state is equivalent to the highest (highest-performance,
lowest-numbered) P-state of the two cores. Since the Core 2 Extreme X6800
clock ratio/voltage table was shown earlier, we will use that CPU as an example.
Let's say you are running SuperPI in Core 1, and WinAmp only
using Core 2. Core 1 should be running at its highest working state
(P0), while Core 2 should be running at its lowest working state
(P5).
The final clock speed and vcore will be 2.93 GHz and 1.2875 V, because the Core 2
Extreme only has a single PLL (clock source) and one Vcore (voltage source).
If Core 1 followed Core 2's P-state instead, your SuperPI would run very slowly. The
AMD Athlon 64 X2 processor also has a single PLL and one Vcore. Hence, the scenario
is exactly the same as that of the Core 2 Extreme X6800.
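The SuperPI and WinAmp scenario can be worked through in Python. This sketch assumes the rule described above (one PLL and one Vcore, so the highest-performance request wins) and uses the example X6800 table values. The names are made up for illustration.

```python
# Core 2 Extreme X6800 example table: P-state -> (clock in GHz, voltage in V)
X6800 = {
    0: (2.93, 1.2875),
    1: (2.67, 1.2500),
    2: (2.40, 1.2250),
    3: (2.13, 1.2125),
    4: (1.87, 1.2000),
    5: (1.60, 1.1750),
}


def resolve_shared_pstate(core_requests, table=X6800):
    """Resolve per-core P-state requests on a chip with one PLL and one Vcore.

    The highest-performance request (the lowest P-state number) wins,
    and every core runs at that clock and voltage.
    """
    state = min(core_requests)
    clock_ghz, vcore = table[state]
    return state, clock_ghz, vcore


# Core 1 runs SuperPI and requests P0; Core 2 runs WinAmp and requests P5.
state, clock_ghz, vcore = resolve_shared_pstate([0, 5])
print(f"Both cores run at P{state}: {clock_ghz} GHz @ {vcore} V")
```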
Quad Core (Intel)
The Kentsfield processor is made up of two Conroe chips placed
side by side. The first Conroe chip (Cores 1 & 2) is referred to as Site
1, and the second Conroe chip (Cores 3 & 4) is referred to as Site 2.
Each site has its own PLL source but both sites share the same
Vcore.
Let's say we have a Core 2 Extreme QX6700 with the following clock ratio/ voltage
table :
If you refer to the diagram on the left, Site 1 will be running at P0, while Site 2 will be
running at P1. Hence, Cores 1 and 2 will be running at 2.67 GHz while Cores 3 and 4
will be running at 2.13 GHz. All cores share the same Vcore (1.2875 V) because Site 1
has a higher Pstate than Site 2, and only one voltage source is available.
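The Kentsfield arrangement (one PLL per site, one Vcore for the whole package) can be sketched in Python as follows. Only P-state numbers are resolved here. The function name is made up for illustration, and the rule that the highest-performance request wins is taken from the dual-core description.

```python
def resolve_kentsfield(core_requests):
    """Resolve P-state requests for a two-site quad-core (Kentsfield-style).

    core_requests holds the requested P-state numbers for Cores 1 to 4.
    Each site (a pair of cores) has its own PLL, so each site gets its own
    clock P-state; the single Vcore follows the highest-performance site.
    Returns (site 1 P-state, site 2 P-state, P-state that sets Vcore).
    """
    if len(core_requests) != 4:
        raise ValueError("expected requests for exactly four cores")
    site1 = min(core_requests[0:2])   # Cores 1 & 2
    site2 = min(core_requests[2:4])   # Cores 3 & 4
    vcore_state = min(site1, site2)   # one voltage source for both sites
    return site1, site2, vcore_state


# One core on Site 1 requests P0; everything else requests P1.
site1, site2, vcore_state = resolve_kentsfield([0, 1, 1, 1])
print(f"Site 1 clock: P{site1}, Site 2 clock: P{site2}, Vcore: P{vcore_state}")
```

This matches the example in the text: Site 1 runs at the P0 clock, Site 2 at the P1 clock, and all four cores get the P0 voltage.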
Quad Core (AMD)
AMD's next-generation monolithic quad-core processor
(Barcelona) has an advanced P-state management technique,
compared to current processors. ikanayam lets us in on some
Barcelona power management secrets :
There are a total of 3 power planes in Barcelona – one for the
processing cores (all 4 cores share a single power plane), one
for the north bridge (including IMC and cache) and the last one for I/O. However,
each core has its own PLL, so they can step into different discrete frequencies
independent of each other.
In addition, there is also a separate PLL for the non-core components (like the north
bridge, cache, HTT bus, etc.) which don’t execute instructions. The non-core clock can
scale down when external bus activity is low, and vice versa. I'll update this part with
more details when Barcelona is out in the market.
Other P-State Features
Super Low Frequency Mode
This technology is only implemented in mobile Core 2 Duo processors (Merom core). In
the normal low-frequency mode, the minimum clock ratio supported by Intel is 6x. If the
Front Side Bus base clock is 200 MHz, then the minimum CPU clock speed would be 1.2
GHz. Intel wants the clock speed to go even lower, as power consumption is directly
proportional to (voltage² x frequency).
Since a new PLL design isn't a good idea, Intel reduces the Front Side Bus base clock to
only 100 MHz. This forces the CPU to run at only 600 MHz with a lower core voltage
than the normal low-frequency mode. Desktop Conroe-core processors cannot use this
mode because it requires chipset support.
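The arithmetic behind this is easy to check in Python. The sketch below computes the CPU clock from the clock ratio and the base clock, and compares dynamic power between two operating points using the (voltage² x frequency) relationship. It only gives relative figures, ignores leakage, and the function names are made up for illustration.

```python
def cpu_clock_mhz(clock_ratio, base_clock_mhz):
    """CPU clock = clock ratio x Front Side Bus base clock."""
    return clock_ratio * base_clock_mhz


def relative_dynamic_power(volts, freq, ref_volts, ref_freq):
    """Dynamic power relative to a reference point, using P ~ V^2 x f."""
    return (volts ** 2 * freq) / (ref_volts ** 2 * ref_freq)


# Minimum 6x ratio at the normal and the reduced base clock
print(cpu_clock_mhz(6, 200))   # normal low-frequency mode, in MHz
print(cpu_clock_mhz(6, 100))   # Super Low Frequency Mode, in MHz

# Halving the clock at an unchanged voltage halves the dynamic power
print(relative_dynamic_power(1.0, 600, 1.0, 1200))

# X6800 example table: P5 (1.60 GHz @ 1.1750 V) versus P0 (2.93 GHz @ 1.2875 V)
ratio = relative_dynamic_power(1.1750, 1.60, 1.2875, 2.93)
print(f"P5 uses about {ratio:.0%} of the P0 dynamic power")
```

Because voltage enters as a square, being able to drop the core voltage along with the clock is what makes the lower-frequency mode worthwhile.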
Combining CPU C-state & P-state
Imagine you are only running WinAmp using your Core 2 Duo E6300 processor. The
first core would be processing the equalizer bands but the second core will be doing
nothing at all. If C-states and P-states are both enabled on this processor, the first core will
go into its lowest P-state (P1), while the second core will go into the C1 halt state.
If you recall details from the earlier sections, the C1 state does not cause the CPU to run
at a lower clock speed. Therefore, when the first core runs at 1.60 GHz @ 1.225 V, the
second core in the C1 state will request 1.87 GHz @ 1.25 V. Since we only have one PLL
and Vcore in the Core 2 Duo processor, it will be forced to run at 1.87 GHz @ 1.25 V,
even though WinAmp requires very little processing power!
To counter this problem, C1E (the C1 Enhanced state) was introduced to allow the C1
state to be associated with the P-state ratio/voltage table. When C1E is enabled, the Core 2
Duo processor can now run both cores at 1.60 GHz @ 1.225 V. C1E has some benefits
over the traditional C1 state :
• It allows the sleeping core to follow the active core's frequency and core voltage.
• The power consumption of a processor with C1E enabled is lower than with C1
enabled, because power is directly proportional to (voltage² x frequency).
• The wake-up time from certain C1E states to C0 improves, so the clock and voltage
can be restored faster.
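The WinAmp scenario with and without C1E can be modelled in a few lines of Python. This is a simplified sketch built on the Core 2 Duo E6300 table values (1.87 GHz @ 1.25 V at P0, 1.60 GHz @ 1.225 V at P1). It treats a halted core without C1E as requesting P0, as described in the text, and the names are made up for illustration.

```python
# Core 2 Duo E6300 example table: P-state -> (clock in GHz, voltage in V)
E6300 = {
    0: (1.87, 1.2500),
    1: (1.60, 1.2250),
}


def operating_point(active_core_request, c1e_enabled, table=E6300):
    """Clock and voltage of a dual-core chip with one active and one halted core.

    Without C1E, the core halted in C1 still asks for the P0 clock and voltage.
    With C1E, the halted core follows the active core's request. Either way,
    the single PLL and Vcore follow the highest-performance request.
    """
    halted_core_request = active_core_request if c1e_enabled else 0
    state = min(active_core_request, halted_core_request)
    return table[state]


print(operating_point(1, c1e_enabled=False))   # stuck at the P0 point
print(operating_point(1, c1e_enabled=True))    # both cores drop to the P1 point
```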
In upcoming Core 2 Duo steppings and future-generation processors, C2E and C3E may
be implemented to obtain the three benefits mentioned above. C4, C5 and C6 may not
implement the enhanced state because most of the functional units in the CPU have their
clocks gated and the processor itself is using a lower core voltage than the lowest
Vcore supported by the lowest-performance P-state.
Incidentally, Intel has branded its P-state management as Enhanced Intel SpeedStep
Technology (EIST), but the C1E feature doesn't have a special name. AMD lumped both
P-state and C1E together and called it Cool'n'Quiet.
I noted a good response in the forum thread :
“EIST also works the same way when C1E is disabled in the BIOS. C1E must be seen
independently of EIST and there are CPUs by Intel which only have C1E but no EIST”
Let’s view it this way. EIST was originally a mechanism to transition the CPU into
various frequency / voltage pairs in the processor active state (C0 state). We can see this
in the Pentium 4 family. However, EIST has evolved in the Core 2 Duo processor, where this
piece of logic can work even in other processor power states, such as C1 and C2.
When Intel says EIST is disabled but C1E is enabled, this means the processor will not
change its clock speed and voltage when it is executing instructions, no matter whether it
is at a 10% load or a 100% load. It will change its frequency and voltage only when it starts
sleeping (C1, C2, etc.).
CPU Thermal Monitor
The main purpose of the thermal monitor is to decrease the processor power consumption
when it is running too hot. A few years back, THG demoed this functionality when
Pentium 4 was introduced. When the CPU cooler was unplugged in the middle of a game,
the Pentium 4 processor did not burn up but caused the game to slow down
tremendously.
Basically, there are two types of thermal-throttling mechanisms: one managed by the
CPU itself and another managed by the ICH (chipset). The CPU thermal-throttling
mechanism is better because it is fast and efficient, compared to thermal throttling by the
ICH. Besides, the CPU supports two TM states, while the ICH only supports one.
Thermal Monitor 1 (TM1)
When the thermal threshold is exceeded, TM1 changes the clock duty cycle to lower the
CPU power consumption. Users will feel choppiness in their applications when TM1 is
applied. The throttling can be done on a per-core basis (CPU TM1 only). This means
Core 1 may apply TM1 while Core 2 still runs normally.
Thermal Monitor 2 (TM2)
When the thermal threshold is exceeded, TM2 changes the clock speed and core voltage
to reduce the CPU power consumption. All cores will activate TM2 simultaneously as
there is no per-core support. Users will not feel the choppiness as the transitions between
P-states are very smooth. TM2 is only supported by the CPU.
Extended Thermal Monitor (ETTM)
When the thermal condition is very bad, TM1 is activated on top of TM2 to aggressively
reduce the CPU power consumption.
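The three thermal monitor modes can be summed up as a simple decision in Python. This is a hypothetical sketch, not Intel's actual logic. The temperature thresholds in the usage lines are invented for illustration, and the sketch assumes that TM2 is preferred over TM1 whenever the CPU supports it.

```python
def active_thermal_monitors(temp_c, threshold_c, severe_c, tm2_supported=True):
    """Return which throttling mechanisms are active at a given temperature.

    threshold_c and severe_c are hypothetical trip points. Assumption:
    TM2 is used first when available, and TM1 is added on top of it
    (the extended mode) once the temperature passes severe_c.
    """
    if temp_c <= threshold_c:
        return []
    if not tm2_supported:
        return ["TM1"]
    if temp_c > severe_c:
        return ["TM2", "TM1"]
    return ["TM2"]


def tm1_effective_clock_ghz(clock_ghz, duty_cycle):
    """Average clock under TM1, which runs the clock only part of the time."""
    return clock_ghz * duty_cycle


# Hypothetical trip points: throttle above 85 C, extended mode above 100 C
print(active_thermal_monitors(60, 85, 100))
print(active_thermal_monitors(90, 85, 100))
print(active_thermal_monitors(105, 85, 100))
print(tm1_effective_clock_ghz(2.0, 0.5))   # 50% duty cycle
```

The duty-cycle function also hints at why TM1 feels choppy: the clock is simply stopped for part of each period, whereas TM2 keeps running continuously at a lower P-state.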
Conclusion
Well, I know there are just too many states in ACPI. Maybe you are still unclear about the
relationship between the various states. Here's the big picture :
ACPI is a very well-defined interface. There are many power states to select from, so a
well-designed operating system can deliver the best power-performance balance at any
particular time. Its implementation is also robust because only privileged code (the
operating system) can use its power management features. This can prevent malicious
software from exploiting and controlling the hardware directly.
Note : This guide was based on the following article.