
Topic 1 - Introduction to Computer Hardware Architecture

PC Architecture (TXW102)
Topic 1:
Introduction to Computer Hardware Architecture

© 2008 Lenovo

PC Architecture (TXW102) September 2008 1


Topic 1 - Introduction to Computer Hardware Architecture

Objectives:
Computer Hardware Architecture

Upon completion of this topic, you will be able to:

1. Identify the types of computers and their key differentiating features


2. Define PC architecture terminology, including computer layers,
controllers, and buses
3. Identify common industry standards and the objective of benchmarks
used with computer systems
4. Identify two features to enhance computer security


Types and Features of Computers

Notebook Desktop Server


Notebook
- Mobile
- Wireless
Designs
- Ultraportable (one spindle)
- Full function (two spindles)
- Desktop alternative (three spindles)
- Tablet
Brands
- Lenovo IdeaPad
- Lenovo ThinkPad

Desktop
- Non-mobile
- Wired connection
Designs
- Mini Desktop
- Ultra Small
- Small
- Desktop
- Tower
- PC Blades
Brands
- Lenovo IdeaCentre
- Lenovo ThinkCentre

Server
- High security
- Data processing and storage
Designs
- Tower
- Rack
- 1U and 2U rack
- Blades
Brands
- Lenovo ThinkServer


Types and Features of Computers


The three main types of computers (or PCs) are
• Notebooks
• Desktops
• Servers


Notebooks
Notebooks are optimized for traveling and for mobile users who need easy access to data.
Notebooks can be categorized as ultraportable (around three pounds), full function (up to seven
pounds), or a desktop alternative (desktop features in a larger notebook design). Lenovo uses the
brand names IdeaPad or ThinkPad for its notebook systems.

Lenovo ThinkPad T400 Lenovo IdeaPad U110

The tablet PC is another kind of PC. Tablet PCs are based on a Microsoft specification for ink-enabled touch screen computers (using Windows XP Tablet PC Edition or Windows Vista). The tablet PC comes in two form factors: slate (no integrated keyboard; the tablet connects to a docking station instead) and convertible (includes an integrated keyboard). Lenovo markets the ThinkPad X61 and X200 Tablet, which are convertibles.

Lenovo ThinkPad X200 Tablet


Desktops
Desktops are for users who work in one place and need access to data on the desktop or through a
network. Lenovo uses the brand names IdeaCentre or ThinkCentre for its desktop systems.
Common desktop mechanical designs include the following:
• Ultra Small (0 slot x 2 bay or 1 slot x 2 bay)
• Small (2 slot x 3 bay or 4 slot x 3 bay)
• Desktop (4 slot x 4 bay or 3 slot x 3 bay)
• Tower (4 slot x 5 bay)

Ultra Small mechanical in Lenovo ThinkCentre M57 (1 slot x 2 bay)

Small mechanical in Lenovo ThinkCentre A62 (2 slot x 3 bay)

Desktop mechanical in Lenovo ThinkCentre M57 (4 slot x 3 bay)

Tower mechanical in Lenovo ThinkCentre A62 (4 slot x 4 bay)


Workstations
A workstation, such as a Unix workstation, RISC workstation, or engineering workstation, is a high-end desktop or deskside microcomputer designed for technical applications. Workstations are intended primarily to be used by one person at a time, although they can usually also be accessed remotely by other users when necessary.
Workstations usually offer higher performance than a personal computer, especially with respect to graphics, processing power, memory capacity, and multitasking ability. Workstations are often optimized for displaying and manipulating complex data such as 3D mechanical designs, engineering simulation results, and mathematical plots. Consoles usually consist of at least a high-resolution display, a keyboard, and a mouse, but workstations often support multiple displays and may use a server-level processor. For design and advanced visualization tasks, specialized input hardware such as a graphics tablet or a SpaceBall may be used.
Lenovo markets the ThinkStation family of workstations.

ThinkStation S10 (left)


ThinkStation D10 (right)

Servers
Servers are computers that provide services to other computers, called clients. Servers are kept in secure areas because many users depend on them. They include file servers, print servers, terminal servers, Web servers, e-mail servers, database servers, and computation servers.
Server designs include
• Tower, which rests on the floor
• Rack-based, which must be installed in a rack
• Server blades, which put server circuitry on a single board that slides into an enclosure with other blades.
Note: Rack servers vary in height by a U measurement (one U is 1.75 inches of height). 1U servers are popular for Web sites because it is better to spread the Web load across multiple servers (horizontal scalability) than to increase the processing power of a centralized server (vertical scalability).
Lenovo markets the ThinkServer family of servers.
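The U arithmetic in the note above can be sketched in a few lines of Python. The 42U rack size used here is an illustrative, common full-height rack, not a figure from this course:

```python
# 1U = 1.75 inches of vertical rack space.
U_INCHES = 1.75

def rack_height_inches(units):
    """Return the height in inches of a given number of rack units."""
    return units * U_INCHES

# A common full-height rack is 42U tall:
print(rack_height_inches(42))   # -> 73.5

# How many 2U servers fit in a 42U rack?
print(42 // 2)                  # -> 21
```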

Types and Features of Computers:
Differentiating Computer Features

Notebook Desktop Server


Notebook
- Size and weight
- Power mgmt/battery
- Screen type and size
- Integrated wireless, Bluetooth
- Docking station or port replicator
- Number of spindles or bays
- Modular bay(s)
- Sales presentation capability
- Security chip
- ThinkVantage Technologies

Desktop
- Fastest processor
- Graphics performance
- 3D graphics adapters
- Systems management
- Removable storage (DVD±RW)
- Chipset
- Security chip
- ThinkVantage Technologies

Server
- Support many concurrent users (up to 1000s)
- Multiple processors
- Large memory capacity
- Large disk capacity (internal and external)
- Redundant components (disk, fans, power supplies, memory, etc.)
- Hot-swap components to maximize uptime (disk, fans, power supplies, memory, etc.)
- Hardware Failure Prediction to warn the admin of any impending failure

Differentiating Computer Features


Each type of computer has important characteristics that distinguish it from the others.
Key differentiating features of notebooks are size and weight, power management and battery, screen type and size, integrated wireless and Bluetooth, infrared, docking station or port replicator, number of spindles or bays, modular bay(s), sales presentation capability, security chip, and ThinkVantage Technologies.
Key differentiating features of desktops are fastest processor, graphics performance, 3D graphics
adapters, systems management, removable storage (DVD±RW), chipset, security chip, and
ThinkVantage Technologies.
Key differentiating features of servers are support of many concurrent users (up to 1000s), multiple
processors, large memory capacity, large disk capacity (internal and external), redundant
components (disk, fans, power supplies, memory, etc.), hot-swap components to maximize uptime
(disk, fans, power supplies, memory, etc.), and Hardware Failure Prediction to warn the admin of
any impending failure.


Types and Features of Computers:


Netbook vs. Notebook

Usage
• Netbook: Device for the Internet; purpose built for Internet use; Web usage (learn, play, communicate, and view); no optical drive
• Notebook: Multi-purpose PC; performance for multi-tasking, content creation, intense workloads, and the Internet; entertainment, productivity, and rich Web

Size
• Netbook: Compact form factor (7-10" screen)
• Notebook: Range of form factors (> 10" screen)

Price
• Netbook: ~$250 to $450
• Notebook: $450 and above

Intel processor brands
• Netbook: Intel Atom
• Notebook: Intel Celeron, Pentium, Core 2


Netbook vs. Notebook


The netbook is a smaller and lighter version of a notebook with less functionality.
Lenovo markets the IdeaPad S10 netbook.

Lenovo IdeaPad S10 netbook


Types and Features of Computers:


Architectural Choices

(Table: Gartner, March 2007, compares six client computing architectures against criteria including manageability, application compatibility, application performance, total cost of ownership, capital cost, security, network impact, and offline use, rating each criterion as a decided advantage, neutral, or a major weakness:
• Traditional PC – several vendors
• Server-based computing – Citrix, Microsoft
• Blade PC – HP, ClearCube
• Virtual desktops from servers – VMware
• Application streaming – Softricity, AppStream
• OS streaming – Ardence, Wyse)


Architectural Choices
There are many choices of computing architecture because different requirements drive different architectures.


Types and Features of Computers:


Architecture Spectrums

(Diagram: a spectrum of client architectures, from PCs through hosted VMs, blade PCs, streaming, and server-based computing to the Web, showing where the data, applications, presentation layer, OS, and hardware reside on the client versus across the LAN/WAN on a server. Different architectures can coexist. The PC end of the spectrum is more complex, has a higher total cost of ownership, and is highly flexible; the Web end is more secure, has a lower total cost of ownership, and is a rigid design. Source: Gartner, March 2007.)

Architecture Spectrums
The PC provides the greatest degree of flexibility for diversity of applications, types of management, and configuration options. As other architectures are considered, such as hosted virtual machines, blade PCs, streaming, server-based computing, and Web-based computing, security increases and total cost of ownership decreases, at the expense of a more rigid design with little flexibility.


Types and Features of Computers:


PC Blades
• PC is removed from the user's desk and replaced with a small User Port
• PC Blade is in a rack in a secure, centralized location
• Lenovo resells ClearCube-branded PC Blades and management software

User Port (I/Port or C/Port) – The ClearCube User Port connects computer peripherals, such as the monitor, keyboard, mouse, speakers, or USB peripherals, to PC Blades at the data center or telecom closet, over Ethernet or a direct connection.

PC Blade – The PC Blade is each user's actual computer: a configurable, Intel-based PC blade that delivers full functionality to the desktop.

Cage – The ClearCube Cage is a centralized chassis that houses up to eight PC Blades.

ClearCube Management Suite – The ClearCube Management Suite empowers administrators to manage the complete ClearCube infrastructure from any location. This powerful, remotely accessible suite includes a versatile set of features such as "hot spare" switching, move management, and automatic data backup.

PC Blades
PC Blades separate the guts of the PC from the physical desktop, putting processing power in data
centers and computer rooms. Employees then have only a monitor, keyboard and mouse on their
desks, along with a client appliance that is linked back to a blade server. PC blades offer a range of
benefits, including streamlined management and tighter security since all the hardware is
centralized. PC Blade configurations provide a dedicated blade to each user or a pool of blades that
can be dynamically allocated. In addition, spare blades can be used to provide hot backup to avoid
system outages.

PC Blade Advantages
• Centralized asset management – PC Blade hardware is centralized for easy access and asset
management.
• Mission critical applications – Blade infrastructure has high levels of redundancy; users can be swapped to a functioning blade very quickly in case of hardware or software failure.
• Reduced support costs – Hardware or software upgrades can be managed centrally in a fraction
of the time it would take to upgrade large numbers of dispersed PCs.
• Multiple locations – There is potential to support multiple locations with PC Blades by remotely
switching a user to a spare standby blade in the event of hardware failure (saving the cost of an
urgent engineer visit or keeping support staff on-site).
• Reduced costs for new users – It is lower cost to install and configure a new user with a PC
Blade than a desktop.


• Easy relocation – There are no significant costs when users move work location within a building.
• Improved security – The physical asset and intellectual property on the disk are centralized, e.g., it
is easier to steal a hard disk from a desktop than a PC Blade.
• Reduced user down time – Spare PC Blades can be configured to provide hot backup in case of
hardware failure.
• Improved appearance – In front office environments, the client’s user port has no moving parts,
generates no noise, produces little heat, and requires less space.
• Remote access – Users can access their own PC environment from multiple desks in the building
or from other remote locations with blade infrastructure installed

PC Blade Disadvantages
• Higher acquisition cost – Purchase price of PC Blade and its infrastructure is higher than a stand
alone PC.
• No wireless mobility – Mobile users or users who need to work away from their desks are not
supported.
• Lagging technology – PC Blade processors and technology may be six to 12 months behind
desktop technology.
• Unsuitable for power graphics users – PC Blades cannot meet the needs of demanding graphics users.
• New infrastructure – Significant change to current PC deployment, maintenance, and support (skills, tools, and processes).
• Mixed environments – It is more difficult to plan and manage upgrades when a customer has a mix of PC Blades and traditional desktops.
• No local CD and DVD drives – Only USB devices are available, which opens security risks and asset control issues.
• User resistance – Advanced and experienced PC users resist losing access to 'their' PC.
• Extra cost for redundancy – Extra cost for a closet spare (with cooling) to enable redundancy.
• Technology lock-in – Little option to cascade or sell blades to other users or customers.


Types and Features of Computers:


ClearCube PC Blade Products

• PC Blade (ClearCube R1300)
- Located with other PC Blades in a rack in a centralized location
- Intel processor, memory, disk, and graphics on the PC Blade

• User Port (ClearCube C/Port)
- Small client device that connects the user's monitor, keyboard, mouse, speakers, and USB devices to their PC Blade
- No moving parts, generates no noise, and creates little heat
- Can support multiple monitors


ClearCube
ClearCube is a company that has offered PC Blades since 1997 and dominates the PC Blade
market. Lenovo resells ClearCube-branded PC Blades and management software.
See www.clearcube.com for more information.

Types and Features of Computers:
Lenovo IdeaPad Notebooks and IdeaCentre Desktops

Lenovo IdeaPad Notebooks

IdeaPad U110 IdeaPad Y530 IdeaPad Y730

Lenovo IdeaCentre Desktops

IdeaCentre K210 IdeaCentre Q200



Lenovo IdeaPad Notebooks and IdeaCentre Desktops


In 2008, Lenovo introduced the IdeaPad Family of notebooks and the IdeaCentre Family of
desktops. The following information shows the branding and positioning of the two product
lines.
Lenovo: New World Company; Best Engineered PCs

Think Family (ThinkPad and ThinkCentre) – "The Ultimate Business Tool"
• Rock Solid
• Thoughtful Design
• Lowest TCO

Idea Family (IdeaPad and IdeaCentre) – "Engineered for People"
• Cutting Edge Capabilities
• Trendsetting Design
• Peace of Mind

Visit lenovo.com for more information on the IdeaPad notebooks and IdeaCentre desktops.

Types and Features of Computers:
Lenovo ThinkPad, ThinkCentre, and ThinkVision
Lenovo Think Family: ThinkVantage Technologies, ThinkPlus Accessories and Services, and ThinkVantage Design

ThinkPad, ThinkCentre, and ThinkVision offerings will continue to differentiate Lenovo from our competitors with
• Quality, service, and support expected from Lenovo
• Industrial design that simplifies and enhances usability
• Open standards-based products that work well together
• Lenovo innovation that delivers key benefits for customers

Lenovo ThinkPad, ThinkCentre, and ThinkVision


The Lenovo Think-branded family of offerings includes the following brands:
• ThinkPad – Notebook category
• ThinkCentre – Desktop category
• ThinkVision – Visuals category
• ThinkVantage Technologies – Solutions and offerings category
• ThinkPlus Accessories – Accessories and upgrades for Think products
• ThinkPlus Services – Services for Think offerings
An essential part of what makes a product a ThinkPad, ThinkCentre, or ThinkVision offering is
its industrial and graphic design. Lenovo calls this design approach “ThinkVantage Design.”
ThinkVantage Design is built upon the concept of synergistically joining form and function.
ThinkVantage Design provides value to the customer by providing meaningful innovation that
enhances the ownership experience. It also has its own “design DNA” based on the classic
ThinkPad design, which is its heritage.
Visit lenovo.com for more information on Lenovo brand offerings.


Types and Features of Computers:


ThinkVantage Technologies

ImageUltra Builder – Consolidates multiple software images into one master image

System Migration Assistant – Moves system settings and data easily from an old PC to a new PC

Software Delivery Center – Automates delivery of application software updates to PCs

Access Connections – Switches painlessly between settings for different wireless and wired networks

System Information Center – Collects and tracks PC asset and security compliance information

Client Security Solution – Secures users' PCs, data, and network communications from unauthorized access

Rescue and Recovery – Enables hassle-free recovery of data and system image

System Update – Accesses, downloads, and installs the latest updates for Think systems

Productivity Center – Provides users with access to self-help support tools and information with just one click

ThinkVantage Technologies
ThinkVantage Technologies are a select group of offerings from
Lenovo designed to address emerging customer needs. Adding
value to open industry standards, ThinkVantage Technologies help
customers manage the cost of deploying end-user systems,
implement new technologies such as wireless computing, and help
ensure that these technologies can be implemented securely. While
many of these offerings currently exist, some are being significantly
enhanced and all of them have now been consolidated into a single
family of offerings.
Visit lenovo.com/thinkvantage for more information.

Popup from ThinkVantage Productivity Center


Types and Features of Computers:


Benefits of ThinkVantage Technologies

ThinkVantage Technologies address the entire customer ownership experience from deployment to disposal.

• Image Creation – ImageUltra Builder (Hardware Independent Imaging Technology, Dynamic Operating Environment, Software Delivery Assistant), Image on Demand, Imaging Technology Center
• Network Deployment – Remote Deployment Manager
• Client Migration – System Migration Assistant
• Hassle Free Connection – Access Connections
• Secure Client Data – Client Security Solution, Password Manager
• Secure Data and Media – Active Protection System
• Backup and Recovery – Rescue and Recovery
• Critical Updates – System Update, Rescue and Recovery with Antidote Delivery Manager
• End-user Self-help Portal – Productivity Center, Rescue and Recovery
• Software on Demand – Software Delivery Center
• Information Asset Management – System Information Center
• Hard Drive Data Destruction – Secure Data Disposal

Benefits of ThinkVantage Technologies


Industry analysts state that the annualized acquisition cost of a PC represents less than 20 percent of the annual total cost of ownership. ThinkVantage Technologies address the other 80 percent to help reduce your total cost of ownership. ThinkVantage Technologies also help improve your business's productivity and efficiency throughout each system's life cycle as you deploy, connect, protect, support, and dispose of your company's PCs.

End-user productivity (value out of the box that can also provide key IT benefits):
• Access Connections
• Productivity Center
• Active Protection System
• Client Security Software
• Rescue and Recovery
• System utilities

Life-cycle management (solutions for SMB and LE):
• System Information Center
• Software Distribution Center
• ImageUltra Builder
• Secure Data Disposal
• Remote Deployment Manager
• System Migration Assistant


Life cycle phase, ThinkVantage Technology, and purpose/function of each tool:

Deploy
• ImageUltra Builder – Umbrella name for imaging technology that is focused on simplifying the complexity of creating and managing corporate images; ImageUltra Builder consists of the three components described below
- Software Delivery Assistant (SDA) – Provides customized installation of software applications based on a user's unique work group assignment and/or needs
- Dynamic Operating Environment (DOE) – Consolidates support for multiple operating systems and languages into one Super Image
- Hardware Independent Imaging Technology (HIIT) – Provides hardware-independent images that support multiple system types via system-specific drivers and applications pulled from the PC's service partition (supported on ThinkPad and ThinkCentre systems only)
• System Migration Assistant (SMA) – An easy-to-use migration tool that automates the migration of both settings and data through a menu system and advanced scripting capability
• Remote Deployment Manager (RDM)* – Network-based imaging and system support tool that distributes images and updates system settings (includes PowerQuest Drive Image Pro Lite)

Connect
• Access Connections – Manages all wireless and wired connectivity settings and allows easy switching between them

Protect
• Client Security Solution (CSS) – Security software that provides authentication of end-user identity, encryption of data, and simplified password management
• Active Protection System (APS) – Prevents some hard-drive crashes on most new ThinkPad models by temporarily stopping the hard drive when a fall or similar event is detected; provides up to four times greater impact protection than systems without this feature

Support
• Productivity Center – Provides one-button access to self-help support tools and information about a user's system
• Rescue and Recovery – A help desk behind the button that allows a system to recover itself from OS corruption and even hard drive failures; fills the gap between traditional backup-and-restore programs and re-imaging; allows remote system recovery with or without user intervention; and automates the deployment of critical patches, even if a system will not boot
• System Information Center (SIC) – An electronic inventory management solution that tracks client PC hardware and software assets, provides ThinkVantage Technology usage information, and reports and measures security compliance
• Software Delivery Center (SDC) – Automates delivery of application software updates to PCs without end-user intervention or disruption
• Asset ID – A unique asset-tagging technology that allows data, events, actions, and responses to be read by or become interactive with other programs
• System Update – Accesses, downloads, and installs the latest updates for Think systems
• IBM Director Agent – An advanced CIM/WMI-based agent for systems and asset management that allows extensive upward integration into other enterprise management tools and databases
• IBM Director – A powerful systems management program with extensions targeted for improved system setup, remote manageability, and event monitoring of clients and servers

Dispose
• Secure Data Disposal – An automated program that allows multiple levels of disk cleansing, ensuring systems are properly safeguarded during disposal; meets the DOD Level 5 and German standards for safe disposal


ThinkVantage Technology, potential savings*, and assumptions used in calculating savings:

• ImageUltra Builder – $100 per unit (deployed once per system versus a typical cloned image management and loading process)
• System Migration Assistant – $70 per unit deployed (deployed once per system versus manual processes)
• Remote Deployment Manager – $90 per system (deployed once per system versus manual processes)
• Access Connections – $50 per wireless PC (annual savings, notebook systems only; assumes two help desk calls per user per year)
• Client Security Solution – $124-250+ per unit ($35-60 hardware replacement; $49-150 encryption software replacement; $40 support cost reduction; replaces comparable equipment such as key fobs)
• Active Protection System – $200 or more per occurrence (notebook systems only; $200 is the hardware replacement cost for a ThinkPad 30 GB hard drive)
• Rescue and Recovery – $180 per occurrence (used for one incident in 13% of installed systems; average support time savings of 183 minutes)
• Secure Data Disposal – $45 per PC (once per PC per life cycle)

* Potential savings are based on typical customer environments. Some figures represent costs that customers may redirect from labor-intensive areas to other areas of their business. Other figures are based on cost avoidance of competitive solutions purchased separately. All figures are calculated using the TVT and Wireless Calculators and data from Gartner Research and customers. Actual savings are not guaranteed and will vary by customer.


PC Architecture:
Computer Layers

• Layered structure
- This structure allows for compatibility.
- Bypassing layers increases performance.
• BIOS (basic input/output system)
- Located in flash memory (sometimes called EEPROM)
- Supports plug-and-play
- Supports power management
• Device driver
- Software to control a piece of hardware

(Diagram: the layers of a PC, from the user down through applications, middleware and APIs, the operating system, device drivers, BIOS and firmware, adapters, and hardware. The BIOS resides in EEPROM.)


Computer Layers
A computer consists of several layers, each with interfaces to communicate with the layers next to it. A layered structure allows for compatibility; for example, the same shrink-wrapped operating system can work on millions of PCs from different vendors because it interfaces with industry-standard BIOS calls. The disadvantage of layers is that each layer can slow performance, so a layer is sometimes bypassed to increase performance. For example, an application could be written directly to the BIOS and device driver of a specific computer, which would gain performance but would only work on that unique computer.
Some of the different computer layers shown in the diagram above are explained below.
• Applications are the software programs with which a user typically interacts, such as those used for word processing (Microsoft Word), Web browsing (Internet Explorer), sending e-mail (Lotus Notes), and using spreadsheets (Microsoft Excel).
• Middleware is software that provides an additional level of abstraction to applications. The idea behind middleware is to hide the complexity of code that is not strictly related to the business objectives the application is written for. Writing applications against the basic APIs the OS exposes is often time consuming, and it may take a while before a programmer gets to the "business modules" of the application being developed. Using middleware is like talking to a cleverer interface than the one provided by the OS: the middleware implements all the "boring stuff" so that developers can concentrate on the business logic. Examples include IBM DB2, Oracle, Microsoft SQL Server, Lotus Domino, Microsoft Internet Information Server, IBM WebSphere, and BEA WebLogic.


• The operating system is a set of programs that provides an environment in which applications can run, allowing them to easily take advantage of the processor and I/O devices, such as disks or adapters. Examples include Windows 2000, Windows XP, Windows Vista, Red Hat Linux, and AIX 5L.
• The Basic Input/Output System (BIOS) is a set of program instructions that activates system functions independently of hardware design (the layer between the physical hardware and the operating system) and allows for software compatibility. The BIOS is typically located in flash memory (EEPROM) on the systemboard. When a PC is started, the BIOS runs a power-on self-test (POST). It then tests the system and prepares the computer for operation by searching for other BIOSes on the plug-in boards and setting up pointers (interrupt vectors) in memory to access those routines. It then loads the operating system and passes control to it. The BIOS accepts requests from drivers as well as from application programs, and it supports plug-and-play and power management. Although there are several BIOS vendors, there are few differences among their products.
Note: To preclude the problem of performing OS, BIOS, or driver updates before the OS or network drivers are loaded, a Preboot Execution Environment (PXE) allows the system to boot off the network. At boot, a PXE agent executes, and the PC gets an IP address from a DHCP server and then uses the BOOTP protocol to look for a PXE server. The PXE client is firmware implemented in the BIOS (if the LAN hardware is on the systemboard) or in a boot PROM (if on a LAN adapter). Programs, including those in the PXE environment, require system configuration and diagnostic information. The Systems Management BIOS (SMBIOS) is a firmware interface that makes the necessary information available via BIOS calls, both through the OS and in the preboot environment.
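As a small illustration of the kind of system information SMBIOS exposes, the sketch below reads a few DMI fields that the Linux kernel publishes under /sys/class/dmi/id. The path and field list are Linux-specific assumptions, not part of this course, and the function simply skips fields that are unavailable:

```python
from pathlib import Path

def read_smbios_info(base="/sys/class/dmi/id"):
    """Read a few SMBIOS/DMI fields exposed by the Linux kernel, if present."""
    fields = ["sys_vendor", "product_name", "bios_vendor", "bios_version", "bios_date"]
    info = {}
    for name in fields:
        path = Path(base) / name
        try:
            info[name] = path.read_text().strip()
        except OSError:
            pass  # field not exposed (non-Linux system, VM, or restricted access)
    return info

print(read_smbios_info())
```

On a typical Linux desktop this prints the system vendor, model, and BIOS version; on other platforms it returns an empty dictionary.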
• Firmware is usually the layer of software between the device driver and the adapter. It typically resides in an EEPROM on an adapter card and can be upgraded with new releases. Firmware is similar to BIOS.
• Device drivers are a type of software (which may be embedded in firmware) that controls or emulates devices attached to the computer, such as a printer, scanner, hard disk, monitor, or mouse. Device drivers are typically loaded low into PC memory at boot time. A device driver expands an operating system's ability to work with peripherals and controls the software routines that make peripherals (a network card, a disk, a printer) work. These routines may be part of another program (many applications include device drivers for printers), or they may be separate programs. Basic drivers come with the operating system, and additional drivers are normally installed for each peripheral added.


Layers of a PC: Illustration


To illustrate the different layers of a PC, below is a graphic that shows the series of processes
that occur when a user executes a single keystroke on a computer keyboard, keyed to the
numbered steps that follow.

[Diagram: the keystroke path through the keyboard, keyboard cable, keyboard port hardware,
BIOS, operating system, application program, and video circuitry, with steps 1 through 7 marked]

1. The user presses the N key on the keyboard.
2. The code corresponding to the pressed key is sent over the cable to the keyboard port on
the systemboard.
3. The keyboard BIOS routine accepts the code, translates it into the letter n, and passes
it to the operating system.
4. The operating system passes the keystroke to the application program and sends the n
to the video BIOS (or the device driver).
5. The video BIOS sends the n to the graphics circuitry.
6. The application program accepts the keystroke and instructs the operating system to
look for the next keystroke.
7. The n appears on the screen.


PC Architecture:
Subsystems

Major internal subsystems of a PC:

• Processor (Core 2 Duo)
• L2 cache (2 MB)
• Memory (2 GB)
• Bus(es) (PCI, PCIe)
• Graphics controller (SVGA)
• Disk controller (SATA) and disk (250 GB)
• Slots (PCI Express)

[Diagram: the processor with L1/L2 cache connects over the system bus to the MCH or GMCH
host bridge (memory and optional graphics controller), which serves memory, the PCI Express
x16 slot, and PCI Express slots; the Direct Media Interface links the host bridge to the I/O
Controller Hub (ICH), which houses the PCIe, PCI, SATA, IDE, and USB controllers and connects
four SATA disks, USB 2.0, the AC '97 codec or High Definition Audio, and the Super I/O and
firmware hub over the Low Pin Count interface]

Subsystems
Subsystems in a PC communicate to each other via buses. Buses adhere to a particular
architecture (set of rules) to allow compatibility with the numerous subsystems that adhere to
the same architecture.
Most PCs are associated with the term Wintel, which refers to Microsoft Windows and Intel
chip technologies. PCIe stands for PCI Express.
The processor is the central component of a PC. Intel and AMD are the main processor vendors
used in PCs.
Data in the processor, caches, memory, buses, disk controller, and graphics controller is stored
electrically, so when electrical power is shut down, this data is lost. Data on the disk is stored
magnetically, so the data is preserved even when electrical power is removed.


PC Architecture:
Controllers

• All major subsystems have controllers.


• Controllers are circuitry controlling manner, method, and speed of access
to device.
• Controllers are part of chipset.

Controller Controls Examples


L2 cache controller L2 cache 2 MB
Memory controller Memory 2 GB
Bus controller(s) Data bus PCI Express (PCIe)
Graphics controller Monitor ATI Radeon 9600
Disk controller Disk 250 GB Serial ATA disk


Controllers
All major subsystems have controllers that define how data will be obtained and stored.
Sometimes a controller is a single chip with the data stored in separate physical circuitry. For
example, a memory controller controls memory, but the data is stored in separate physical
modules called DIMMs.
Sometimes a single physical chip contains multiple controllers. For example, the I/O Controller
Hub (ICH) is a single physical chip which houses the PCI Express controller, PCI controller,
Serial ATA controller, EIDE controller, USB controller, and other controllers.
Controllers are normally included in the chipset of the computer.


PC Architecture:
Buses

Most transfers use three buses:

• Control bus
• Address bus
• Data bus

A 16-bit bus has 16 wires for on/off charges (data), a 32-bit bus has 32 wires, and a 64-bit bus
has 64 wires.

[Diagram: the processor, memory, disk, graphics, and LAN subsystems share control, address,
and data lines, arbitrated by an I/O controller]

• Some architectures multiplex signals on the same bus (wires).

Buses
If two subsystems are on a bus, such as in the diagram with processor and memory, a data
transfer first involves sending the address on the address bus. Next, data is sent on the data bus.
If multiple subsystems exist on a bus, a control bus is needed in addition to the address and data
bus. The control bus is used to signal which subsystem will control the bus for the next transfer.


Address Bus
An address bus determines how much memory the processor or any subsystem can directly
address. For example, a 32-bit address bus provides 2 to the power of 32, or about 4.3 billion,
unique numbers, enough to address 4 GB of memory.

0 0 0 0 . . . 0 0 0 0 0
0 0 0 0 . . . 0 0 0 0 1
0 0 0 0 . . . 0 0 0 1 0
0 0 0 0 . . . 0 0 0 1 1
0 0 0 0 . . . 0 0 1 0 0
Memory Addressing Similar to Car Odometer

Before data is read or written by a processor, the address of that data is sent first. This address is
sent on a separate set of physical wires called the address bus. The data is then sent on a
different set of physical wires called the data bus.
A processor is designed to use a certain maximum quantity of address lines. The amount of
physical memory that a processor can address is determined by this quantity. The number of
unique numbers that can be made by a base two number system (0s and 1s) with the quantity of
address digits determines the maximum addressable memory of a processor. Software can limit
this maximum addressability, for example, DOS sets the processor to use 20 address lines as
DOS only addresses 1 MB of memory.
Following are some processors and their addressability:

Address lines  Addressable memory  Examples
24             16 MB               486SLC
32             4 GB                486DX2
36             64 GB               Pentium 4, Xeon
40             1 TB                EM64T physical memory
44             16 TB               Itanium
48             256 TB              EM64T virtual memory
64             16 EB               IA-64 64-bit flat addressing
Sometimes operating systems limit addressability, so that the operating system cannot utilize
all the available physical memory.
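
The powers-of-two arithmetic in the table can be checked with a short script (a sketch for illustration; the helper and unit names are ours, not part of the course material, and the units are binary, i.e. 1 KB = 1024 bytes):

```python
# Each address line carries one binary digit, so n lines give 2**n unique
# byte addresses. Repeatedly dividing by 1024 picks a readable unit.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]

def addressable(lines):
    """Maximum addressable memory for a given number of address lines."""
    size = 2 ** lines
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"

for n in (20, 24, 32, 36, 40, 44, 48, 64):
    print(f"{n} address lines -> {addressable(n)}")
# 20 lines give 1 MB (the DOS limit), 32 give 4 GB, 64 give 16 EB.
```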


PC Architecture:
Bus Speeds

[Diagram: bus speeds in a typical chipset: system bus 400 to 1066 MHz between the processor
(with L1/L2 cache) and the MCH or GMCH host bridge; memory bus 200 to 800 MHz; PCI Express
2.5 GHz; Direct Media Interface (DMI) 100 MHz down to the I/O Controller Hub (ICH); PCI 2.0
33 MHz; four SATA disks and USB on the ICH; Low Pin Count (LPC) interface 33 MHz to the
Super I/O and firmware hub]

• Each bus is clocked at a different rate.
• Bus speed is different from data transfer rate (MB/s).
• Newer buses are double data rate (the same MHz doubles throughput).
• The system bus and memory bus can be asynchronous or synchronous.

Bus Speeds
Each bus in a PC has a speed (measured in megahertz) and a data transfer rate.
The bus between the processor and the memory controller was originally called the frontside
bus: the processor had a separate bus to its integrated L2 cache, called the backside bus, and a
separate bus outside the processor to the memory controller, called the frontside bus. With the
introduction of the Pentium 4 and follow-on processors, the frontside bus was renamed the
system bus, although both terms were still used interchangeably. The name changed because the
L2 cache was no longer isolated on a separate, independent bus to the degree that it was in
earlier processors, such as the Pentium II and Pentium III.
The memory bus is clocked at 200 to 400 MHz, but most memory today is double data rate
(DDR); this means data is transferred on both the rising and falling edges of the clock, which
doubles the throughput from the base clock speed.
The system bus and the memory bus can be either synchronous or asynchronous, depending on
the memory controller of the chipset. Some memory controllers support only synchronous
system and memory bus speeds; some support either synchronous or asynchronous speeds. An
example of synchronous operation is a 400 MHz system bus with a 200 MHz memory bus and
PC2-3200 DDR2 memory (400 MHz is an even multiple of 200 MHz). An asynchronous example
is a 400 MHz system bus with a 266 MHz memory bus and PC2-4200 533 MHz DDR2 memory.
DDR3 uses a 400 to 800 MHz memory bus.


In 1996 and 1997, the PC industry standardized the 66 MHz system bus. Migration to a 100
MHz system bus occurred in 1998, then to a 133 MHz bus in 2000. The Pentium 4 introduced a
400 MHz system bus in late 2000, although it was really 100 MHz × 4 to yield 400 MHz. Later,
Pentium 4 processors utilized an 800 MHz system bus (200 MHz × 4) followed by a 1066 MHz
system bus (266 MHz × 4). The Core 2 Quad uses a 1066 MHz system bus (266 MHz × 4)

Data Transfer Rates


Data transfer rates (assuming that data is transferred on only one edge of the clock):
• 32-bit at 33 MHz is 132 MB/s (PCI bus)
• 32-bit at 66 MHz is 264 MB/s (PCI bus)
• 64-bit at 33 MHz is 264 MB/s (PCI bus)
• 64-bit at 66 MHz is 528 MB/s (PCI bus and system bus)
• 64-bit at 100 MHz is 800 MB/s (system bus)
• 64-bit at 200 MHz is 1.6 GB/s (backside bus to L2 cache; PC1600 DDR memory)
• 64-bit at 266 MHz is 2.1 GB/s (PC2100 DDR memory)
• 64-bit at 400 MHz is 3.2 GB/s (backside bus to full speed L2 cache, Pentium 4 system bus)
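
These rates follow directly from bus width × clock × transfers per clock; a quick sketch of the arithmetic (the function name is ours, for illustration only):

```python
def transfer_rate_mb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak transfer rate in MB/s: bytes per transfer x million transfers/s."""
    return width_bits // 8 * clock_mhz * transfers_per_clock

print(transfer_rate_mb_s(32, 33))       # 132 MB/s, the classic PCI bus
print(transfer_rate_mb_s(64, 100))      # 800 MB/s system bus
# Double data rate moves data on both clock edges (two transfers per clock):
print(transfer_rate_mb_s(64, 200, 2))   # 3200 MB/s, i.e. PC2-3200 DDR2
```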


PC Architecture:
Cache

• Cache is a buffer between subsystems.
• A disk transfer could involve five cache locations.

[Diagram: the five numbered cache locations on the path between the processor (1. L1 cache,
2. L2 cache), 3. memory behind the MCH or GMCH host bridge, and the SCSI or EIDE disks
(4, 5) behind the I/O Controller Hub (ICH)]

Cache
Cache is a storage place (buffer or bucket) that exists between two subsystems in order for data
to be accessed more quickly to increase performance. Performance is increased because the
cache subsystem usually has faster access technology and does not have to cross an additional
bus. Cache is typically used for reads, but it is increasingly being used for writes as well.
For example, getting information to the processor from the disk involves up to five cache
locations:
1. L1 cache in the processor (memory cache)
2. L2 cache (memory cache)
3. Software disk cache (in main memory)
4. Hardware disk cache (some disks may only use a FIFO buffer)
5. Disk buffer
For reads, one subsystem will usually request more data than what is immediately needed, and
that excess data is stored in the cache(s). During the next read, the cache(s) is searched for the
requested data, and if it is found, a read to the subsystem beyond the cache is not necessary.
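
The read behavior described above can be sketched as a minimal read-through cache (illustrative only; real caches also track line sizes, eviction, and writes, which are omitted here):

```python
class ReadThroughCache:
    """Keep a copy of everything read from a slower subsystem, so a repeat
    read is served from the buffer instead of crossing the bus again."""

    def __init__(self, backing_read):
        self._read = backing_read     # e.g. the actual disk or memory access
        self._store = {}
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address not in self._store:
            self.misses += 1
            self._store[address] = self._read(address)  # go to the subsystem
        else:
            self.hits += 1
        return self._store[address]

disk = ReadThroughCache(lambda addr: f"block-{addr}")
disk.read(7)                   # miss: fetched from the backing store
disk.read(7)                   # hit: no second access needed
print(disk.hits, disk.misses)  # 1 1
```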


Industry Standards:
Restriction of Hazardous Substances (RoHS) Directive

• Restriction of hazardous substances in electrical and electronic equipment
• Started by the European Union
• Effective for shipments after July 1, 2006
• Lenovo PC products comply across all of its product lines worldwide

Restricted substances:
• Lead
• Mercury
• Cadmium
• Hexavalent chromium
• Polybrominated biphenyls
• Polybrominated diphenylethers

[Map: the European Union and neighboring countries covered by the RoHS Directive]

Restriction of Hazardous Substances (RoHS) Directive


In February 2003, the European Union (EU) issued directive 2002/95/EC on the restriction of the
use of certain hazardous substances in electrical and electronic equipment on the EU market
beginning July 1, 2006. The Restriction of Hazardous Substances (RoHS) Directive requires
producers of electrical and electronic equipment to eliminate the use of six environmentally
sensitive substances: lead, mercury, cadmium, hexavalent chromium, and the polybrominated
biphenyl (PBB) and polybrominated diphenylether (PBDE) flame retardants. The purpose is to
eliminate the potential risks associated with electronic waste, so this legislation affects the
content and disposal requirements for electronic products.
Most IT hardware is within the scope of the directive: PCs, printers, servers, storage, and options.
Products (and their components) must comply.
Lenovo PC products comply with the RoHS directive across its product lines worldwide.
See www.rohs.gov.uk for more information.


A similar directive from the European Union is the Waste Electrical and Electronic Equipment (WEEE)
Directive. WEEE encourages and sets criteria for the collection, treatment, recycling and recovery of
electrical and electronic waste. WEEE requires producers to ensure that equipment they put on the
market in the EU after August 13, 2005 is marked with the crossed-out wheeled bin symbol, the
producer’s name, and indication that the equipment was put on the market after August 13, 2005.
Lenovo PC products comply with the WEEE Directive requirements.

WEEE-Compliant Symbol

Example of RoHS Labeling on a PCI Express Adapter


Industry Standards:
ENERGY STAR 4.0 and 80 PLUS

• ENERGY STAR 4.0
- Voluntary labeling program by the EPA to identify and promote energy-efficient products
- Systems receive certification and logo when energy-saving requirements are met
• 80 PLUS
- Power supply 80% or greater energy efficient
- Required to get the ENERGY STAR 4.0 logo
- Alternative option for non-ENERGY STAR systems; more flexibility on configuration
• Select Lenovo notebooks, desktops, and workstations are ENERGY STAR 4.0-compliant
• All Lenovo ThinkVision monitors are ENERGY STAR 4.0-compliant

ENERGY STAR 4.0 requirements:
• 80% efficient power supply
• Low idle power

[Logos: ENERGY STAR; 80 PLUS certified power supply]

ENERGY STAR 4.0 and 80 PLUS


In 1992 the US Environmental Protection Agency (EPA) evolved its voluntary program, called
ENERGY STAR, to cover computers. The ENERGY STAR program for computers has the goal of
generating awareness of energy saving capabilities, as well as differentiating the market for more
energy-efficient computers and accelerating the market penetration of more energy-efficient
technologies. On July 20, 2007, the EPA updated the ENERGY STAR computer specification to
Version 4.0.
Monitors can qualify for ENERGY STAR 4.1 Tier 1 or Tier 2 based on power requirements for
Sleep and Off modes.
There are two fundamental changes from the current ENERGY STAR 3.0 program to the ENERGY
STAR 4.0:
1. Idle power under the operating system will now be measured and used as a metric to earn the
ENERGY STAR 4.0 rating as opposed to Standby power.
2. An 80% efficient power supply is a requirement (80 PLUS logo not required).
ENERGY STAR 3.0 was concerned with Standby power. In Standby, the number of storage devices,
processor cores, graphics power, memory etc. makes very little difference to the power used. These
devices are either turned off or are in a very low power state. However, in idle, all these devices are
drawing power.


The 80% efficient power supply is a principal requirement of ENERGY STAR 4.0 and a major new
innovation. When a power supply converts AC power from the wall to the various DC voltages that the
computer needs, there is always a loss of power. The power loss varies with how busy the computer is.
An 80% efficient power supply is guaranteed to lose less than 20% of the AC power at 20%, 50% and
100% loads. Currently, power supplies for desktop computers range from approximately 65% to 75%
efficiency. A system can have an 80 PLUS power supply but not be ENERGY STAR 4.0-compliant. 80
PLUS power supplies are always auto-sensing.
See www.energystar.gov/ and www.80plus.org for more information.
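
The efficiency figures translate directly into power drawn from the wall; a rough sketch of the arithmetic (the function name and load values are illustrative, not measured data):

```python
def wall_draw_watts(dc_load_watts, efficiency):
    """AC power drawn from the outlet to deliver a given DC load."""
    return dc_load_watts / efficiency

# Delivering 200 W DC with a 70%-efficient supply vs. an 80%-efficient one:
old = wall_draw_watts(200, 0.70)      # about 285.7 W from the wall
new = wall_draw_watts(200, 0.80)      # 250.0 W from the wall
print(f"savings: {old - new:.1f} W")  # savings: 35.7 W
```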

ENERGY STAR 4.0 Classifications

Hardware configuration determines the category and the resulting 2007 power limits:

• Category C (95 W idle): multi-core processor or multi-processor, and a graphics controller
with > 128 MB discrete memory, and two of the following three: ≥ 2 GB system memory;
TV tuner and/or video capture; ≥ 2 disks.
• Category B (65 W idle): multi-core processor or multi-processor, and ≥ 1 GB system memory.
• Category A (50 W idle): any configuration not covered in Category C or Category B.

All categories: Sleep 4 W (4.7 W with WOL); Standby 2 W (2.7 W with WOL).
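
The category decision is a simple cascade, which can be sketched as follows (parameter names are ours; `multi_core_or_mp` stands in for "multi-core processor or multi-processor"):

```python
def energy_star_category(multi_core_or_mp, discrete_gfx_over_128mb,
                         mem_gb, tv_tuner_or_capture, disks):
    """Return the 2007 ENERGY STAR 4.0 category (C, B, or A)."""
    extras = sum([mem_gb >= 2, tv_tuner_or_capture, disks >= 2])
    if multi_core_or_mp and discrete_gfx_over_128mb and extras >= 2:
        return "C"   # 95 W idle limit
    if multi_core_or_mp and mem_gb >= 1:
        return "B"   # 65 W idle limit
    return "A"       # 50 W idle limit

print(energy_star_category(True, True, 4, False, 2))    # C
print(energy_star_category(True, False, 1, False, 1))   # B
print(energy_star_category(False, False, 1, False, 1))  # A
```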


Industry Standards:
EPEAT

• Industry standard comparing desktops, notebooks, and monitors based on environmental
attributes
• Bronze, Silver, or Gold performance tiers
• EPEAT products must be ENERGY STAR 4.0-compliant
• Select Lenovo notebooks, desktops, and monitors are EPEAT Gold-compliant

Select Lenovo ThinkCentre A61e desktops are EPEAT Gold-compliant

Electronic Product Environmental Assessment Tool (EPEAT)


EPEAT is a system to help purchasers in the public and private sectors evaluate, compare and select
desktop computers, notebooks and monitors based on their environmental attributes. EPEAT also
provides a clear and consistent set of performance criteria for the design of products, and gives
manufacturers an opportunity to secure market recognition for efforts to reduce the environmental
impact of their products.
The EPEAT Registry on the EPEAT Web site includes products that have been declared by their
manufacturers to be in conformance with the environmental performance standard for electronic
products known as IEEE 1680-2006. EPEAT operates a verification program to assure the credibility
of the registry.
EPEAT evaluates electronic products according to three tiers of environmental performance: Bronze,
Silver and Gold. The complete set of performance criteria includes 23 required criteria and 28 optional
criteria in 8 categories: reduction/elimination of environmentally sensitive materials; material
selection; design for end of life; product longevity/life cycle extension; energy conservation; end of
life management; corporate performance; packaging.
All EPEAT Bronze, Silver, and Gold registered products must be ENERGY STAR 4.0-compliant.
See www.epeat.net for additional information.


Industry Standards:
Climate Savers

• Promotes technologies that improve the efficiency of a computer’s power delivery and
reduce the energy consumed in an inactive state
• Lenovo is on the Board of Directors and one of the original 40 companies of the initiative
• Lenovo’s offerings are exceeding the current targets of the Climate Savers challenge

Logo for Climate Savers

Climate Savers
The goal of the environmental effort is to save energy and reduce greenhouse gas emissions by
setting new targets for energy-efficient computers and components, and promoting the adoption of
energy-efficient computers and power management tools globally.
The typical desktop PC wastes more than half of the power it draws from a power outlet. The
majority of this unused energy is wasted as heat and never reaches the processor, memory, disks, or
other components. As a result, offices, homes, and data centers have increased demands on air
conditioning which in turn increases energy requirements and associated costs.
The Challenge starts with the 2007 ENERGY STAR requirements for desktops, laptops and
workstations (including monitors), and gradually increases the efficiency requirements over the next
three years, as follows:
• From July 2007 through June 2008, PCs must meet the Energy Star requirements. This means 80
percent minimum efficiency for the power supply unit (PSU) at 20 percent, 50 percent, and 100
percent of rated output, a power factor of at least 0.9 at 100 percent of rated output, and meeting
the maximum power requirements in standby, sleep, and idle modes.
• From July 2008 through June 2009 the standard increases to 85 percent minimum efficiency for
the PSU at 50 percent of rated output (and 82 percent minimum efficiency at 20 percent and 100
percent of rated output).
• From July 2009 through June 2010, the standard increases to 88 percent minimum efficiency for
the PSU at 50 percent of rated output (and 85 percent minimum efficiency at 20 percent and 100
percent of rated output).
See www.climatesaverscomputing.org for more information.


Industry Standards:
GREENGUARD

• Certification focused on acceptable indoor air standards
• GREENGUARD Environmental Institute
• Select Lenovo systems are GREENGUARD certified

Lenovo ThinkPad X300 is GREENGUARD certified

GREENGUARD
The GREENGUARD Environmental Institute is an industry-independent, non-profit organization
that oversees the GREENGUARD Certification Program. As an ANSI Accredited Standards
Developer, GEI establishes acceptable indoor air standards for indoor products, environments, and
buildings. GEI’s mission is to improve public health and quality of life through programs that
improve indoor air. A GEI Advisory Board consisting of independent volunteers, who are
renowned experts in the areas of indoor air quality, public and environmental health, building
design and construction, and public policy, provides guidance and leadership to GEI.
The GREENGUARD Certification Program is an industry independent, third-party testing program
for low-emitting products and materials.
Select Lenovo products are GREENGUARD certified.
See www.greenguard.org for more information.


Industry Standards:
Intel High Definition Audio (Intel HD Audio)

• Next-generation architecture (after AC ’97) for implementing audio, modem, and
communications functionality
• Immersive home-theater-quality sound experience including Dolby 7.1 audio capability
• Up to eight channels at 192 kHz with 32-bit quality
• Multi-streaming capabilities to send two or more different audio streams to different places
at the same time
• Supported with Intel ICH6 and later I/O Controller Hubs
- ICH6/ICH7 integrates both AC ’97 and HD Audio to facilitate transition
- ICH8/ICH9/ICH10 only integrates HD Audio (not AC '97)
- Only AC ’97 or Intel HD Audio can be used at one time

[Diagram: Intel HD Audio supports multiple audio streams at the same time, e.g. game audio
out as Dolby Digital while chat audio plays separately]

Intel High Definition Audio (Intel HD Audio)


Intel High Definition (HD) Audio is an evolutionary technology that replaces AC ’97. This next
generation architecture for implementing audio, modem, and communications functionality was
developed to enhance the overall user PC audio experience and to improve stability. Intel HD
Audio facilitates exciting audio usage models while providing audio quality that can deliver
consumer electronics levels of audio experience. The Intel HD Audio specification v1.0 was
released in June 2004.

AC ’97 versus Intel High Definition Audio:

• Streams: AC ’97 supports a single stream (in and out); Intel HD Audio supports up to 15
input and 15 output streams at one time, with up to 16 channels per stream.
• Quality: AC ’97 supports 6 channels with 20-bit output at 96 kHz stereo max (12 Mb/s max);
Intel HD Audio supports 8 channels with 32-bit output at 192 kHz multi-channel
(48 Mb/s SDO, 24 Mb/s SDI).
• Bandwidth assignment: fixed in AC ’97; dynamic in Intel HD Audio.
• DMAs: dedicated function assignment in AC ’97; dynamic function assignment in
Intel HD Audio.
• Codec enumeration: at boot time (BIOS) in AC ’97; done by software (bus driver) in
Intel HD Audio.
• Codec configuration: limited in AC ’97; no limitation in Intel HD Audio.
• Clock: 12 MHz provided by the primary codec in AC ’97; 24 MHz provided by the ICH6 in
Intel HD Audio.
• Drivers: software developed by the audio codec supplier for AC ’97; OS native bus driver
plus an Independent Hardware Vendor value-added function driver for Intel HD Audio.

Support for Intel HD Audio is found in the ICH6 and later I/O Controller Hubs. The ICH6 and
ICH7 integrate both AC ’97 and Intel HD Audio to facilitate transition from the older AC ’97;
however, only AC ’97 or Intel HD Audio can be used at one time. (Either requires an additional
external codec chip; when Intel HD Audio was announced, the older AC ’97 chips cost less
money.) The ICH6/ICH7 Intel HD Audio digital link shares pins with the AC ’97 link. For
input, the ICH6/ICH7 adds support from an array of microphones that can be used for enhanced
communication capabilities and improved speech recognition. The ICH8 only supports HD
Audio (not AC '97).
Intel HD Audio has support for a multi-channel audio stream, a 32-bit sample depth, and a
sample rate up to 192 kHz.
Intel HD Audio delivers significant improvements over previous-generation integrated audio
and sound cards. Intel HD Audio hardware is capable of delivering the support and sound
quality for up to eight channels at 192 kHz/32-bit quality, while the AC ‘97 specification can
only support six channels at 48 kHz/20-bit quality. In addition, by providing dedicated system
bandwidth for critical audio functions, Intel HD Audio is architected to prevent the occasional
glitches or pops that other audio solutions can have.
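
These figures can be sanity-checked from raw PCM bandwidth (channels × sample rate × bits per sample); a quick sketch, with the function name ours and 1 Mb taken as 10**6 bits:

```python
def stream_mbps(channels, sample_rate_hz, bits_per_sample):
    """Raw PCM bandwidth of one audio stream in Mb/s."""
    return channels * sample_rate_hz * bits_per_sample / 1_000_000

hd_max = stream_mbps(8, 192_000, 32)   # 49.152 Mb/s, on the order of the 48 Mb/s SDO link
ac97_max = stream_mbps(6, 48_000, 20)  # 5.76 Mb/s, well under AC '97's 12 Mb/s limit
print(hd_max, ac97_max)
```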
Dolby Laboratories selected Intel HD Audio to bring Dolby-quality surround sound
technologies to the PC, as part of the PC Logo Program that Dolby recently announced. The
combination of these technologies marks an important milestone in delivering quality digital
audio to consumers. Intel HD Audio will be able to support all the Dolby technologies,
including the latest Dolby Pro Logic IIx, which makes it possible to enjoy older stereo content
in 7.1-channel surround sound.

[Diagram: the OS audio and modem drivers talk through the UAA bus driver and a standardized
register interface (UAA) to the Intel HD Audio controller in the ICH6, whose Intel HD Audio link
connects modem, audio, telephony, HDMI, and dock codecs; the legacy AC ’97 controller and
AC link are retained, but only Intel HD Audio or AC ’97 may be used at one time]

Intel HD Audio Overview

Intel HD Audio also allows users to play back two different audio tracks, such as a CD and a
DVD, simultaneously, which cannot be done using current audio solutions. Intel HD Audio
features multi-streaming capabilities that give users the ability to send two or more different
audio streams to different places at the same time, from the same PC.


Microsoft has chosen Intel HD Audio as the main architecture for their new Unified Audio
Architecture (UAA), which provides one driver that will support all Intel HD Audio controllers
and codecs. While the Microsoft driver is expected to support basic Intel HD Audio functions,
codec vendors are expected to differentiate their solutions by offering enhanced Intel HD audio
solutions.
Intel HD Audio also enables enhanced voice capture through the use of array microphones,
giving users more accurate speech input. While other audio implementations have limited
support for simple array microphones, Intel HD Audio supports larger array microphones. By
increasing the size of the array microphone, users get incredibly clean input through better noise
cancellation and beam forming. This produces higher-quality input to voice recognition, Voice
over IP (VoIP), and other voice-driven activities.
Intel HD Audio also provides improvements that support better jack retasking. The computer
can sense when a device is plugged into an audio jack, determine what kind of device it is, and
change the port function if the device has been plugged into the wrong port. For example, if a
microphone is plugged into a speaker jack, the computer will recognize the error and can
change the jack to function as a microphone jack. This is an important step in getting audio to a
point where it “just works.” (Users won’t need to worry about getting the right device plugged
into the right audio jack.)
The Intel HD Audio controller supports up to three codecs (such as an audio codec or modem
codec). With three Serial Data In (SDI) and one Serial Data Out (SDO) signals, concurrent
codec transactions on multiple codecs are made possible.
The SDO connects to all codecs and provides a bandwidth of 48 Mb/s. Each of the three SDIs
are typically connected to a codec and have a bandwidth of 24 Mb/s. In addition, the controller
has eight non-dedicated, multipurpose DMA engines (4 input, and 4 output). This allows
potential for full utilization of DMA engines for better performance than the dedicated function
DMA engines found in AC ’97. In addition, dynamic allocation of the DMA engines allows
link bandwidth to be managed effectively and enables the support of simultaneous independent
streams. This capability enables new exciting usage models (e.g., listening to music while
playing a multi-player game on the Internet).

[Diagram: a 5.1 surround system (left, center, and right speakers, two surround speakers, and a
subwoofer) playing Dolby Digital/DTS audio from a DVD while the PC also plays a CD]

With Intel HD Audio, a DVD movie with 5.1 audio can be sent
to a surround sound system in the living room, while you
listen to digital music and surf the Web on the PC.


Intel HD Audio Codec On Desktop Systemboard


Industry Standards:
Intel Active Management Technology (Intel AMT)

• As part of Intel vPro technology, Intel AMT enables secure, remote management of systems
• Offers robust features for asset management, remote management, and security
• Desktop and notebook systems

Out-of-band system access: allows remote management of platforms regardless of system
power or OS state
Remote troubleshooting and recovery: significantly reduces desk-side visits, increasing the
efficiency of IT technical staff
Proactive alerting: decreases downtime and minimizes time-to-repair
Remote hardware and software asset tracking: increases speed and accuracy over manual
inventory tracking, reducing asset accounting costs
Third-party nonvolatile storage: increases speed and accuracy over manual inventory
tracking, reducing asset accounting cost
Proactive blocking and reactive containment of network threats: keeps viruses and worms
from infecting end-user PCs and spreading, increasing network uptime

Intel Active Management Technology (Intel AMT)


A major barrier to greater IT efficiency has been removed by Intel Active Management
Technology (Intel AMT), a feature on desktops (Intel Core 2 Processor with vPro Technology)
and notebooks (Intel Centrino with vPro Technology). Using built-in platform capabilities and
popular third-party management and security applications, Intel AMT allows IT to better
discover, heal, and protect their networked computing assets. Here's how:
• Discover: Intel AMT stores hardware and software information in non-volatile memory. With
built-in manageability, Intel AMT allows IT to discover the assets, even while PCs are
powered off. With Intel AMT, remote consoles do not rely on local software agents, helping
to avoid accidental data loss.
• Heal: The built-in manageability of Intel AMT provides out-of-band management capabilities
to allow IT to remotely heal systems after OS failures. Alerting and event logging help IT
detect problems quickly to reduce downtime.
• Protect: Intel AMT System Defense Feature protects your network from threats at the source
by proactively blocking incoming threats, reactively containing infected clients before they
impact the network, and proactively alerting IT when critical software agents are removed.
Intel AMT also helps to protect your network by making it easier to keep software and virus
protection consistent and up-to-date across the enterprise. Third-party software can store
version numbers or policy data in non-volatile memory for off-hours retrieval or updates.


Intel AMT requires the computer system to have an Intel AMT-enabled chipset, network hardware and
software, as well as connection with a power source and a corporate network connection. With regard to
notebooks, Intel AMT may not be available or certain capabilities may be limited over a host OS-based
VPN or when connecting wirelessly, on battery power, sleeping, hibernating, or powered off. For more
information, see www.intel.com/technology/manage/iamt.

Version Year Key Features

AMT 1.0 2005 • Hardware inventory
AMT 2.1 2006 • System defense • Wake on LAN • USB provisioning
AMT 2.5 2006 • Notebook support
AMT 2.6/3.0 2007 • Remote configuration
AMT 4.0 2008 (notebooks) • WS-MAN and DASH 1.0
AMT 5.0 2008 (desktops) • WS-MAN and DASH 1.0
Dash 1.0 Dash 1.1 AMT 2.6/3.0 AMT 4.0 AMT 5.0

Boot Control X X X X X

Power State Management X X X X X

Hardware Inventory X X X X X

Software Inventory X X X X X

Hardware Alerting X X X X X

Serial Over LAN X X X X

IDE Redirect X X X X

Non Volatile Memory X X X X

Agent Presence X X X

Remote Configuration X X X

Enhanced System Defense X X X

Audit Logs X X

Wireless Management in Sleep States X N/A

Microsoft NAP / Cisco SDN X X

Client Initiated Remote Access (wired) X X

Measured AMT X X

Enhanced System Defense X X

KVM X


Benchmarks

• Understand benchmark objective: either application throughput or subsystem performance
• Examples include:
- BAPCo
ƒ SYSmark 2004 SE
ƒ SYSmark 2007 Preview
- MobileMark 2007
- 3DMark05 and 3DMark06
- SPEC CPU2000

Lenovo ThinkCentre Desktop

PC performance doubles every two years.

© 2008 Lenovo

Benchmarks
The following is a short list of benchmarks and the systems they measure.
• Overall performance:
– SYSmark 2004 SE - SYSmark includes office productivity and Internet content creation
benchmark tests. The two scores are combined and given a weighted average to produce an
overall performance rating. Both SYSmark tests derive scores by using real-world
applications to run a preset script of user-driven workloads and usage models developed by
application experts.
The SYSmark 2004 SE Internet Content Creation test is organized as scenarios that are
designed to simulate an Internet content creator’s day. This benchmark incorporates such
applications as Adobe Photoshop 7.01, Discreet 3ds max 5.0, and Macromedia
Dreamweaver MX.
The SYSmark 2004 SE Office Productivity test follows ICC’s blueprint by mimicking the
usage patterns of today’s desktop and mobile business users, including the concurrent
execution of multiple programs. Applications such as Adobe Acrobat 5.0.5, McAfee
VirusScan 7.0, and the Microsoft Office suite are used. Each SYSmark test measures the
response time of the application to user input. Both scores are combined using a geometric
mean to get an overall score.
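As an illustration of how those two scores combine, the geometric mean can be sketched in a few lines of Python. The sub-scores below are hypothetical, not actual SYSmark results, and the function name is our own:

```python
import math

def geometric_mean(scores):
    """Combine benchmark sub-scores the way SYSmark does: the n-th root
    of the product of n scores, so no single test dominates the rating."""
    product = math.prod(scores)
    return product ** (1.0 / len(scores))

# Hypothetical sub-scores for the two SYSmark 2004 SE workloads
office_productivity = 200
internet_content_creation = 242

overall = geometric_mean([office_productivity, internet_content_creation])
print(round(overall))  # 220
```

Unlike an arithmetic mean, the geometric mean rewards balanced results: a system that scores 200 and 242 rates higher than one that scores 100 and 342.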


The SYSmark benchmarks are created by BAPCo (Business Applications Performance


Corporation), which is a nonprofit corporation founded in May 1991 to create objective
performance benchmarks that are representative of the typical business environment. For
notebook systems, MobileMark 2005 is a benchmark created by BAPCo that measures both
performance and battery life at the same time using popular applications. Contact
www.bapco.com for more information.
– SYSmark 2007 Preview – In 2007, BAPCo released SYSmark 2007 Preview. This
benchmark extends the SYSmark family to support Windows Vista. SYSmark 2007
Preview allows users to directly compare platforms based on Windows Vista to those based
on Windows XP Professional and Home.
– MobileMark 2007 – MobileMark 2007 is the latest version of the premier benchmark for notebook battery life, and for performance while on battery power, based on real-world applications.
• Graphics performance
– 3DMark03, 3DMark05, 3DMark06, and 3DMarkMobile06 are benchmark tests that run
through different scenes using various DirectX or OpenGL calls to derive a score reflecting
the graphics hardware and driver performance. See www.futuremark.com/products/ for
more information.
• Battery life
– Business Winstone BatteryMark (BWS BatteryMark) 2004 measures the battery life of
notebook computers, providing users with a good idea of how long a notebook battery will
hold up under normal use. This benchmark uses the same workload as in Business
Winstone 2004.

Notable benchmark organizations


• In 1988, the Transaction Processing Performance Council (TPC) was formed to fulfill the need for
transaction processing benchmarks that emulate the workloads found on database servers.
The council includes representatives from a cross-section of 45 hardware and software
companies that meet to establish benchmark content. A primary goal of the council is to
provide objective and verifiable performance data to the industry. Visit www.tpc.org for
more information.
• The Standard Performance Evaluation Corporation (SPEC) establishes, maintains, and
endorses a standardized set of relevant benchmarks and metrics for performance evaluation.
Contact www.specbench.org for more information.
– SPEC CPU2000 - Introduced in late 2000, SPEC (Standard Performance Evaluation Corp.)
CPU2000 is a workstation application-based benchmark program that can be used across
several versions of Microsoft Windows NT, Windows 2000, and Unix. It consists of the
two benchmark suites listed below. Both measure the real-world performance of a
computer’s processor, memory architecture, and compiler. These replace CPUmark and
FPUmark:
• SPECINT2000 measures computation-intensive integer performance
• SPECFP2000 measures computation-intensive floating point performance.


Security Issues:
Trusted Platform Module (TPM)

• Security chip to enhance security of PC


• Select Lenovo ThinkPad and ThinkCentre
systems have a TPM
• Used with ThinkVantage Client Security Solution

ThinkPad security chip

ThinkCentre security chip

© 2008 Lenovo

Trusted Platform Module


A Trusted Platform Module (TPM) is a special-purpose integrated circuit (IC) built into a
variety of platforms to enable strong user authentication and machine attestation—essential to
prevent inappropriate access to confidential and sensitive information and to protect against
compromised networks. Trusted Platform Modules utilize open standards and technologies to
ensure interoperability of diverse products in mixed-vendor environments.
The TPM is based on specifications developed by the Trusted Computing Group (TCG). The
TCG is an industry standards group formed to develop, define, and promote open standards for
trusted computing and security technologies, including hardware building blocks and software
interfaces, across multiple platforms, peripherals, and devices. Members include Microsoft,
Intel, Dell, HP, IBM, and Lenovo. Current TPM implementations are based on the TCG 1.2
specification. See www.trustedcomputinggroup.org for more information.


How a Trusted Platform Module (TPM) Secures A System

[Diagram: a logon screen (user ID and password), a smart-card reader, a fingerprint reader, the TPM chip, and a disk holding encrypted files]
• The logon user name and password are stored in the TPM chip.
• When files and folders are encrypted, they can be decrypted only by the person who has the authentication data the TPM chip requires.
• Fingerprint and smart-card readers can be used in addition to the TPM for an additional security layer.
• Anyone trying to hack into your system will not be able to read any of the encrypted files or folders.
• If a disk is removed, the encrypted data is safe because it cannot be decrypted without being authenticated through the TPM.

User Authentication without TPM vs. with TPM
• Without: Inadequate user ID and password protection makes "spoofing" very easy. With: Strong protections eliminate "spoofing" and verify the integrity of user login credentials.
• Without: Multiple login IDs and passwords cause users to be careless, store secrets without protecting them, and use weak protections. With: On-chip, protected storage of secrets reduces user burden, enables secure single sign-on, and ensures strong protections.
• Without: IDs and passwords are stored in easily copied files, and one set of secrets is used for access to all systems. With: Secure storage of IDs and passwords; multiple log-in secrets secured by the TPM.

Platform Attestation without TPM vs. with TPM
• Without: Easy to change settings and parameters for unauthorized access and malicious damage. With: Secure access prevents unauthorized access; secure hash comparison validates settings.
• Without: Altered settings allow inappropriate access to valued networks and sensitive data. With: Validated settings ensure system integrity and prevent inappropriate access.
• Without: Untrustworthy systems result in unreliable and untrustworthy practices. With: Trustworthy systems result in reliable and trustworthy practice, reducing support expenses.


ThinkCentre Security Chip (2004 to 2006)

Security chip integrated in the Super I/O on the systemboard as the National Semiconductor PC8375T or Winbond PC8375S (TPM 1.2-compliant) Super I/O, providing the security chip, Asset ID, serial, parallel, diskette, keyboard, mouse, auto-thermal controls, and hardware event log (in select ThinkCentre desktops)

ThinkCentre Security Chip (2007 to 2008)

Security chip in the Winbond WPCT200 SafeKeeper Trusted Platform Module (in select ThinkCentre desktops)

ThinkCentre Security Chip (2008 to 2009)

Security chip integrated in some desktop-based Intel I/O Controller Hub 10 parts (ICH10D and ICH10DO) in select ThinkCentre desktops


ThinkPad Security Chip (2005 to 2007)

Security chip soldered to the systemboard as a module from Atmel, used mainly in select ThinkPad notebooks

ThinkPad Security Chip (2008 to 2009)

Security chip integrated in the notebook-based Intel I/O Controller Hub 9 (ICH9M and ICH9M-Enhanced) in select ThinkPad notebooks


ThinkVantage Client Security Solution


ThinkVantage Client Security Solution is a unique hardware-software combination that helps protect
your company information, including vital security information like passwords, encryption keys and
electronic credentials, while helping to guard against unauthorized user access to data.
This level of security is critical for both desktop and notebook systems. In fact, no other PC manufacturer offers, as a standard feature, a higher level of security than select ThinkPad notebooks and ThinkCentre desktops. Key features include:
• Active Directory support for seamless configuration and management support
• Completely new and easier to use password manager with broader browser support, auto-
recognize/fill, and per-site security policies
• Multi-factor support and policy manager for improved security
• Updated fingerprint reader software with integrated tutorial
ThinkVantage Client Security Solution helps turn your computer into a highly protected vault.
Available preloaded or by download for all ThinkPad notebooks and ThinkCentre desktops, it
provides advanced technology for user authentication plus enhanced security for wired and wireless
networking.
To further enhance security, select ThinkPad and ThinkCentre systems also include an embedded
Trusted Platform Module (TPM). ThinkVantage Client Security Solution works in conjunction with
this chip to manage encryption keys and processes.
The TPM itself is isolated from the operating system using patented tamper-resistant technology. To
initialize the TPM, users or administrators simply download the Client Security Solution and run a
single setup wizard to install the software and create a set of master encryption keys.


Summary:
Computer Hardware Architecture

• Netbooks, notebooks, desktops, PC Blades, and servers are different types of computers with unique features.
• Each computer subsystem is connected by buses or links.
• Intel-based PCs adhere to industry standards, and industry benchmarks provide subsystem and total system performance comparisons.
• Security can be enhanced with the Trusted Platform Module and ThinkVantage Client Security Solution.

Lenovo ThinkCentre A57 Small with dual Lenovo ThinkVision monitors
Lenovo ThinkPad
© 2008 Lenovo


Computer Measurements
• Bit (b): a single on or off charge (0 or 1)
• Byte (B): eight bits (a character or number is represented by a byte)
• Kilobyte (2^10): roughly one thousand bytes (KB): 1,024
• Megabyte (2^20): one million bytes (MB): 1,048,576
• Gigabyte (2^30): one billion bytes (GB): 1,073,741,824
• Terabyte (2^40): one trillion bytes (TB): 1,099,511,627,776
• Petabyte (2^50): one quadrillion bytes (PB): 1,125,899,906,842,624
• Exabyte (2^60): one quintillion bytes (EB): 1,152,921,504,606,846,976
• Zettabyte (2^70): one sextillion bytes (ZB): 1,180,591,620,717,411,303,424
• Yottabyte (2^80): one septillion bytes (YB): 1,208,925,819,614,629,174,706,176
• Millisecond: one thousandth of a second (ms): 1/1,000
• Microsecond: one millionth of a second (us): 1/1,000,000
• Nanosecond: one billionth of a second (ns): 1/1,000,000,000
• Picosecond: one trillionth of a second (ps): 1/1,000,000,000,000
• Megahertz: millions of cycles per second (MHz)
• Gigahertz: billions of cycles per second (GHz); 1GHz=1,000 MHz
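The binary multipliers above can be applied mechanically. This small Python sketch (the function name is our own, for illustration) converts a raw byte count into the largest sensible unit:

```python
def to_binary_units(num_bytes):
    """Express a byte count using the binary multipliers above
    (KB = 2**10, MB = 2**20, GB = 2**30, and so on)."""
    units = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
    value = float(num_bytes)
    for unit in units:
        # Stop once the value fits under the next multiplier
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024

print(to_binary_units(1_073_741_824))  # 1.00 GB
print(to_binary_units(500))            # 500.00 bytes
```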


Clock Cycle Times


The amount of time for a single clock cycle:
12 MHz = 83.3 ns
33 MHz = 30 ns
50 MHz = 20 ns
66 MHz = 15 ns
100 MHz = 10 ns
133 MHz = 7.5 ns
150 MHz = 6.6 ns
166 MHz = 6.0 ns
200 MHz = 5.0 ns
233 MHz = 4.3 ns
266 MHz = 3.75 ns
300 MHz = 3.3 ns
400 MHz = 2.5 ns
533 MHz = 1.875 ns
800 MHz = 1.25 ns
Cycle Speed Formula
1 ÷ frequency (in hertz) = length of one bus cycle in seconds (multiply by ten to the power of nine to convert to nanoseconds). For example:
1 ÷ 10 million = 10^-7 seconds = 100 nanoseconds
Calculating Data Transfer Rates
Frequency (in MHz) × data width (in bits) = maximum data bus capacity in Mb/s (÷ 8 to convert to MB/s). For example:
10 MHz × 32 bits = 320 Mb/s or 40 MB/s
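Both formulas translate directly into code. A quick Python sketch (function names are our own):

```python
def cycle_time_ns(frequency_hz):
    """Length of one clock cycle in nanoseconds: 1 / f, scaled by 1e9."""
    return 1e9 / frequency_hz

def bus_bandwidth_mb_per_s(frequency_mhz, width_bits):
    """Peak bus capacity in MB/s: MHz x bits / 8."""
    return frequency_mhz * width_bits / 8

print(cycle_time_ns(100e6))            # 10.0 ns at 100 MHz
print(bus_bandwidth_mb_per_s(33, 32))  # 132.0 MB/s -- the classic PCI bus
```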


Speeds
bps = bits per second
KB/s = kilobytes per second
Mb/s = megabits per second
MB/s = megabytes per second
Gb/s = gigabits per second
GB/s = gigabytes per second

• Some magazines (like PC Magazine) measure in megabytes (1,024², or 1,048,576 bytes). For a rough conversion, subtract five million from every hundred million bytes. For example, 720 million bytes is about 687 megabytes.
• Speeds for connectivity are normally measured in bits (bits per second). Speeds for transfers
within a PC (like across PC buses) are normally measured in bytes (bytes per second).

5 MB/s AT Bus (ISA)


132 MB/s 33 MHz at 32-bit (PCI bus)
528 MB/s 66 MHz at 64-bit
800 MB/s 100 MHz at 64-bit
1,064 MB/s 133 MHz at 64-bit (Pentium II 266/66MHz to 1/2 speed L2 cache)
1,200 MB/s 150 MHz at 64-bit
1,600 MB/s 200 MHz at 64-bit (or 128-bit bus at 100MHz)
1,600 MB/s 800 MHz at 16-bit
3,200 MB/s 400 MHz at 64-bit
4,200 MB/s 533 MHz at 64-bit
6,400 MB/s 200 MHz at 256-bit
6,400 MB/s 800 MHz at 64-bit

• For each 20 degree Celsius increase in operating temperature, electronic component life drops by half.
• A billion bits is equivalent to 62,500 double-spaced typewriter pages--enough paper to stack 21
feet high.
• 0.15 micron is 1/600 the width of one human hair.
• Logic circuits are used to process information; memory circuits store information.
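The 20-degree rule of thumb above implies an exponential halving. A small Python sketch (assuming the rule applies continuously, not just in whole 20-degree steps; the function name is our own):

```python
def component_life_factor(temp_rise_c):
    """Relative component life after a temperature rise, using the
    rule of thumb that life halves for every 20 degree C increase."""
    return 0.5 ** (temp_rise_c / 20)

print(component_life_factor(20))  # 0.5  -> half the life
print(component_life_factor(40))  # 0.25 -> a quarter of the life
```

This is why a 40-degree rise is far worse than twice as bad as a 20-degree rise: each additional step halves the remaining life again.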


Network Speeds

Home Networking Technologies — Data Rate — Average Usable Throughput
Cable or DSL Internet connection 512 Kb/s-6 Mb/s 150 Kb/s-2 Mb/s

Phone-line (HPNA 2.0) and 802.11b 10 Mb/s 5 Mb/s

Power-line (HomePlug) 14 Mb/s 5 Mb/s

802.11a & 802.11g 54 Mb/s 25 Mb/s

Pre-802.11n 108 Mb/s 50 Mb/s

Fast Ethernet 100 Mb/s 60 Mb/s

Gigabit Ethernet 1,000 Mb/s 600 Mb/s

Applications — Average Required Throughput
Video

Uncompressed full-motion video stream 1,000-2,000 Mb/s

Compressed HDTV 18 Mb/s

Compressed standard / extended definition TV stream 7-15 Mb/s

Audio

Radio-quality MP3 or WMA stream 64 Kb/s

CD-quality MP3 or WMA stream 128 Kb/s

Dolby AC-3 stream 640 Kb/s

Other

VoIP traffic 64 Kb/s

Typical broadband surfing traffic 1-256 Kb/s

E-mail traffic 30-100 Kb/s
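Because connectivity is measured in megabits per second while file sizes are quoted in megabytes, a quick sanity check in Python helps avoid the factor-of-eight mistake (throughput figures taken from the usable-throughput column above; the function name is our own):

```python
def transfer_time_seconds(file_megabytes, usable_mbps):
    """Seconds to move a file over a link: file sizes are in megaBYTES,
    link throughput in megaBITS per second, so multiply by 8."""
    return file_megabytes * 8 / usable_mbps

# Moving a 100 MB file over the usable throughput of each link
print(transfer_time_seconds(100, 60))   # ~13.3 s on Fast Ethernet
print(transfer_time_seconds(100, 600))  # ~1.3 s on Gigabit Ethernet
```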

Broadband Types
Business cable: 2-5 Mb/s download / 384-768 Kb/s upload. Pros: inexpensive. Cons: shared media, limited bandwidth.
Business DSL: 144 Kb/s-6 Mb/s download / 144 Kb/s-1.5 Mb/s upload. Pros: inexpensive. Cons: limited bandwidth.
Fractional T1: 384 Kb/s-1.5 Mb/s symmetrical. Pros: reliable. Cons: expensive.
Wired MAN: 10 Mb/s-1 Gb/s symmetrical. Pros: fast, cost-effective. Cons: limited availability.
Wireless MAN: 256 Kb/s-100 Mb/s symmetrical. Pros: no cabling, instant installation. Cons: limited availability.


Review Quiz

Objective 1

1. What type of system would most likely market its systems management support, chipset, and
graphics performance?
a. Notebook
b. Desktop
c. PC Blade
d. Netbook

2. What type of system has the user's processor, memory, disk, and graphics removed from a
user's desk and stored in a rack in a secure, centralized location?
a. Notebook
b. Desktop
c. PC Blade
d. Server

3. The ThinkPad brand name is associated with what type of computer system?
a. PDA
b. Notebook
c. Desktop
d. Server

4. Rescue and Recovery, Access Connections, and Productivity Center are from what strategic
Lenovo offering?
a. ThinkVantage Technologies
b. ThinkVision monitors
c. ImageUltra Builder
d. System Migration Assistant

Objective 2

5. A device driver is an interface between what subsystems?


a. The applications and BIOS
b. Hardware and the operating system
c. The operating system and the BIOS
d. An API and the standard hardware

6. What circuitry controls the methods, manner, and speed in which a subsystem is accessed?

a. Controller
b. Keyboard
c. VLSI logic
d. Device driver


7. Most transfers of data between subsystems involve which buses?


a. Data bus
b. Data and address bus
c. Data, address, and control bus
d. Data, address, control, and tag bus

Objective 3

8. What is the name of the architecture for implementing audio, modem, and communications
functionality after AC ’97?
a. Intel High Definition Audio (Intel HD Audio)
b. I/O Controller Hub 6 (ICH6)
c. Unified Audio Architecture (UAA)
d. Dolby Digital

9. What is an important characteristic of a performance benchmark?


a. Understanding if it measures a subsystem or application throughput
b. The benchmark needs to incorporate Java
c. Benchmark must include MPEG-2 encoding
d. The PCI bus must be enabled

10. What industry standard addresses the restriction of hazardous substances in electrical and
electronic equipment?
a. EPEAT
b. ENERGY STAR
c. Restriction of Hazardous Substances (RoHS)
d. 80 PLUS

11. What industry standard promotes technologies that improve the efficiency of a computer's
power delivery and reduce the energy consumed in an inactive state?
a. Climate Savers
b. 80 PLUS
c. Intel HD Audio
d. RoHS

Objective 4

12. What security solution is supported on all Lenovo ThinkPad and ThinkCentre systems that
protects vital security information and guards against unauthorized user access to data?

a. ThinkVantage Client Security Solution


b. Trusted Platform Module
c. Integrated fingerprint reader
d. Utimaco SafeGuard Easy


Answer Key
1. B
2. C
3. B
4. A
5. B
6. A
7. C
8. A
9. A
10. C
11. A
12. A



Topic 2 - Processor Architecture

PC Architecture (TXW102)
Topic 2:
Processor Architecture

© 2008 Lenovo


Objectives:
Processor Architecture

Upon completion of this topic, you will be able to:

1. Define important processor features and functions


2. Recognize the packaging used with Intel processors
3. List the current Intel desktop processors and their main features
4. List the current Intel notebook processors and their main features

© 2008 Lenovo


Processor Features:
Introduction

• All work performed directly or indirectly by processor
• Most important element of system performance
• Support only specific operating systems

[Diagram: the processor (with L1 and L2 cache) connects over the host bridge to the MCH or GMCH (memory and optional graphics controller), which provides the PCI Express x16 slot, PCI Express slots, and memory. The Direct Media Interface links the MCH or GMCH to the I/O Controller Hub (ICH), which contains the PCIe, PCI, SATA, IDE, and USB controllers and connects four SATA disks, USB 2.0, the Super I/O, the firmware hub or Low Pin Count interface, and AC '97 codecs or High Definition Audio.]

Silicon wafer with processors

© 2008 Lenovo

Processors
The following points outline fundamental information applicable to all processors.
• The processor (or microprocessor) is the central processing unit (CPU) of the computer. The
processor is the place where most of the control and computing functions occur. All operating
system and application program instructions are executed here. Most information passes through
the processor, whether a keyboard stroke, data from a disk, or information from a communication
network.
• The processor needs data and instructions for each processing operation that it performs. Data
and instructions are loaded from memory into data-storage locations, known as registers, in the
processor. Registers are also used to store the data that results from each processing operation
until the data is transferred to memory.
• The processor is packaged as an integrated circuit that contains the following:
– one or more arithmetic logic units (ALUs or execution units)
– a floating point unit (math coprocessor)
– Level 1 and Level 2 cache
– registers for holding instructions, data, and control circuitry


• Clock rate is a fundamental characteristic of all microprocessors. Clock rate is the rate at
which a processor performs operations, and this rate is measured in billions of cycles per
second or gigahertz (GHz). The maximum clock rate of a microprocessor is determined by
how fast the internal logic of the chip can be switched. As silicon fabrication processes are
improved, the integrated devices on chips become smaller and can be switched faster. Thus,
the clock speed can be increased.
• Processors support specific operating systems. For example, the Itanium family of processors
used in some high-end servers requires an operating system that is specifically written for
it. Itanium processors will not run Windows XP, which is for IA-32 (32-bit) processors.
Processors used in notebooks and desktops today are 32-bit/64-bit processors (also known as
IA-32 or Intel Architecture 32-bit) and typically run Windows-based operating systems such
as Windows XP. Linux also runs on 32-bit/64-bit processors. IA-32 processors with support
for Intel 64 Technology can run 64-bit software.


Energy-Efficient Processor Performance


Performance usually refers to the amount of time it takes to execute a given application or task,
or the ability to run multiple applications or tasks within a given period of time. Contrary to
popular misconception, it is not clock frequency (GHz) alone or the number of instructions
executed per clock cycle (IPC) alone that equates to performance. True performance is a
combination of both clock frequency (GHz) and ICP. Performance can be computed as a
product of frequency and instructions per clock cycle:
Performance = Frequency x Instructions per Clock Cycle
This shows that the performance can be improved by increasing frequency, IPC, or possibly
both. It turns out that frequency is a function of both the manufacturing process and the micro-
architecture. At a given clock frequency, the IPC is a function of processor micro-architecture
and the specific application being executed. Although it is not always feasible to improve both
the frequency and the IPC, increasing one and holding the other close to constant with the prior
generation can still achieve a significantly higher level of performance.
It is also possible to increase performance by reducing the number of instructions that it takes to
execute the specific task being measured. Single Instruction Multiple Data (SIMD) is a
technique used to accomplish this. Intel first implemented the 64-bit integer SIMD instructions
in 1996 on the Intel Pentium processor with MMX technology and subsequently introduced
128-bit SIMD single precision floating point, or Streaming SIMD Extensions (SSE), on the
Pentium III processor and SSE2 and SSE3 extensions in subsequent generations. Another
innovative technique that Intel introduced in its mobile micro-architecture is called microfusion.
Intel's microfusion combines many common micro-operations or micro-ops (instructions
internal to the processor) into a single micro-op, such that the total number of micro-ops that
need to be executed for a given task is reduced.
It has also become important to look at delivering optimal performance combined with energy
efficiency – to take into account the amount of power the processor will consume to generate
the performance needed for a specific task. Here power consumption is related to the dynamic
capacitance (the ratio of the electrostatic charge on a conductor to the potential difference
between the conductors required to maintain that charge) required to maintain IPC efficiency
times the square of the voltage that the transistors and I/O buffers are supplied with times the
frequency that the transistors and signals are switching at. This can be expressed as:
Power = Dynamic Capacitance × Voltage² × Frequency
Taking into account this power equation along with the previous performance equation,
designers can carefully balance IPC efficiency and dynamic capacitance with the required
voltage and frequency to optimize for performance and power efficiency.
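The two equations above can be combined to show the tradeoff numerically. The Python sketch below uses purely illustrative capacitance, voltage, and IPC values, not figures for any real processor:

```python
def performance(frequency_ghz, ipc):
    """Performance = Frequency x Instructions per Clock Cycle."""
    return frequency_ghz * ipc

def power_watts(dyn_capacitance_farads, voltage_v, frequency_hz):
    """Power = Dynamic Capacitance x Voltage^2 x Frequency."""
    return dyn_capacitance_farads * voltage_v ** 2 * frequency_hz

# A lower-frequency design with higher IPC can match performance...
base = performance(3.0, 1.0)       # 3.0 billion instructions/second
efficient = performance(2.0, 1.5)  # 3.0 -- the same work per second

# ...while the lower voltage and frequency cut power substantially,
# because voltage enters the power equation squared.
print(power_watts(1e-9, 1.3, 3.0e9))  # ~5.07 W
print(power_watts(1e-9, 1.1, 2.0e9))  # ~2.42 W
```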


Processor Definitions
Backside bus – In the P6 micro-architecture, which had a dual independent bus, the backside bus
was the 64-bit or 256-bit bus to L2 cache. The backside bus was independent of the frontside bus to
the memory controller.
Branch – A point that represents a potential change in the flow of a program. A branch can either
be unconditional, meaning that it always changes program flow, or it can be conditional, meaning
that it may or may not change program flow depending on other factors.
Branch Prediction – The Branch Processing Unit improves processor performance by trying to
predict the address of branches and preload the address in a buffer (Branch Target Buffer) should
the branch occur. The Branch Target Buffer can be dynamic (based on previous history of
branches) or static (based on a hardwired set of rules). The Branch Processing Unit has an 80% hit
rate (predicts branch and address accurately). When an instruction leads to a branch, the Branch
Target Buffer remembers the instruction and the address of the branch taken. The branch
instruction is a program statement that has the potential of altering the execution flow of the
program such as a "Jump" or "Loop" instruction. Branch instructions change the sequence of
instruction execution, which causes pipelines and buffers to clear and reload with new instructions
(which slows performance). Programs normally have 20% of instructions as branch instructions.
Branch mispredictions happen more frequently with “branchy” productivity applications than with
streaming multimedia applications.
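The dynamic (history-based) scheme described above can be illustrated with a toy two-bit saturating-counter predictor, a classic textbook design rather than the exact circuit in any Intel processor:

```python
def simulate_two_bit_predictor(outcomes):
    """Toy dynamic branch predictor: a single 2-bit saturating counter.
    States 0-1 predict 'not taken', 2-3 predict 'taken'; each actual
    outcome nudges the counter toward that behavior."""
    counter = 2  # start in the weakly 'taken' state
    hits = 0
    for taken in outcomes:
        prediction = counter >= 2
        if prediction == taken:
            hits += 1
        # Saturate at the ends so one odd outcome cannot flip a strong state
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return hits / len(outcomes)

# A loop branch: taken 9 times, then falls through once -- very predictable
loop_branch = [True] * 9 + [False]
print(simulate_two_bit_predictor(loop_branch))  # 0.9
```

Highly regular branches such as loop exits predict well; "branchy" code with data-dependent branches drives the hit rate down, which is why productivity applications mispredict more often than streaming multimedia code.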
Branch Processing Unit – Dedicated circuitry is used to prevent delays caused by branching
instructions. The branch processing unit removes branch instructions from the queue, calculates the
target address, and sends the new target address to the cache for prefetching. It tries to keep
instructions flowing into the queue even when program logic dictates a change in the base address
used for prefetching instructions from cache.
Capacitor – An electric circuit element used to store charge temporarily, consisting in general of
two metallic plates separated and insulated from each other by a dielectric. Higher wattage
processors and chipsets require more capacitors on a systemboard.
Complex Instruction Set Computing (CISC) – Processors that are microcode based with
instructions that vary in length. Intel x86 processors are CISC processors.
Core micro-architecture – A micro-architecture introduced on the Core 2 Duo and Core 2 Extreme
processors in July 2006. Key features of Intel Core micro-architecture include Intel Wide Dynamic
Execution, Intel Intelligent Power Capability, Intel Advanced Smart Cache, Intel Smart Memory
Access, and Intel Advanced Digital Media Boost.
Decode – To examine the format of an instruction to determine what type of operation and which
operands it specifies.
Dependency – A condition that prevents one instruction from executing until another instruction
has been finished. A true dependency occurs when one instruction's output becomes another
instruction's input. False dependencies may be introduced by out-of-order execution but can be
worked around through register renaming.
Die area – A small die size means lower costs since more chips can be manufactured from a single
wafer.


Dual-core – A processor that combines two independent processors and their caches onto a single
silicon chip.
Dual Independent Bus (DIB) – Two independent buses make up the dual independent bus (DIB)
architecture: the “L2 cache bus” (backside bus) and the processor to main memory “system bus”
(frontside bus). Both buses can be used simultaneously and in parallel, rather than in a singular
sequential manner as in a single bus system.
Execution unit – A portion of the processor dedicated to performing a particular type of operation
such as arithmetic functions, memory loads and stores, or branch processing.
Fetch – To retrieve code from cache or memory in preparation for decoding.
Field Effect Transistor (FET) – A transistor in which the output current is controlled by a variable
electric field. Higher wattage processors and chipsets require more FETs on a systemboard.
Capacitor Toroid Field effect transistor

Electrical Devices on ThinkCentre Systemboard

Floating point – A system for representing any number as a single digit with a decimal fraction
along with a multiplier (an exponent of two or some other base number); for example, 1.23 x 10 to
the 4th.
Frontside bus – In the P6 micro-architecture, which had a dual independent bus, the frontside bus
was the 64-bit data bus from the processor core to the memory controller. The frontside bus was
also called the system bus and was independent of the backside bus to L2 cache.
High-K – In 2007, Intel released processors with 45 nm technology which changed out critical
materials in a redesign. It moved from polysilicon dioxide gate to metal gate technology. The metal
gate sits on an insulator made of hafnium-based high-k. The high-k refers to a material with a high
dielectric constant (k) (as compared to polysilicon dioxide) used in the manufacturing process.


IA-32 Execution Layer – In January 2004, Intel launched its IA-32 Execution Layer software, which lets companies run 32-bit applications on servers using 64-bit Itanium 2 processors. Intel
first built 32-bit compatibility into the Itanium 2 processor itself; switching it to the software layer
will let 32-bit applications run more efficiently on Itanium 2 and eliminate the need to further
develop 32-bit compatibility at the chip level. IA-32 Execution Layer runs with Windows
applications. Intel plans to work with Red Hat, Inc. to support 32-bit Linux applications.
Instruction – The fundamental unit of a program; one or more bytes of information that direct the
processor to perform a particular task.
Integer instruction versus floating point instruction – Text processing is considered integer work since text characters (and whole numbers) are both simple bit patterns (00110101).
Load/Store Unit – The Load/Store unit loads and stores instructions between registers and caches. It
accesses memory using dedicated instructions that load a value into a register from cache or store a
value from a register into cache. This eliminates the need for a separate address computation for
each memory access. The Load/Store unit originally started as a RISC feature.
Micro-architecture – Micro-architecture refers to the implementation of processor architecture in
silicon. Within a family of processors, the micro-architecture is often enhanced over time to deliver
improvements in performance and capability while maintaining compatibility to the architecture.
Micro-op – Micro-op is Intel's term for RISC-like internal operations into which x86 instructions
are translated to improve processing efficiency. Micro-ops are easier to dispatch and execute in
parallel than their complex x86 instruction counterparts, which maintains compatibility with the
existing x86 instruction set but overcomes their historical limitations.
Microcode – A series of micro-instructions (sets of control bits) used to coordinate the execution of
complex instructions by breaking them into smaller segments.
Micron – A measurement of width; one micron is one millionth of a meter; human hair is between
50 and 100 microns; 0.18 microns is about 600 times thinner than the width of a hair. Decreasing
the micron width produces the following benefits:
• Increases clock speed (MHz/GHz) because the chip is smaller
• Reduces production costs because each wafer holds more chips
• Cuts power consumption because the chips operate at lower voltage
• Allows more transistors
• Generates less heat
Nanometer (nm) – Processors and other semiconductors are manufactured on a circular silicon
wafer. Over time, the process technology for the wafer size is reduced. It is measured in
nanometers. In 2006, the common size was down to 65 nanometers. This width is the smallest
circuit wire diameter.
NetBurst micro-architecture – NetBurst micro-architecture is micro-architecture first introduced in
the Pentium 4 in 2000 and used in subsequent Intel processors (such as Intel Pentium D and Xeon).
NetBurst includes hyper-pipelined technology, rapid execution engine, and execution trace cache.


Nonblocking – Allowing subsequent operations to proceed even if the first cannot be satisfied
completely. The term describes caches that can continue processing even after a miss, which occurs
when requested data is not present in the cache.
Out-of-order execution (OoO) – Allows multiple objects in different execution units to be
processed out of sequence. The instructions can then be recombined into the proper program flow
before being written to off chip memory. The area on the chip and the power to run the control
logic for OoO increase both chip cost and power consumption.
P6 micro-architecture – A micro-architecture introduced on the Pentium Pro in 1995 and used in
the Pentium II, Pentium III, Pentium II Xeon, Pentium III Xeon, and some Celeron processors. This
architecture defined how instructions were executed. P6 micro-architecture consisted of three major
units (fetch/decode, dispatch/execute, and retire unit) and a dual independent bus (a backside bus to
L2 cache and a frontside bus to the memory controller).

[Figure: P6 Micro-Architecture of Pentium Pro, Pentium II and III, Pentium II and III Xeon, and early Celeron Processors — a dual independent bus connects the core to memory over a 64-bit, 100 or 133 MHz frontside bus (system bus) and to 256 KB, 512 KB, 1 MB, or 2 MB of external or internal L2 cache over a 64-bit or 256-bit backside bus (L2 cache bus) running at half or full core speed; the core contains 16 KB data and 16 KB instruction caches feeding the FPU, load, store, and two integer/MMX units]

Pipeline – An "assembly line" design where instruction processing is broken into many small steps
or stages that are handled by separate circuits. When an instruction completes one stage, it
progresses to the next, and the earlier stage begins work on the subsequent instruction. The
processor has the capability of processing multiple instructions simultaneously at varying stages of
execution to obtain an overall instruction execution rate of one instruction per clock cycle. A higher number of stages in the pipeline allows a faster clock speed; the disadvantage is that the entire pipeline must be flushed and reloaded after a branch misprediction (common with business applications), effectively wasting processor cycles.
Privilege levels – Intel x86 processors support four privilege levels that are referred to as rings 0
through 3. Kernel mode is synonymous with ring 0. Applications run in ring 3. The rings provide
hardware-based protection mechanisms as the hardware prevents programs that are running in less
privileged rings from overwriting the contents of memory controlled by more privileged programs.


Processor architecture – Processor architecture refers to the instruction set, registers, and memory-resident data structures that are visible to a programmer. Processor architecture maintains
instruction set compatibility so processors will run code written for processor generations past,
present, and future.
Reduced Instruction Set Computing (RISC) – Processors without microcode but with hardwired
instructions. RISC processors have uniform instruction lengths.
Register – A small, high-speed, on-chip storage area in a processor. Registers are the fastest type of
storage used by a processor. The more registers and the more bits each register holds, the more
compilers can optimize application performance. Registers are usually designed for certain tasks
such as floating-point, address, general-purpose, special-purpose, etc.
Retire – To commit results to architectural registers and memory. Instructions that are executed
speculatively cannot be retired until it is certain that all dependencies have
been resolved.
Serialize – To force the processor to stall the issuing of instructions until a particular instruction has
been retired. Serialization is required by certain instructions that cannot be processed out of order.
Speculative execution – An enhancement of branch prediction that speculatively executes the
predicted branch. If the branch is wrong, the processing must be corrected, but if the branch is
correct, the processor is further ahead. This method keeps objects in the pipeline without stalling
and waiting for a branch to be resolved.
Stall – To halt a portion of a pipeline for one or more clock cycles.
Super-pipelined – Having a pipeline substantially deeper than the usual five or six stages. Super-
pipelined designs tend to allow higher clock speeds than other pipelined designs.
Superscalar – The ability to process multiple instructions in a clock cycle. The Pentium processor
and after are superscalar due to their dual-pipeline structure. Both the 386 and i486 are scalar since
they only have one pipeline. Pentium has two execute units (integer units). Other processors have
multiple execution units.
Toroid – An electrical component that regulates electrical power to a device; looks like a doughnut
with wires around it. Higher wattage processors and chipsets require more toroids on a
systemboard.
Transistor – A device to open and close a circuit in a processor. Like a light switch on the wall that
lets electricity go to a light bulb, a transistor performs like a simple switch, either allowing or
preventing current to flow through. In general, a processor with more transistors is more powerful.
More transistors also make a processor run hotter. The smaller the transistor, the quicker the chip
can move electrons between them and the faster it can perform each calculation resulting in better
performance.
Voltage Regulator Module (VRM) – Processors and/or L2 cache may utilize a voltage regulator
module, which allows different processor voltages to be supported by interchangeable VRMs.
VRMs are physical circuitry that plug in to a socket near the processor's socket/slot.


Processor Features:
Math Coprocessors

• Also called Floating Point Unit (FPU)
• Accelerates numeric intensive calculations
• Applications must be specifically written to use
  - Spreadsheets, CAD, statistical analysis, vector graphics, and other numeric applications
• Operating systems and network operating systems generally do not use floating point
• All newer processors have integrated math coprocessors but vary in performance

[Figure: Pentium 4 Core with Circled Math Coprocessor — block diagram showing the BTB and I-TLB, x86 instruction decoder, microcode ROM, execution trace cache with BTB, rename/allocate stage, micro-op queues, schedulers, FP and integer register files, the circled FP execution units (FMul, FAdd, MMX, SSE, FP Move, FP Store), load, store, and ALU units, L2 cache, and the L1 D-cache with D-TLB]



Math Coprocessors
The math coprocessor, also called the FPU (floating point unit) or coprocessor, has special circuits
that allow it to process numeric floating point calculations in fewer cycles than the main processor.
Floating point multiplies are several times faster than standard integer execution units for numeric
calculations. The math coprocessor is also used when the algorithm requires a large range or a lot
of precision in its results.
Applications must be specifically written to take advantage of the math coprocessor. Typically only
CAD, statistical analysis, vector graphics, and numeric applications utilize the math coprocessor.
Most operating systems, including Windows and Linux, do not utilize a math coprocessor (except
for minor use in the graphics engine).
Intel processors beyond the 486DX have integrated math coprocessors.
Examples of math coprocessors for earlier processors:

• 386SX, 386SLC, and 486 SLC: 387SX and compatible


• 386DX: 387DX and compatible
• 486SX: 487SX, OverDrive (486DX2), and compatible


Processor Features:
Clock Multipliers

• Processors run internally (core) at faster speeds than externally (system bus)
• Variety of ratios
  - 1x, 2.5x, 3x, 15x, 19x, 24x
  - 1.8 GHz/100 MHz (18x), 2.53 GHz/133 MHz (19x), 3.0 GHz/200 MHz (15x)
• System bus migrated from 100 MHz to 266 MHz
  - Core 2 Duo is 800 MHz (4 x 200) or 1067 MHz (4 x 233)
• Advantages of lower systemboard clock speed
  - Engineering design costs greatly reduced
  - Systemboard components less costly
  - Less frequency emission

[Figure: processor block diagram — the 64-bit system bus (400, 533, 800, or 1067 MHz) runs at the lower external speed, while the 1 MB L2 cache (256-bit, full speed), 32 KB data and instruction caches, FPU, load, store, and integer units run at the higher core speed; internal L1 and L2 cache provides a 90% hit rate at the faster clock speed]


Clock Multipliers
The clock multiplier is the ratio between the internal clock speed and the external bus speed (the
system bus or front side bus). A processor runs internally at a faster speed than externally.
Anything internal to the physical chip runs at this fastest speed (i.e., the math coprocessor, L1
cache, and L2 cache). The speed to the memory controller is reduced to 100, 133, or 200 MHz.
While running everything at the fastest speed is ideal, the dollar cost of doing this is prohibitive.
Having slower external speeds reduces engineering design costs, makes all the external peripherals
less costly, and reduces frequency emission.
The system bus has migrated from 100 MHz to 133 to 200 to 233 to 266 MHz. The Core 2 Duo
uses an 800 MHz or 1067 MHz system bus, yet this is really still a 200 or 233 MHz bus that is
quad-pumped (four transfers in one cycle).


Processor Features:
Burst Mode Transfer

• Processors transfer data from processor to L1, L2, L3 cache, and memory via Burst Mode Transfers
• Burst Mode Transfer:
  - Sends only one address before the first data transfer; it is followed by four to eight data transfers
  - Increases performance by not sending multiple sequential addresses
• Each transfer uses cache lines (typically 64 or 128 bytes)

Typical burst mode timings:
  L2 write      3-1-1-1
  memory write  3-2-2-2
  L2 read       3-1-1-1
  memory read   5-1-1-1

[Figure: one address on the address bus is followed by four transfers on the data bus]



Burst Mode Transfer


Processors usually transfer data from the processor buffers or core to L1 cache, L2 cache, L3
cache, and external memory via a cache line. A cache line is the smallest amount of information
that can be transferred between the processor and cache/memory. The larger the cache line, the
more information that is transferred at a time. The cache line size is hard-coded in the processor;
for example, for the Pentium 4 (Prescott) the cache line size is 64 bytes for L1 cache and memory
transfers and 128 bytes for L2 cache transfers.
The data path between the processor buffers or core is less than the size of the cache line, so
multiple transfers must take place to transfer the cache line. For example, the Core 2 Duo has a
256-bit data path with the L2 cache, so a 128 byte (which is 1024 bits) cache line would take four
transfers.
Burst mode transfer means the transfer is done in as little as five cycles by sending the address only
with the first data transfer. This technique makes the processor more efficient.
Example: "Address, data, data, data, data" (the address is sent only once) works out to 1 + 1 + 1 + 1 + 1 = 5 cycles. The timing is more commonly written as "2, 1, 1, 1," where the "2" covers the address plus the first data transfer and each "1" is one additional data transfer.
Burst mode transfers apply to both reads and writes. Transfers between the processor and L2 cache
are usually 2, 1, 1, 1, but each vendor can implement the timing differently. Transfers between the
processor and memory are not usually 2, 1, 1, 1. Again, each vendor can choose its
implementation.
Different systems have different burst cache cycles due to differences in the processor, cache
controller, bus, and architecture.


Processor Features:
MMX, SSE, SSE2, SSE3, SSE4 [HD Boost]

• Instruction set additions for Intel processors
• Speeds performance of multimedia- and graphics-related functions (3D graphics, speech, video, audio)
• Software must be written to the instructions for the processor to utilize them
• SSE4 includes instructions for HDTV decompression and playback
• Intel HD Boost is the name for SSE4 and the Super Shuffle Engine

  Date   Intel Technology                              New Instructions
  1997   Intel MMX                                     57
  1999   Streaming SIMD Extensions (SSE)               70
  2000   Streaming SIMD Extensions 2 (SSE2)            144
  2004   Streaming SIMD Extensions 3 (SSE3)            13
  2006   Supplemental Streaming SIMD Extensions 3      32
  2007   Streaming SIMD Extensions 4 (SSE4)            54


MMX
Intel's MMX technology is designed to accelerate key elements of demanding multimedia and
communications applications such as audio, video, 2D and 3D graphics, animation, and speech
recognition. Intel insists MMX does not stand for Multimedia Extensions, so MMX does not
officially stand for anything.
MMX technology was first implemented in second generation Pentium processors in 1997, and
MMX has been implemented in all subsequent Intel processors.
MMX highlights include the following:
• Fifty-seven new instructions
• Eight 64-bit wide MMX registers and four new data types
• Single Instruction, Multiple Data (SIMD) technique
MMX technology includes 57 new instructions, which accelerate calculations common in audio,
2D and 3D graphics, video, speech synthesis and recognition, and data communications algorithms
by as much as 8x.
Intel processors with MMX technology can execute two MMX instructions simultaneously.


Most MMX instructions follow the pattern of performing a single operation on a series of integer
values. This technique is called Single Instruction, Multiple Data (SIMD). SIMD is ideal for the
algorithms and data types frequently found in multimedia software. Examples include wavelet
compression, MPEG, motion compensation, color space conversion, texture mapping, 2D filtering,
matrix multiplication, fast Fourier transforms, discrete cosine transforms, and phoneme matching.
In general, such routines consist of small, repetitive loops that operate on 8-bit or 16-bit integers. It
is these routines that yield the greatest overall performance increase when converted to MMX-
optimized code.
MMX has no impact on the operating system, so it is compatible with existing x86-based operating
systems. Applications can take advantage of MMX in two ways. Either it can use MMX-enabled
drivers (like a graphics driver), or it can add MMX instructions to critical routines. Most
applications utilize the MMX drivers.

Streaming SIMD Extensions (SSE)


In 1999, Intel introduced Streaming SIMD Extensions (SSE) on the Pentium III (and SSE has been
included in all subsequent Intel processors). Streaming SIMD Extensions consists of 70 new
instructions that applications or drivers can utilize to increase performance. These Streaming Single
Instruction Multiple Data (SIMD) instructions accelerate performance of 3D graphics. The
instructions also speed advanced imaging, MPEG encoding, streaming audio and video, and speech
recognition applications. Of the 70 instructions, 50 are SIMD floating point instructions, 8 are new
instructions focused on streaming data to and from memory more efficiently, and 12 are MMX-
class SIMD integer instructions called New Media Instructions.
Unlike MMX instructions, which are limited to integer-based calculations (integer data types),
Streaming SIMD Extensions include SIMD floating point instructions (as well as additional SIMD
integer and cacheability control instructions). This inclusion is made possible by 8 new dedicated
floating point 128-bit registers, which allows each register to hold four IEEE single-precision
floating point values. The new registers enable concurrent floating point and MMX technology
execution of the following:
• Concurrent SIMD FP and MMX technology execution
• Concurrent scalar FP and MMX technology execution
• Concurrent SIMD FP and scalar FP execution
The 8 new registers are used with both SIMD and scalar floating point instructions.


Streaming SIMD Extensions 2 (SSE2)


In 2000, Intel introduced Streaming SIMD Extensions 2 (SSE2) on the Pentium 4 (and SSE2 has
been included in all subsequent Intel processors). SSE2 includes 144 new instructions to accelerate
certain types of applications. The first Single Instruction Multiple Data (SIMD) instructions came
with Intel’s 57 MMX instructions in 1997, which are SIMD-Int (integer) instructions. In 1999, Intel
released SIMD-FP (floating point) extensions called Streaming SIMD (SSE). The new 144
instructions in Pentium 4 are an extension to MMX and SSE. SSE2 allows the Pentium 4 to handle
two 64-bit SIMD-Int operations and two double precision 64-bit SIMD-FP operations, which is in contrast to the two 32-bit operations MMX and SSE handle. Applications that utilize elements such as video, speech, images, photo processing, and encryption could increase performance greatly by utilizing SSE2. Applications geared to the financial, engineering, and scientific industries could also significantly improve performance utilizing SSE2.

Streaming SIMD Extensions 3 (SSE3)


In 2004, Intel introduced Streaming SIMD Extensions 3 (SSE3) on the Pentium 4 with the code-
name Prescott (and SSE3 has been included in all subsequent Intel processors). SSE3 includes 13
new instructions to accelerate certain types of applications. These new instructions enhance the
performance of optimized applications for the digital home such as video, image processing, and
media compression technology. Two of these instructions are related to Hyper-Threading
Technology; these instructions improve Hyper-Threading so that the chip can handle multiple tasks
simultaneously. Software applications must be written specifically to utilize SSE3. Applications
that utilize elements such as video, speech, images, photo processing, and encryption could increase
performance greatly by utilizing SSE3. Applications geared to the financial, engineering, and
scientific industries could also significantly improve performance utilizing SSE3.

Streaming SIMD Extensions 4 (SSE4)


In 2007, SSE4 was introduced as a new instruction set beginning with the 45 nm Intel processors (code-named Penryn). SSE4 consists of 54 new instructions: the first 47 (SSE4.1) shipped with Penryn, and the remaining subset of 7 instructions (SSE4.2) will be available in late 2008 with the Intel processor code-named Nehalem. Unlike all previous versions of SSE, SSE4 contains instructions that perform operations not
specific to multimedia applications. SSE4 totally lacks support for operations on 64-bit MMX
registers; SIMD integer operations can be carried out on 128-bit XMM registers only. A new
"Super Shuffle Engine" manages the way information is loaded into the SSE engine. Applications
don't always store data in a way that fits optimally into the parallel vector calculation engines of the
SSE units. Super Shuffle Engine automatically optimizes the way data for SSE units is pulled into
the chip to keep those units fully loaded at all times. This should help developers better make full
use of SSE without concern for odd data format structures.
Intel HD Boost is the name for SSE4 and the faster Super Shuffle Engine multimedia micro-architectural features.


Processor Features:
Hyper-Threading (HT) Technology

• Two logical processors on one physical processor
• Maintains architectural state (AS) for two logical processors
• Hardware support for multithreaded applications
• Up to 30% performance gain with multithreaded OS and applications
• Utilized on some Intel desktop and mobile processors
• Not used on latest Intel multi-core processors

[Figure: Hyper-Threading versus dual-core — Hyper-Threading is one physical processor with two logical cores (one physical core) sharing the processor execution resources, so parallel threads execute on a single core with shared resources; dual-core is one physical processor with two physical cores, so parallel threads execute on separate cores with dedicated resources]


Hyper-Threading (HT) Technology


Hyper-Threading (HT) Technology enables a single physical processor to execute two separate
code streams (called threads) concurrently. Hyper-Threading presents two logical processors to the operating system; the processor continuously switches between the two code streams, executing both concurrently and optimizing the use of shared processing resources. While some
execution resources such as caches, execution units, and buses are shared, each logical processor
has its own architecture state with its own set of general-purpose registers and control registers
to provide increased system responsiveness in multitasking environments.
Architecturally, an IA-32 processor with Hyper-Threading Technology consists of two logical
processors, each of which has its own IA-32 architectural state (AS). The architectural state that
is duplicated for each logical processor consists of the IA-32 data registers, segment registers,
control registers, debug registers, and most of the model-specific registers. Each logical
processor also has its own advanced programmable interrupt controller (APIC). This technology
enables thread-level parallelism (TLP) which is different than the instruction-level parallelism
(ILP) used in superscalar processors.


[Figure: Comparison of an IA-32 Processor with Hyper-Threading Technology and a Traditional Dual Processor (DP) System — with Hyper-Threading, one physical processor on the system bus consists of two logical processors, each with its own IA-32 architectural state (AS), sharing a single processor core; in a traditional DP system, each processor is a separate physical processor with its own core and architectural state]

Hyper-Threading Technology leverages the process-level and thread-level parallelism found in


contemporary operating systems and high-performance applications by implementing two logical
processors on a single chip. This configuration allows a thread to be executed on each logical
processor. This can be two different applications or two threads from the same application.
Instructions from both threads are simultaneously dispatched for execution by the processor core.
The processor core executes these two threads concurrently, using out-of-order instruction
scheduling to keep as many of its execution units as possible busy during each clock cycle.
An IA-32 processor with Hyper-Threading technology will appear to software as two independent
IA-32 processors, similar to two physical processors in a traditional DP platform. This
configuration allows operating system and application software that are already designed to run on
a traditional DP or MP system to run unmodified on a platform that uses one or more IA-32
processors with Hyper-Threading technology. Here, the multiple threads that would be dispatched
to two or more physical processors are now dispatched to the logical processors in one or more IA-
32 processors with Hyper-Threading technology.
A processor with Hyper-Threading technology can provide a performance gain of up to 30% when
executing multi-threaded operating system and application code over that of a comparable IA-32
processor without Hyper-Threading technology.
The following performance gains are likely:
• Two physical processors: 15-25% performance gain
• Four physical processors: 1-13% performance gain
• Eight physical processors: 0-5% performance gain


[Figure: Performance with and without Hyper-Threading — bar chart for one-way and two-way systems showing baseline physical-processor performance (1.0) plus the Hyper-Threading delta]

Although existing operating system and application code will run correctly on a processor with
Hyper-Threading technology, developers must rewrite code for multithreading support to get the
optimum benefit from Hyper-Threading technology. Microsoft Windows XP, Windows Vista, and
Linux are already multithreading operating systems. Device drivers can also be written to utilize
multithreading. For full support, the BIOS, OS, and chipset must support Hyper-Threading.
If the user enables Hyper-Threading technology in BIOS, Windows Server will detect and use
logical processors, which are not counted against the Windows licensing limit. Windows will first
count physical processors, and, if the license permits more processors, then logical processors will
be counted.

[Figure: How Hyper-Threading Technology and Gigabit Ethernet Work — without Hyper-Threading, the physical processor runs Thread 1 (process data) and Thread 2 (get network data) sequentially; with Hyper-Threading, the logical processors visible to the OS share the physical processor's resource allocation so the two threads overlap, saving time]


Processor Features:
Intel 64 Technology

• Instruction set extensions to IA-32 that allow Intel processors to run 64-bit operating systems and applications
• Runs both 32-bit and 64-bit operating systems
• Formerly called EM64T

Benefits:
• 32-bit x86 software compatibility
• Runs legacy software on next generation hardware
• Larger memory addressability
• Better performance

                             IA-32 processors     Intel 64 processors
                             (older desktop and   (newer desktop and   Itanium
                             mobile processors)   mobile processors)   processors
  Today's 32-bit software    Runs                 Runs                 Poor (runs in emulation)
  New 64-bit software
  (Intel 64)                 No                   Runs                 No
  64-bit Itanium software
  (IA-64)                    No                   No                   Runs


Intel 64 Technology
Intel 64 Technology (Intel 64) is an enhancement to Intel’s IA-32 (Intel Architecture 32-bit or
x86) architecture that allows the processor to run 64-bit code that has been compiled for the
Intel 64 architecture. The enhancement allows the processor to run newly written 64-bit code
and access larger amounts of memory. Intel 64 used the code names Yamhill and Clackamas
Technology (CT). Intel 64 Technology is an extension of the 32-bit x86 or IA-32 instruction set.
Intel 64 Technology was originally named Extended Memory 64 Technology (EM64T), but
was renamed in July 2006 with the announcement of the Intel Core micro-architecture.
Systems utilizing the Intel 64 instruction set extensions allow applications to run the existing
base of 32-bit applications or new 64-bit applications which permits migration to 64-bit
applications to take place within any future timeframe.
These extensions do not run code written for the Intel Itanium processor family known as IA-
64. The Itanium processor family is based on the Explicitly Parallel Instruction Computing
(EPIC) architecture. Itanium processors can run 32-bit software in the IA-32 Execution Layer
(IA-32 EL), but Intel has not stated that this layer will support Intel 64 code; this IA-32 EL does
not run 32-bit x86 applications natively, but translates the 32-bit instructions into code the
Itanium can run. Operating systems required for IA-64 will not run on Intel 64 processors (and
vice versa).


64-Bit Computing
The terms 64-bit and 32-bit refer to the number of bits that each of a processor's general-purpose
registers (GPRs) can hold. The phrase 64-bit processor describes a processor whose register width is
64 bits, and a 64-bit instruction is one that operates on 64-bit data. When applied to a
processor, the bits characterize the processor's data stream: a 64-bit processor has 64-bit register
widths and can perform operations on 64 bits of data at a time.

AMD64 Comparison
Intel 64 Technology is compatible with and nearly identical to AMD64 found in the various AMD
processors (with the exception of a few instructions such as 3DNow!). Even though the hardware
microarchitecture in Intel and AMD processors is different, the operating system and applications
ported to one processor will likely run on the other processor. Both Intel 64 and AMD64 are based
on AMD’s x86-64 instruction set extensions.

More Registers
Intel's IA-32 processors with Intel 64 Technology have 16 General Purpose Registers (GPRs) with
64-bit width and 16 XMM registers with 128-bit width. The additional registers are only used by
applications running in 64-bit mode. IA-32 processors without Intel 64 have 8 GPRs with 32-bit
width and 8 XMM registers with 128-bit width. A processor with more registers does not have to
retrieve data from cache as often, so it has better performance.

Advantages
Intel 64 Technology allows Intel processors to directly access in excess of 4 GB of memory. With
64-bit memory address limits, the theoretical memory size limit is 16 billion gigabytes (16 exabytes
[EB]), but systems will not implement the full 64-bit addressability. IA-32 processors can only
address a maximum of 4 GB which is a major limitation for large databases and digital content
creation solutions. For example, the Xeon processor with Intel 64 supports 48 bits of virtual
memory and up to 40 bits of physical memory; this allows access to 256 TB of virtual memory and
1 TB of physical memory.
When a 32-bit application is run on an Intel 64 processor, software still limits memory access to 4
GB and performance might not improve. Recompilation will be required to take advantage of
memory in excess of 4 GB. An application recompiled to run in a 64-bit environment will be able
to access significantly more memory, and performance might improve if the application was
previously memory constrained.
Another advantage of 64-bit processors is the ability to handle larger integer values, which are
used in scientific and engineering calculations. While 32-bit processors can only handle
calculations with values up to 2^32 (about 4 billion) unless they resort to software
emulation, 64-bit processors can directly use values up to 2^64 (about 18 billion billion).
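The address-space arithmetic above can be checked with a short sketch; the width/limit pairs come from the IA-32 and Xeon examples in the text (the function name is illustrative):

```python
# Addressable memory for a given address width, illustrating the limits
# discussed above (2^32 = 4 GB for IA-32; 40/48-bit widths for Intel 64).
def addressable_bytes(width_bits):
    """Number of distinct byte addresses an n-bit address can name."""
    return 2 ** width_bits

GIB = 2 ** 30
TIB = 2 ** 40
print(addressable_bytes(32) // GIB, "GiB")      # IA-32 limit: 4 GiB
print(addressable_bytes(40) // TIB, "TiB")      # Xeon physical: 1 TiB
print(addressable_bytes(48) // TIB, "TiB")      # Xeon virtual: 256 TiB
print(addressable_bytes(64) // 2 ** 60, "EiB")  # theoretical limit: 16 EiB
```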


Processor Features:
Intel 64 Technology Support

• Intel 64 Technology requires a computer with a processor, chipset, BIOS,
  OS, device drivers, and applications enabled for Intel 64 Technology
• Supported by Microsoft Windows XP Professional x64 Edition and
  Windows Vista (64-bit versions)

                       Operating System   Device Drivers   Applications
Legacy mode            Existing 32-bit    32-bit           32-bit
Compatibility mode     New 64-bit         64-bit           32-bit
64-bit mode            New 64-bit         64-bit           New 64-bit; 32-bit

• Intel 64 Technology is found in almost all current desktop
  and mobile processors

Intel 64 Technology Support


Intel 64 Technology requires a computer with a processor, chipset, BIOS, OS, device drivers,
and applications enabled for Intel 64 Technology.
The three scenarios for Intel 64 Technology are the following:
Legacy Mode – 32-bit OS and 32-bit applications: No software changes are required; however,
the user gets no benefit from Intel 64 Technology.
Compatibility Mode – 64-bit OS and 32-bit applications: This usage requires all 64-bit device
drivers. In this mode, the OS will see the 64-bit extensions, but the 32-bit application will not.
Existing 16-bit and 32-bit applications do not need to be recompiled, and may or may not
benefit from the 64-bit extensions. Each 32-bit application gets its own 4 GB memory space
whereas Legacy Mode has all applications share one 4 GB space. The application will likely
need to be recertified by the vendor to run on the new 64-bit extended OS. This mode is also
called IA-32e.
64-bit Mode – 64-bit OS and 64-bit applications: This usage requires 64-bit device drivers. It
also requires applications to be modified for 64-bit operation and then recompiled and validated.
Both 32-bit and 64-bit applications can run simultaneously in this mode. This mode is also
called IA-32e.
Note:
• A 32-bit OS (such as Windows XP Professional) will run on both IA-32 processors and IA-32
processors with Intel 64 Technology support.
• Current 32-bit applications will run without being recompiled on a 32-bit OS, even with an
Intel 64 Technology processor.
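The three scenarios above reduce to a simple decision on the OS and application word sizes. The following sketch captures that mapping (the function name is invented for illustration):

```python
# Which Intel 64 operating scenario applies to a given OS/application
# pair, per the Legacy/Compatibility/64-bit modes described above.
def intel64_mode(os_bits, app_bits):
    if os_bits == 32:
        return "legacy"          # 32-bit OS: the 64-bit extensions are unused
    if app_bits == 32:
        return "compatibility"   # 64-bit OS running a 32-bit application
    return "64-bit"              # 64-bit OS running a 64-bit application

print(intel64_mode(32, 32))  # legacy
print(intel64_mode(64, 32))  # compatibility
print(intel64_mode(64, 64))  # 64-bit
```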


                                               IA-32e Mode
                                  Legacy Mode  Compatibility  64-bit
Default address size              32/16        32/16          64
Default operand size              32/16        32/16          32
General-purpose register width    32           32             64
Operating system required         32/16        64             64
Application recompile required    No           No             Yes

Intel 64 Technology Modes

Operating System Support


Following are some popular operating systems supporting Intel 64 Technology:
• Microsoft Windows XP Professional x64 Edition
• Microsoft Windows Server 2003 x64 Edition
• Microsoft Windows Vista 64-bit versions
• Red Hat Enterprise Linux 3
• SuSE Linux Enterprise Server 9
• Mandrakesoft Linux 10.1
Windows XP Professional x64 Edition was released in May 2005. Windows XP Professional x64
Edition supports 128 GB of RAM and 16 terabytes of virtual memory address space, as compared
to 4 GB of both physical RAM and virtual memory address space for 32-bit Windows XP
Professional. Windows XP Professional x64 Edition runs 32-bit applications in the Windows on
Windows 64 (WOW64) subsystem, providing compatibility with the more than 10,000 existing 32-
bit Windows applications while enabling new 64-bit applications. Applications running in the
WOW64 system on Windows XP Professional x64 Edition each have a full 4 GB of virtual
memory space. Applications compiled to take advantage of the /3 GB switch will actually get 4
GB, without constraining the operating system at all, since it is running in the 8 terabytes of virtual
address space that Windows XP Professional x64 Edition has for the system processes. With
Windows XP Professional x64 Edition, you can run both 64-bit and 32-bit applications side by
side. Your existing 32-bit applications run in WOW64, while the 64-bit applications run natively.

Hardware Support
To run 64-bit applications, the system will need a 64-bit OS, IA-32 processor with Intel 64
Technology, updated BIOS, and updated drivers that support Intel 64 Technology.

Processors and Products


Intel 64 Technology first appeared in 2004 in select Intel Xeon processors for dual-processing
servers and workstations (code-named Nocona) and in Intel Xeon MP for multi-processing (code-
named Potomac) servers. Intel 64 Technology first appeared in select Intel desktop processors
starting in 2005.


Processor Features:
Execute Disable Bit

• Execute Disable Bit is a feature in Intel processors that helps
  protect memory data areas from malicious software execution.
• It was introduced gradually in desktop and mobile processors
  after mid-2004.
• Recent operating systems, BIOS, and chipsets support
  this feature.

Execute Disable Bit capability on newer Intel processors

Execute Disable Bit


Execute Disable Bit capability is a feature in Intel processors that helps to protect memory data
areas from malicious software execution. Malicious buffer overflow attacks pose a significant
security threat to businesses, increasing IT resource demands, and in some cases destroying
digital assets. In a typical attack, a malicious worm creates a flood of code that overwhelms the
processor, allowing the worm to propagate itself to the network and to other computers.
Intel’s Execute Disable Bit functionality can prevent certain classes of malicious buffer overflow
attacks when combined with a supporting operating system. Execute Disable Bit allows the
processor to classify areas in memory where application code can execute and where it cannot.
When a malicious worm attempts to insert or execute code in memory areas classified as
protected, the processor disables code execution, preventing damage or worm propagation.
To provide end-to-end no execute (NX) coverage, Intel offered Execute Disable Bit for
workstations and other server products in late Q3 2004. Desktop and mobile processors started
to incorporate this feature in late 2004.
The BIOS of the system must also support Execute Disable Bit. Often the setup utility of the
system will support enabling or disabling this feature.
All recent operating systems and Intel chipsets support Execute Disable Bit.
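Conceptually, the Execute Disable Bit works like a per-page flag the processor consults before running code. The toy model below illustrates the idea only; the names are invented and the real mechanism lives in the processor's page tables:

```python
# Toy model of the Execute Disable (NX) bit: each memory page carries an
# executable flag, and the "processor" refuses to run code from pages
# marked non-executable. Purely illustrative.
class Page:
    def __init__(self, data, executable=False):
        self.data = data
        self.executable = executable  # the per-page execute-disable state

def execute_from(page):
    if not page.executable:
        # Corresponds to the processor faulting instead of running
        # injected code (e.g., a buffer-overflow payload on the stack).
        raise PermissionError("execute-disable fault")
    return f"ran {page.data}"

code_page = Page("application code", executable=True)
stack_page = Page("attacker payload", executable=False)
print(execute_from(code_page))
try:
    execute_from(stack_page)
except PermissionError as fault:
    print("blocked:", fault)
```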


Processor Features:
Dual-core

• Two independent processor cores on a single die
  - Two processors in one package
• Performance boosted by increasing parallelism with reduced clock speed
• Requires multi-threaded OSes and applications to utilize two cores
• Common in many Intel processors

Multi-threaded:
• Windows XP and Vista
• About 200 content creation applications

[Diagram: a single-core processor (one execution core with its architecture
state, L1/L2 cache, and bus interface on one physical die) compared with a
dual-core processor (two independent execution cores, each with its own
architecture state, L1/L2 cache, and bus interface, on one physical die)]

Dual-core
In 2005, Intel introduced processors utilizing dual-cores which is two independent cores within
a single physical processor die. The first dual-core processors were the Intel Pentium D
Processor 8xx and Intel Pentium Extreme Edition 8xx. With previous single-core processors,
the traditional way to increase performance was to increase the clock speed. However,
increasing clock speeds has its limits due to power consumption and thermal problems. Dual-
core technology allows a duplication of execution resources to provide increased system
responsiveness in multi-tasking environments and headroom for next-generation multi-threaded
applications and new usages.
The move to multiple processor cores is the logical solution to the speed barrier single-core
processor designs have encountered. Adding multiple cores can increase processor power
without increasing clock speeds. Having multiple processor cores has the added benefit of
improving PC performance in situations where users need to run multiple applications at the
same time, for example, when watching a DVD movie while encoding an MP3 file. Each
application can have its own dedicated processor core rather than sharing a single core, which
can hinder the performance of one or both applications.
A dual-core processor provides multi-tasking benefits because it is like having an extra engine
in your car. It also provides multi-user benefits because on a home network one person can use
the PC while another person accesses and interacts with stored content from the same PC even
from other rooms. For example, your son plays a game on a PC in his bedroom. Using a remote
control, your daughter accesses the same PC to listen to music, even though she is in the family
room. A digital media adapter enables her to send the music to a stereo so she can listen to it.


A dual-core processor plugs directly into a single socket on a systemboard.


Intel dual-core processors are based on symmetrical multiprocessing (SMP) which is a parallel
architecture where all processors run a single copy of the operating system, share the memory and other
resources of one computer, and have equal access to memory, I/O, and external interrupts.
Customers who use dual-core processors from Intel or AMD do not need to buy extra licenses for
Microsoft software. Microsoft counts the sockets, not the number of cores, in Windows OS licensing.

Software
Software (operating systems and applications) needs to be written to run multiple threads
simultaneously to take advantage of additional processor cores, i.e., they must be multi-threaded. Multi-
threaded means the application can run multiple bits of code simultaneously. An example of a multi-
threaded operation is when a Word document is opened, antivirus code can automatically scan the file
in a separate thread. Another example is having a thread record a TV show in the background. Modern
operating systems can handle multiple threads of program execution at the same time either from a
single multi-threaded application or from multiple, single-threaded applications. Microsoft Windows
XP and Vista, as well as over 200 applications such as Adobe Photoshop CS and Roxio VideoWave
7.0, are multi-threaded. Most of today's multi-threaded applications focus on content creation and
multimedia production, which tend to perform many operations in parallel. As dual-core technology
becomes more prevalent, you can expect to see more multi-threaded applications become commonplace
such as 3D-intensive games that can take advantage of dual-core technology by using more robust
physics and artificial intelligence engines for more realistic effects and gameplay.
The operating system sees each of the execution cores as a discrete logical processor that is
independently controllable.
If software is already written to take advantage of Hyper-Threading technology, only a few adjustments
are needed for the software to utilize dual cores.
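The Word/antivirus example above is exactly the multi-threaded pattern an OS can schedule across two cores. A minimal sketch (function and file names are illustrative, not from any real product):

```python
import threading
import queue

# Sketch of a multi-threaded application: a document-open thread and an
# antivirus-style scan thread running concurrently. On a dual-core
# processor the OS can schedule these threads on separate cores; on a
# single core they are time-sliced.
events = queue.Queue()

def open_document(name):
    events.put(("opened", name))

def scan_file(name):
    events.put(("scanned", name))

doc = "report.doc"
workers = [
    threading.Thread(target=open_document, args=(doc,)),
    threading.Thread(target=scan_file, args=(doc,)),
]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(sorted(events.queue))  # both threads completed independently
```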

Block diagram of dual-core Intel processor


(Pentium D Processor 8xx)


Processor Features:
Dual-core and Hyper-Threading
• Earlier dual-core processors use Hyper-Threading Technology
  - Pentium Processor Extreme Edition 8xx and 9xx
  - Can process four threads
• The latest dual-core and quad-core processors do not use Hyper-Threading Technology
  - Core 2 Duo
  - Core 2 Quad

Type                      Logical View       Processing                       Example
No Hyper-Threading (HT)   1 logical core     Each thread processed serially   Celeron M 3xx
                                             through a single execution       Celeron D 3xx
                                             core                             Pentium M 7xx
Hyper-Threading (HT)      2 logical cores    Two threads processed through    Mobile Pentium 4 5xx
                                             two logical cores in an          Pentium 4 5xx and 6xx
                                             increasingly parallel manner
Dual-core (No HT)         2 execution cores  Two threads processed in         Core 2 Duo
                                             separate execution cores in a    Core 2 Quad
                                             true parallel manner
Dual-core (with HT)       4 logical cores    Four threads processed (two      Extreme Edition 8xx/9xx
                                             logical cores in each
                                             execution core)

Dual-core and Hyper-Threading Technology


Dual-core technology places two independent execution cores into the same processor die. This
is different from Intel's Hyper-Threading Technology which uses a single (physical) execution
unit but allows the processor to run two separate (logical) execution threads. The earliest Intel
dual-core processor used Hyper-Threading (such as the Intel Pentium Processor Extreme
Edition), which supported four independent threads (two of the threads are running on each
physical execution core within the two logical units).
The latest Intel multi-core processors do not use Hyper-Threading Technology.
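The thread counts in the table follow one rule: Hyper-Threading exposes two logical processors per physical execution core. A small sketch (the function name is invented):

```python
# Logical processors visible to the OS: with Hyper-Threading, each
# physical execution core appears as two logical processors.
def logical_processors(execution_cores, hyper_threading):
    return execution_cores * (2 if hyper_threading else 1)

print(logical_processors(1, False))  # e.g., Pentium M 7xx: 1
print(logical_processors(1, True))   # e.g., Pentium 4 5xx/6xx with HT: 2
print(logical_processors(2, False))  # e.g., Core 2 Duo: 2
print(logical_processors(2, True))   # e.g., Pentium EE 8xx/9xx: 4
```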


Processor Features:
Dual-core Features

Earlier Intel dual-core processors use independent L1 and L2 caches:
• Pentium D 8xx and 9xx

The latest Intel processors use a shared L2 cache (Intel Smart Cache):
• Core 2 Duo Processor
• Core 2 Quad Processor

[Diagram: in the independent-cache design, each execution core has its own
L1 and L2 cache connected to the memory controller hub; in the shared-cache
design, each core keeps its own L1 cache while both cores share a single L2
cache, with L2 cache control, connected to the memory controller hub]

Dual-core Features
Dual-core processors differ from single-core processors by providing two independent
execution cores. While some execution resources are shared, each logical processor has its own
architecture state with its own set of general-purpose registers and control registers to provide
increased system responsiveness. Each core runs at the same clock speed.

                          Single-core with     Dual-core          Dual-core
Feature                   HT Technology        with Independent   with Shared
                          (Prescott            L2 Cache           L2 Cache
                          Processor)
Physical Packages         1                    1                  1
Logical Processors        2                    2                  2
Execution Cores           1                    2                  2
Shared L1 Cache           Yes                  No                 No
Shared L2 Cache           Yes                  No                 Yes
Execute Disable Bit       Yes                  Yes                Yes
Virtualization Tech       No                   Some               Usually


Shared L2 Cache
The Intel Core Duo Processor features a shared 2 MB L2 cache with Advanced Transfer Cache
Architecture and system bus between the two execution cores called Intel Smart Cache. The Intel
Core 2 Duo Processor features a shared 2 MB or 4 MB L2 cache that uses the name Intel Advanced
Smart Cache. These shared L2 caches enable the active execution core to access the full cache
when the other execution core is idle. Dynamic cache allocation across both cores enhances
performance and reduces cache under-utilization and misses. A shared L2 cache enables a single
copy of data to be used by each execution core. Shared cache designs may directly transfer data between
L1 caches, or between L2 and L1 caches.
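The benefit of dynamic allocation can be shown with a simplified model. This is an assumption for illustration (Intel's actual allocation policy is more nuanced), with the independent design modeled as a fixed even split:

```python
# Simplified model of how much L2 one busy core can use in each design.
def l2_available_mb(total_mb, shared, other_core_idle):
    if shared and other_core_idle:
        return total_mb    # Smart Cache: the busy core may claim it all
    return total_mb / 2    # otherwise, roughly half per core

print(l2_available_mb(4, shared=True, other_core_idle=True))   # 4.0
print(l2_available_mb(4, shared=False, other_core_idle=True))  # 2.0
```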

Dual-core Performance
Multi-threaded applications or running multiple applications simultaneously could provide up to
70% performance improvement on dual-core processors.
For single-threaded, single-focus tasks, using a faster clocked single-core processor provides better
performance than a slower clocked dual-core processor. But a user who runs multiple applications
at once with several simultaneous operations will immediately see benefits from any dual-core
processor.
Because Windows XP itself is multithreaded, multi-threaded applications do not have to be running
to achieve a performance gain. Windows is a multitasking environment, so there are usually
applications running in both the foreground (such as an Internet browser) and the background (such
as virus scanning). A dual-core processor should execute the multiple threads of these applications
more efficiently.
Dual-core processors perform better than single-core chips even at slower clock speeds because
they reduce power consumption and heat dissipation.
See www.intel.com/technology/computing/dual-core for more information.


Processor Features:
Quad-core Features

• Some Intel quad-core processors use two shared L2 caches
  - Intel Core 2 Quad Processor for notebooks and desktops

[Diagram: four execution cores, each with its own L1 cache; each pair of
cores shares one L2 cache, and both L2 caches connect to the memory
controller hub]

Quad-core with two shared L2 caches

Quad-core Features
Quad-core processors differ from single-core and dual-core processors by providing four
independent execution cores. While some execution resources are shared, each logical processor
has its own architecture state with its own set of general-purpose registers and control registers
to provide increased system responsiveness. Each core runs at the same clock speed.
The Intel Core 2 Quad Processor (codenamed Kentsfield, announced January 2007) features a
total of 8 MB of L2 cache. This processor is really two dual-core dies in one package; each
dual-core die has a shared 4 MB L2 cache, so with two 4 MB L2 caches, the processor has 8 MB
of L2 cache in total.

                            Dual-core     Quad-core with two    Quad-core
Feature                     with shared   L2 caches (each       with one shared
                            L2 cache      shared by two cores)  L2 cache (planned)
Physical packages/sockets   1             1                     1
Dies                        1             2                     1
Logical processors          2             4                     4
Execution cores             2             4                     4
Shared L1 cache             No            No                    No
Shared L2 cache             Yes           Shared by two cores   Yes


Processor Features:
Virtualization Technology

• Allows a system to run multiple operating systems in independent partitions
• Hardware-assisted virtualization in latest Intel desktop and notebook processors
• Hardware-assisted virtualization is more robust than software-based virtualization

Virtual Machine (VM) Concept

The VMM:
• Arbitrates guest OS access to physical resources
• Honors existing hardware interfaces to processor, memory, storage,
  graphics, network adaptors, etc.
• A new layer of system software

[Diagram: several virtual machines, each with its own applications, OS, and
virtual CPU, memory, I/O, and disk, run on a Virtual Machine Monitor (VMM)
or hypervisor, which runs on the physical platform hardware (processor,
memory, I/O, disk)]

Virtualization Technology
Virtualization refers to a single system running multiple operating systems and applications in
independent partitions. With virtualization, one computer system can function as multiple "virtual"
systems.
Servers are the main systems using virtualization, so this implementation is called server
virtualization. Server virtualization allows a single physical server to support multiple operating
systems which support many applications and users. Desktops and notebooks are only now starting
to implement virtualization under the term desktop virtualization.
By building virtualization hooks into its processors, Intel (along with AMD with its virtualization
technology) gives third-party developers direct access to the primary operational layer of the
processor. Having support in a processor for virtualization is also called processor-assisted
virtualization, hardware-assisted virtualization, or embedded virtualization.
Intel Virtualization Technology (code-named Vanderpool) improves the robustness and
performance of software-only solutions (such as using VMware on processors without hardware-
assisted virtualization). The Intel Virtualization Technology Specification was released in late
2005.
AMD released a technical specification for virtualization in early 2006.


Virtualization abstracts software from the underlying hardware infrastructure. In effect, it cuts the
link that ties a specific software stack to a particular system. This enables more flexible control of
both hardware and software resources, which can deliver value across a wide range of IT
requirements. It provides a spare box on which you can quickly throw a test OS to check out an
application or problem without messing up your primary workstation.
Virtualization eliminates the need for performance-sapping workarounds that make software-only
solutions behave much like emulation; the technology eliminates the need to emulate an x86 to
virtualize it.
For more information about Intel Virtualization Technology, visit the Intel Web site at
intel.com/technology/virtualization.

How Virtualization Technology Works


Virtualization means abstracting a computer's physical resources into virtual ones with the help of
specialized software. Abstraction layers allow for the creation of multiple VMs on a single physical
machine. Each VM can run its own OS. If setup is done correctly at the abstraction layer, the OS
running inside a VM works just as if it were running on the base hardware. The "host" OS is installed
on the base system, with virtualization achieved by software running atop the host OS. "Guest" OSs
run under the virtualization software in their own private VMs. The guest OS must go through the
virtualization layer to access physical machine resources.
The key component in building this abstraction layer is commonly referred to as a Virtual Machine
Monitor (VMM) or a hypervisor. This software is responsible for sharing the computer's physical
resources among the many VMs that could be running. The VMM is not an easy piece of software to
get right because it must trick the guest OS into thinking it has control of the real hardware. To
accomplish this, the VMM runs at processor privilege level Ring 0. The guest OS runs up a level, at
Ring 1. Most modern OSs run user applications at Ring 3, where applications are prevented from
trampling on or otherwise adversely affecting one another. Running the OS in Ring 1 lets the VMM
trap some of the operations the guest OS is attempting (like accessing memory) and take corrective
measures.
Another component to creating a VM is abstracting the hardware layer. The VM software must
create virtual hardware devices, such as the disk and network devices, to be consumed by the guest
OS. Each vendor has specific devices its products will emulate. The software then translates these
emulated devices to a device that is present on the physical hardware. By creating these virtual
hardware devices, a guest OS can be copied into another computer running different hardware and
still work. The VMM is responsible for redirecting virtual devices to physical devices.
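The trap-and-redirect flow described above can be sketched in miniature. All names here are invented for illustration; a real VMM intercepts processor traps and emulates device registers rather than catching language exceptions:

```python
# Toy trap-and-emulate sketch: a guest's privileged operation "traps" to
# the VMM, which performs it against a virtual device instead of letting
# the guest touch real hardware.
class TrapToVMM(Exception):
    """Stands in for a VM exit on a privileged operation."""
    def __init__(self, op, arg):
        super().__init__(op)
        self.op, self.arg = op, arg

class Guest:
    # Runs deprivileged (Ring 1 in the description above): direct
    # hardware access is not allowed and must trap.
    def write_disk(self, data):
        raise TrapToVMM("disk_write", data)

class VMM:
    """Ring-0 monitor: catches traps and redirects them to virtual devices."""
    def __init__(self):
        self.virtual_disk = []
    def run(self, guest, data):
        try:
            guest.write_disk(data)
        except TrapToVMM as trap:
            if trap.op == "disk_write":
                self.virtual_disk.append(trap.arg)  # emulate the device

vmm = VMM()
vmm.run(Guest(), "sector-0")
print(vmm.virtual_disk)
```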


Server Virtualization Configurations

Standard configuration (no virtual machines): The applications run directly
on the OS, which runs directly on the hardware.

Hosted virtual machines: The virtual machine runs as an application on the
host OS. There is an intermediary layer of software (the virtualization
layer) between the host OS and the guest OS. Examples are Microsoft
Virtual PC and VMware GSX Server.

Hardware-level or hypervisor virtual machines: With the hypervisor layer
between the hardware and the OSs, each OS thinks it is running in the
standard configuration, but in fact it is sharing the resources of the
base hardware. Examples are VMware ESX Server and Xen 3.0.

Type: Hosted
Description: A Virtual Machine Monitor (VMM) is installed on top of a host OS.
Companies/Products: VMware Server (GSX); Microsoft Virtual Server 2005
Pros: Simple installation and management
Cons: System overhead; limited flexibility; a failure in the host OS will
impact all virtual machines running on that server

Type: Hypervisor
Description: A hypervisor is installed directly on the hardware (often
referred to as "bare metal" virtualization) versus on top of the operating
system as in the hosted architecture.
Companies/Products: VMware ESX Server
Pros: Higher performance compared to hosted virtualization, as the
hypervisor interacts directly with the hardware; more scalable and
flexible than hosted virtualization
Cons: Lower performance than paravirtualization, as the guest OSs are not
made "virtualization aware"

Type: Paravirtualization
Description: Similar to hypervisor-based virtualization but achieves
virtualization by modifying the guest operating system, improving
performance by making the guest OS "virtualization aware."
Companies/Products: Xen-based vendors (XenSource, Virtual Iron, Red Hat);
Microsoft Hyper-V
Pros: Improved performance; less overhead
Cons: Requires hardware virtualization assistance from CPU companies;
guest operating system modification required

Source: Merrill Lynch


Virtualization Configurations

[Diagram: five configurations compared side by side -
• No virtualization: applications on an operating system on hardware
• Hypervisor virtualization (VMware, Xen, Hyper-V): applications and guest
  operating systems on a hypervisor on hardware
• Application virtualization (Solaris containers, BSD jails, application
  streaming): multiple application sets on one operating system on hardware
• Windows Virtual Server: applications and a guest OS hosted on an
  operating system on hardware
• BEA LiquidVM: applications on a hypervisor on hardware]

Virtualization usually involves a hypervisor between the operating system
and hardware, but Microsoft, BEA, and Sun have other implementations.


Processor Features:
Virtualization Technology Processors and Software

• Most new Intel processors include hardware-assisted virtualization
  - Some Intel Core 2 Solo, Core 2 Duo, Core 2 Quad,
    Core 2 Extreme Processors
• Software called a Virtual Machine Monitor is still required for virtualization
• Software products:
  - Server virtualization
  - Desktop virtualization
• Requires plenty of memory (at least 2 GB) to support virtualization
• Provides improved security, management, and convenience features

VMware Workstation screen shot


Virtualization Technology Processors and Software


The first Intel processors with hardware-assisted virtualization support were the desktop-based Intel
Pentium 4 Processors 662 and 672, announced in November 2005. Most current and future
higher-end processors from Intel for both desktop and notebooks support virtualization. Intel calls
this feature Intel Virtualization Technology (VT) on their processors. Intel Virtualization
Technology requires a computer system with a processor, chipset, BIOS, virtual machine monitor
(VMM), and for some uses, certain platform software, enabled for Intel Virtualization Technology.
Although virtualization is supported on some processors, software (called a Virtual Machine
Monitor [VMM]) is still needed to allow virtualization to work. There are various server and
desktop virtualization products. The three main virtualization options for servers are VMware
ESX/ESXi Server, Microsoft Hyper-V, and the open source Xen hypervisor. VMware allows either
Windows or Linux as the host OS.


Besides vendors such as VMware and Microsoft, there is an open-source approach to
virtualization from the Xen project. Xen uses paravirtualization, which means that the OSes running
on the hypervisor need to be modified to make them run simultaneously. The paravirtualization
method uses a thin layer between the hardware and the OS, with an I/O virtualization scheme that
employs a single set of drivers used by all guest OSes. Xen claims that paravirtualization improves
efficiency and speed. Its Xen 3.0 can create Windows, Linux, and Solaris virtualized servers. See
www.xensource.com for more information.
VMware uses transparent virtualization which means the OSes that run on the hypervisor are not
modified.
Over the next few years, Microsoft intends to phase out separate virtualization products and put
virtualization functionality in the Windows OS. In late 2006, Microsoft released Virtual PC 2007
(replacing Virtual PC 2004), adding support for Windows Vista as a host operating system.

VMware Workstation running Windows XP on Linux


Intel Virtualization Technology Example


Below is an example of a system using both Intel Virtualization Technology and Intel Active
Management Technology to improve manageability. The system uses two partitions: one partition
is for the user while the other partition is for manageability.

[Diagram: the user partition runs the host OS, while the IT partition runs
IT services and manageability agents; both run on a VT-enabled lightweight
VMM on an Intel VT platform with Intel Active Management Technology (AMT)]

Management agents protected from users:
• Firewall/IP packet inspection
• Asset management
• Provisioning/re-provisioning
• Recovery/patch
• Failure prediction

Client management from the console:
• Alerting
• Remote booting
• Non-volatile asset inventory
• Remote control and diagnostics

Virtualization Technology for IA-32 (Intel VT-x)


Virtualization Technology for IA-32 (VT-x) is Intel's technology for virtualization on the IA-32
platform. Intel plans to add Extended Page Tables (EPT), a technology for page table virtualization,
in the upcoming Nehalem architecture.
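As an aside, on Linux systems the presence of VT-x can be checked from the flags line of /proc/cpuinfo, where it is reported as "vmx" (AMD's equivalent is "svm"). A minimal illustrative parser, not a complete detector:

```python
# Minimal sketch: return True if a /proc/cpuinfo-style text advertises
# the "vmx" flag that Linux uses to report Intel VT-x support.

def supports_vtx(cpuinfo_text: str) -> bool:
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # everything after the colon is a space-separated flag list
            return "vmx" in line.split(":", 1)[1].split()
    return False

sample = "model name : Intel(R) Core(TM)2 Duo\nflags : fpu vme de vmx ssse3"
print(supports_vtx(sample))  # True
```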

Virtualization Technology for IA-64 (Intel VT-i)


Virtualization Technology for IA-64 (VT-i), previously codenamed "Silvervale", is Intel's technology
for virtualization on the IA-64 (Itanium) platform.

Intel Virtualization Technology for Directed I/O (Intel VT-d)


In 2006, Intel announced Intel Virtualization Technology for Directed I/O (Intel VT-d). This
technology enables guest virtual machines to directly use peripheral devices primarily through DMA
and interrupt remapping. The specification complements and enables standardization efforts in the
PCI Special Interest Group to enable I/O virtualization capabilities in PCI Express I/O devices.


Processor Features:
Intel Dynamic Acceleration
• Boosts performance of one core when the other core is inactive
• Used with single-threaded or multi-threaded applications when
extended serial code is executed
• Maintains reasonable thermal dissipation
• Introduced in notebook-based Intel Core 2 Duo processors with
Socket P

Intel Dynamic Acceleration

© 2008 Lenovo

Intel Dynamic Acceleration


Intel Dynamic Acceleration (IDA) is a feature in some Intel processors that boosts the
performance of one of the cores when the other core is inactive. For example, a dual-core
processor at 2.2 GHz could boost a single core to 2.4 GHz while the other core is inactive. The
thermal dissipation is reasonable even with a single core running at a higher speed because the
inactive core is generating less heat than when it is active at its full speed.
IDA happens during execution of single-threaded applications or multi-threaded applications
with extended serial code. Here are some characteristics of IDA:
- One core is idle while the other is active
- IDA goes active when only one thread of code needs to be executed
- Multiple applications can be open and sitting in “idle” state
- Active background applications common in multi-tasking usage models may preclude IDA
activation (IT updates, security scans, OS activities, compiling, backup, etc.)
IDA was first introduced in the notebook-based Intel Core 2 Duo processors with Socket P that
were originally announced in May 2007.
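The behavior can be sketched as a simple policy function. This is illustrative only: the real decision is made by the processor and its firmware, and the 2.2/2.4 GHz figures are the example from the text above.

```python
# Illustrative policy sketch of Intel Dynamic Acceleration: when exactly
# one core is active, that core may run above the rated dual-core
# frequency while staying within the thermal budget.

RATED_GHZ = 2.2   # frequency with both cores active (example part)
BOOST_GHZ = 2.4   # single-core boost frequency (example part)

def core_frequencies(core_active):
    """core_active: list of booleans, one entry per core."""
    if sum(core_active) == 1:                        # exactly one core busy
        return [BOOST_GHZ if a else 0.0 for a in core_active]
    return [RATED_GHZ if a else 0.0 for a in core_active]

print(core_frequencies([True, False]))  # [2.4, 0.0] -> IDA active
print(core_frequencies([True, True]))   # [2.2, 2.2] -> both at rated speed
```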


Processor Features:
Intel Trusted Execution Technology (TXT)
• Enables more secure PC platform
• Hardware extensions to select Intel processors and chipsets
• Protects against software-based attacks
• Protects data in virtualized environments
• Requires many hardware, software, and BIOS components
• Introduced in 2007 in select Intel Core 2 Duo E6x50 processors

Intel Trusted Execution Technology Ingredients:
• Processor and chipset: the chipset has TXT support, VT-d support, and a
  TPM 1.2 interface
• Virtual Machine Monitor: requires VT and TXT
• Platform initialization: AC module and BIOS/Flash
• SINIT Authenticated Code (AC) module and BIOS AC module
• Intel hardware (TPM v1.2), memory, graphics, and input devices
• 3rd party software



Intel Trusted Execution Technology (TXT)


Intel Trusted Execution Technology, formerly code-named LaGrande, is a highly versatile set of
hardware extensions to Intel processors and chipsets that, with appropriate software, enhance the
platform security capabilities. Intel Trusted Execution Technology will provide a hardware-based
security foundation that will help enable greater levels of protection for information stored,
processed, and exchanged on the PC.
Trusted Execution Technology lets developers program features into the chipset to protect
applications if malicious code invades a PC. Features include booting software into a known,
trusted state when an application is installed, preventing compromised software from being
launched. It also provides assigned memory partitions, so an application can be launched in a
partition inaccessible to other software or hardware. A third feature prevents access to data in
memory, a processor cache, or elsewhere when software closes or crashes.
Designed to help protect against software-based attacks, Intel Trusted Execution Technology
integrates new security features and capabilities into the processor, chipset, and other platform
components. The hardware-rooted security enables the ability to increase the confidentiality and
integrity of sensitive information from software-based attacks, protect sensitive information without
compromising the usability of the platform, and deliver increased security in platform-level
solutions through measurement and protection capabilities.


Intel Trusted Execution Technology requires a computer system with Intel Virtualization
Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an
Intel TXT-compatible measured launched environment (MLE). The MLE could consist of a virtual
machine monitor, an OS, or an application. In addition, Intel TXT requires the system to contain a
TPM v1.2, as defined by the Trusted Computing Group, and specific software for some users. Local
laws and regulations may limit Intel TXT's availability in certain countries. For more information, see
www.intel.com/technology/security.
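The measured-launch idea rests on the TPM's extend-only Platform Configuration Registers (PCRs). A rough sketch of the hash chain, assuming the TPM 1.2 SHA-1 extend operation; the component names are placeholders:

```python
import hashlib

# Sketch of the measurement chain behind a measured launch: TPM 1.2 PCRs
# are extended, never written directly -- new_pcr = SHA-1(old_pcr ||
# digest(measurement)) -- so the final value depends on every component
# measured along the way, and on the order in which they were measured.

def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    return hashlib.sha1(pcr + hashlib.sha1(measurement).digest()).digest()

pcr = b"\x00" * 20                       # PCRs start zeroed at reset
for component in (b"SINIT AC module", b"MLE/VMM image"):   # placeholder names
    pcr = pcr_extend(pcr, component)
print(pcr.hex())
```

A verifier that knows the expected measurements can recompute this chain and compare; any altered or reordered component yields a different final PCR value.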

Trusted Execution Protection Model Example:
• VM0 Protected: Apps on Service OS
• VM1 Open: Apps on Guest OS
• VM2 Standard: Apps on Guest OS
• Trusted Execution and Virtualization Technology Enabled Virtual
  Machine Monitor
• Host Hardware: Memory, Processor & Chipset, I/O Devices, TPM 1.2


Processor Features:
Intel Core Micro-architecture

• Processor micro-architecture for latest Intel notebook, desktop,


and server processors (after Summer 2006)
• Replacement for NetBurst micro-architecture
• Features:
- Intel Wide Dynamic Execution
- Intel Intelligent Power Capability
- Intel Advanced Smart Cache
- Intel Smart Memory Access
- Intel Advanced Digital Media Boost

NetBurst micro-architecture:
• Pentium-based and Celeron-based processors (2000-2006)
• Core Solo processor (2006)
• Core Duo processor (2006)

Core micro-architecture:
• Celeron (2007 and after)
• Pentium Dual-Core (2007 and after)
• Core 2 Duo (2006 and after)
• Core 2 Extreme (2006 and after)
• Core 2 Quad (2007 and after)


Intel Core Micro-architecture


The Intel Core micro-architecture was announced in July 2006 as the foundation for Intel
architecture-based desktop, mobile, and mainstream server multi-core processors. This state-of-the-
art multi-core optimized and power-efficient micro-architecture is designed to deliver increased
performance and performance-per-watt – thus increasing overall energy efficiency. It incorporates
many new and significant innovations designed to optimize the power, performance, and scalability
of multi-core processors.
The following processors were announced in July 2006 and after with Intel Core micro-
architecture:
• Intel Core 2 Solo Processor (notebook)
• Intel Core 2 Duo Processor (desktop and notebook)
• Intel Core 2 Extreme Processor
• Intel Core 2 Quad Processor (desktop)
• (2007) Intel Celeron Processor (desktop and notebook)
• Intel Celeron Dual-Core Processor (desktop)
• (2007) Intel Pentium Dual-Core Processor (desktop and notebook)
Key features of Intel Core micro-architecture include Intel Wide Dynamic Execution, Intel
Intelligent Power Capability, Intel Advanced Smart Cache, Intel Smart Memory Access, and Intel
Advanced Digital Media Boost.


Intel Wide Dynamic Execution


Intel Wide Dynamic Execution can deliver more instructions per clock cycle, improving execution
and energy efficiency. Every execution core is wider, allowing each core to fetch, dispatch, execute,
and retire up to four full instructions simultaneously using an efficient 14-stage pipeline.

Without Intel Wide Dynamic Execution: each core fetches, decodes, queues,
executes, and retires up to three instructions per clock cycle.

With Intel Wide Dynamic Execution: each core fetches, decodes, queues,
executes, and retires up to four instructions per clock cycle.

Wider execution cores allow each core to fetch, dispatch, execute,
and retire up to four full instructions simultaneously.

Further efficiencies include more accurate branch prediction, deeper instruction buffers for greater
execution flexibility, and other features that reduce the number of required execution cycles.
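A back-of-envelope sketch of why width matters: ignoring stalls and dependencies, a run of independent instructions needs at least ceil(n / width) cycles.

```python
import math

# Lower bound on cycles for a run of independent instructions on a core
# that can issue/retire `width` instructions per clock (idealized: no
# stalls, no dependencies, no cache misses).

def min_cycles(instructions: int, width: int) -> int:
    return math.ceil(instructions / width)

print(min_cycles(120, 3))  # 40 cycles on a 3-wide core
print(min_cycles(120, 4))  # 30 cycles on a 4-wide Core micro-architecture core
```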
For example, Macro-Fusion is a new feature that reduces execution cycles within each execution
core. In previous generation processors, each incoming program instruction was individually
decoded and executed. Macro-Fusion enables common pairs of these instructions to be combined
into a single instruction during decoding, and then subsequently executed as a single instruction.
This reduces the total number of executed instructions, allowing the processor to process more
instructions in less time, all while using less power.

Without Macro-Fusion: each instruction passes through the decoder and is
executed individually.

With Macro-Fusion: common instruction pairs are combined in the decoder
and executed as a single instruction.

With Macro-Fusion: the processor executes more
instructions in less time while using less power.
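A toy decoder sketch of the idea; the fusion table below is hypothetical, and the real rules for which compare-and-branch pairs fuse are more restrictive:

```python
# Toy Macro-Fusion sketch: a compare followed directly by a conditional
# jump is emitted as one fused operation instead of two. The FUSIBLE set
# is illustrative, not the processor's actual fusion rules.

FUSIBLE = {("cmp", "jne"), ("cmp", "je"), ("test", "jz")}

def decode(instructions):
    out, i = [], 0
    while i < len(instructions):
        pair = tuple(instructions[i:i + 2])
        if len(pair) == 2 and pair in FUSIBLE:
            out.append("+".join(pair))   # fused into a single operation
            i += 2
        else:
            out.append(instructions[i])
            i += 1
    return out

print(decode(["mov", "cmp", "jne", "add"]))  # ['mov', 'cmp+jne', 'add']
```

Four incoming instructions become three operations to execute, which is the source of the time and power savings described above.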


Each core uses an efficient 14-stage pipeline with deeper buffers:
Instruction Fetch and Pre-decode, Instruction Queue, Decode (4-wide),
Rename/Alloc, Schedulers, Execute (enhanced ALUs, 4-wide micro-ops to
execute), and Retirement Unit (Reorder Buffer), with micro- and
macro-fusion.

Advantage:
• Performance: wider execution; comprehensive advancements
• Energy: enabled in each core
With the Intel Wide Dynamic Execution of the Intel Core micro-architecture, every execution core in a
multi-core processor is wider. This allows each core to fetch, dispatch, execute, and retire up to
four full instructions simultaneously.


Intel Intelligent Power Capability


Intel Intelligent Power Capability includes features that further reduce power consumption. For
example, Intel Core microarchitecture uses advanced power gating to take advantage of the micro-
architecture’s ultra-fine grained logic control. This feature intelligently turns on only the individual
logic subsystems that are currently required and the finer granularity helps to minimize subsystems
that require power. Intel Intelligent Power Capability optimizes energy usage, delivering more
performance per watt for PCs.

Darkened regions represent minimal power consumption

Without Intel Intelligent Power Capability / With Intel Intelligent Power Capability

Intel Advanced Smart Cache


Intel Advanced Smart Cache increases the probability that each execution core can access data from
the faster, more efficient cache subsystem. Unlike multi-core implementations that equally split the
amount of L2 cache among execution cores, often greatly underutilizing this valuable resource, Intel
Advanced Smart Cache allows each core to dynamically utilize up to 100% of available L2 cache
while also allowing each core to obtain the data from the cache at higher throughput rates (as
compared to previous Intel generation Smart Cache). For example, if one core has minimal cache
requirements, the other core can dynamically increase its proportion of L2 cache. This reduces
cache misses, resulting in decreased latency to the required data, which increases performance. This
improves processor efficiency, increasing absolute performance and performance per watt.

Without Intel Advanced Smart Cache: Core 1 and Core 2 each use a fixed
portion of the L2 cache between the cores and RAM.

With Intel Advanced Smart Cache: Core 1 and Core 2 dynamically share a
single L2 cache between the cores and RAM.

Intel Advanced Smart Cache allows each core to dynamically utilize up to 100% of
available L2 cache, while obtaining data from the cache at higher throughput rates.


Shared L2: decreased traffic; the cache is dynamically, bi-directionally
available to the L1 caches of Core 1 and Core 2.

Independent L2: increased traffic; each half is not shareable between the
L1 caches of Core 1 and Core 2.

In a multi-core processor where two cores do not share L2 cache, an idle core also means
idle L2 cache space. This is a critical waste of resources, especially when another core
may be suffering a performance hit because its L2 cache is too full. Intel's shared L2
cache design enables the working core to dynamically take over the entire L2 cache and
maximize performance.
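The difference can be sketched with illustrative numbers (not real cache geometry):

```python
# Sketch contrasting a statically split L2 cache with Intel's shared
# design: with a shared cache, an idle core's unused capacity is
# available to the busy core. All sizes are illustrative.

L2_TOTAL_KB = 4096

def static_split(demands_kb):
    """Each core gets a fixed, equal slice regardless of demand."""
    half = L2_TOTAL_KB // len(demands_kb)
    return [min(d, half) for d in demands_kb]

def shared(demands_kb):
    """Cores draw from one pool; a busy core can take what is free."""
    alloc, free = [], L2_TOTAL_KB
    for d in demands_kb:
        take = min(d, free)
        alloc.append(take)
        free -= take
    return alloc

# Core 1 wants 3 MB of cache, Core 2 is idle:
print(static_split([3072, 0]))  # [2048, 0] -> 1 MB of demand spills to RAM
print(shared([3072, 0]))        # [3072, 0] -> the whole demand fits
```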


Intel Smart Memory Access


Intel Smart Memory Access improves system performance by hiding memory latency, thus
optimizing the use of data bandwidth out to the memory subsystem. This includes the ability to
intelligently load or prefetch data in anticipation of the system’s actual need, and doing so during
periods in which the system bus and memory subsystems have spare bandwidth available. It also
includes a new capability called memory disambiguation. Prior implementations without memory
disambiguation required each load instruction desiring to read data in from main memory to wait
until all previous store instructions were completed. With Intel’s innovative memory
disambiguation, execution cores have the built-in intelligence to speculatively load data for
instructions that are about to execute before all previous store instructions are executed. It also
intelligently detects if a conflict occurred between the speculatively loaded data and the previous
stores back to main memory, and in the rare case this occurs, the data is reloaded and the instruction
is re-executed. Memory disambiguation can help avoid wait states imposed by previous generation
micro-architectures, providing better performance through increased parallelism.
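The speculate-check-replay cycle can be sketched conceptually; this models the idea, not the actual hardware mechanism:

```python
# Conceptual sketch of memory disambiguation: a load executes
# speculatively before earlier stores whose addresses are still unknown.
# If a store later turns out to alias the load's address, the load's
# value is reloaded and the instruction is replayed.

def run_load(speculative_value, load_addr, pending_stores, memory):
    """pending_stores: (addr, value) pairs resolved after the speculation."""
    for addr, value in pending_stores:   # earlier stores finally complete
        memory[addr] = value
    if any(addr == load_addr for addr, _ in pending_stores):
        return memory[load_addr], True   # conflict: reload and replay
    return speculative_value, False      # speculation was correct

mem = {0x10: 1, 0x20: 2}
print(run_load(mem[0x10], 0x10, [(0x20, 9)], mem))  # (1, False) no conflict
print(run_load(mem[0x10], 0x10, [(0x10, 7)], mem))  # (7, True)  replayed
```

The common case is the first one, where no conflict occurs and the early load simply hides memory latency; the rare replay is the price of that speculation.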

Without Memory Disambiguation: a load waits in the instruction sequence
until all previous stores have executed and retired.

With Memory Disambiguation: the load executes speculatively ahead of the
earlier stores.

Memory disambiguation improves performance through increased parallelism.


Intel Advanced Digital Media Boost


Intel Advanced Digital Media Boost is a feature that significantly improves performance when
executing Streaming SIMD Extension instructions, also known as SSE, SSE2, and SSE3
instructions. On many previous generation processors, 128-bit SSE, SSE2, and SSE3 instructions
were executed at a sustained rate of one complete instruction every two clock cycles, the lower 64
bits in one cycle and the upper 64 bits in the next. The Intel Advanced Digital Media Boost feature
enables these 128-bit instructions to be completely executed at a throughput rate of one per clock
cycle, effectively doubling the speed of execution for these instructions. This greatly improves
performance for many important multimedia operations and when processing other rich data sets.
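The packed operation can be modeled in a few lines; the helper below mimics the lane-wise behavior of SSE2's PADDD (packed 32-bit add), which Intel Advanced Digital Media Boost completes in a single cycle rather than two 64-bit halves:

```python
# Lane-wise model of a 128-bit packed operation: four 32-bit lanes
# processed together, with wrap-around like the SSE2 PADDD instruction.

def packed_add_epi32(x, y):
    """Add four 32-bit lanes with modular (wrap-around) arithmetic."""
    assert len(x) == len(y) == 4
    return [(a + b) & 0xFFFFFFFF for a, b in zip(x, y)]

X = [4, 3, 2, 1]      # lanes X4 X3 X2 X1 from the figure
Y = [40, 30, 20, 10]  # lanes Y4 Y3 Y2 Y1
print(packed_add_epi32(X, Y))  # [44, 33, 22, 11]
```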

Without Intel Advanced Digital Media Boost: each 128-bit instruction is
executed as two 64-bit halves over two clock cycles.

With Intel Advanced Digital Media Boost: each 128-bit instruction is
executed in a single clock cycle.

Intel Advanced Digital Media Boost enables 128-bit instructions to be executed at
one per clock cycle – effectively doubling the speed of execution.

With Intel Core micro-architecture, a single-cycle SSE/SSE2/SSE3 operation
on packed sources X4 X3 X2 X1 and Y4 Y3 Y2 Y1 produces all four results
(X4opY4, X3opY3, X2opY2, X1opY1) in clock cycle 1; without it, two results
complete in cycle 1 and the other two in cycle 2.

Advantage:
• Performance: increased performance; 128-bit single cycle in each core
• Energy: improved energy efficiency


Processor Packaging

Desktop systems
• Flip-Chip Land Grid Array (FC-LGA)
- LGA775 socket for 775-land Flip-Chip LGA
package (FC-LGA4, FC-LGA6, FC-LGA8)
Notebook systems
• Micro Ball Grid Array (Micro BGA)
• Micro Pin Grid Array (Micro PGA)

Flip-Chip Land Grid Array (FC-LGA)
Micro Ball Grid Array

© 2008 Lenovo

Processor Packaging
Processors implement different types of packaging, depending on the requirements of the system.
Pin Grid Array (PGA) – Most early Celeron, Pentium III, and Pentium 4 processors in desktops use
PGA packaging. PGA is a type of chip in which the connecting pins are located on the bottom in
concentric squares. The early Pentium 4 used a 423-pin socket with PGA package technology,
which is called PGA423 or PGA423 socket.

Socket for PGA423 Pentium 4
mPGA478B socket for Pentium 4


Flip-Chip PGA4 (FC-PGA4) – The Flip-Chip Pin Grid Array 4 (FC-PGA4) package was
introduced in February 2004 with the Pentium 4 code-named Prescott. The package is also used
with the Intel Celeron D Processor 3xx. This processor uses a 90 nm process consisting of a
processor core mounted on a pinned substrate with an integrated heat spreader. This packaging
employs a 1.27 mm [0.05 in] pitch for the substrate pins.
Land Grid Array – The Pentium 4 Processor 5xx and 6xx and Pentium D Processor 8xx in the 775-
land package are packaged in a Flip-Chip Land Grid Array (FC-LGA4) package that interfaces
with the systemboard via an LGA775 socket. In January 2006, the FC-LGA6 was introduced as a
follow on to the FC-LGA4. The FC-LGA6 is used on the Pentium 4 Processor 6x1, Pentium D 9xx,
Core 2 Duo, and Core 2 Quad. The LGA775 is also called Socket T. In January 2008, the FC-
LGA8 was introduced as a follow on to the FC-LGA6; it was first used in the Core 2 Duo
Processor (Wolfdale). The package consists of a processor core mounted on a substrate land-
carrier. An integrated heat spreader (IHS) is attached to the package substrate and core and serves
as the mating surface for processor component thermal solutions, such as a heatsink. The figure
below shows a sketch of the processor package components and how they are assembled.

Flip-Chip Land Grid Array (FC-LGA4) Package of Pentium 4 (bottom)

Flip-Chip Land Grid Array (FC-LGA4) Package of Pentium 4 (top)

Components of Flip-Chip Land Grid Array (FC-LGA4) Package of Pentium 4
for the LGA775 socket


FC-LGA4 Package: core (die), integrated heat spreader, substrate, thermal
interface material, capacitors, systemboard, LGA775 socket

FC-LGA4 package with LGA775 socket
Close-up view of contacts of the LGA775 socket
LGA775 socket

Micro Pin Grid Array (Micro PGA) or Micro Flip Chip PGA (Micro-FCPGA) – Introduced by
Intel in May 1999, this processor packaging uses pins that insert into a socket. It is used for
notebook-based processors. The Micro PGA processor packaging type is slightly larger than a
postage stamp and consists of the processor and a tiny socket, which combined measure only 32 by
37 millimeters in dimension and less than six millimeters in height. The pins on the package are
only 1.25 millimeters long (about the thickness of a dime), making it Intel's smallest "pinned"
processor package. With the introduction of the 0.13 micron Mobile Pentium III-M (Tualatin) in
July 2001, this packaging migrated to a new technology called Micro Flip Chip Pin Grid Array
(Micro-FCPGA), but still consisted of inserting pins into a socket. Flip chip describes the method
of electrically connecting the die to the package carrier. The Celeron M, Pentium M, Core, and
Core 2 Duo processors use the newer Micro-FCPGA.


Micro-Flip Chip Pin Grid Array socket on ThinkPad systemboard (red box)

Microprocessor controlled heat pipe (red box) on Lenovo ThinkPad


notebook provides super efficient heat transfer from processor and
minimizes fan use, resulting in long battery life. This technology also keeps
noise levels low.


Ball Grid Array (BGA) or Micro Flip Chip BGA (Micro-FCBGA) – Ball Grid Array is a packaging
type for extremely small devices that are soldered to a larger board. It is commonly used in
notebook-based systems. Ball Grid Array packaging replaces pins with solder balls on the
underside for mounting. It is less than a tenth of an inch high and weighs less than a nickel. BGAs
are available in a variety of types ranging from plastic over molded BGAs called PBGAs, to flex
tape BGAs (TBGAs), high thermal metal top BGAs with low profiles (HL-PBGAs), and high
thermal (H-PBGAs). With the introduction of the 0.13 micron Mobile Pentium III-M (Tualatin) in
July 2001, this packaging migrated to a new technology called Micro Flip Chip Ball Grid Array
(Micro-FCBGA). Flip chip describes the method of electrically connecting the die to the carrier
package, but still consists of solder balls on the underside for mounting. The Celeron M, Pentium
M, and Core processors use the newer Micro-FCBGA.

Micro Flip Chip BGA (front) / Micro Flip Chip BGA (back)
Micro Flip Chip PGA (front) / Micro Flip Chip PGA (back)
Ball Grid Array

Ball Grid Array with Die Up Cross-Section: processor die, underfill, and
solder balls


Historical Intel Sockets

Socket                  Pins      Voltage            Description
Socket 1                169-pin   5.0 volts          17 x 17 PGA; 486 only
Socket 2                238-pin   5.0 volts          19 x 19 PGA; 486 and P24T
Socket 3                237-pin   3.3 or 5.0 volts   19 x 19 PGA; 486 and P24T
Socket 4                273-pin   5.0 volts          21 x 21 PGA; Pentium P5
Socket 5                320-pin   3.3 volts          37 x 37 PGA; Pentium P54C
Socket 6                235-pin   3.3 volts          19 x 19 PGA; 486 and P24T
Socket 7                321-pin   3.3 volts          Pentium P54CS
Socket 8                387-pin   2.1 to 3.5 volts   Asymmetric socket (for Pentium Pro)
Socket 370 or PGA 370   370-pin   1.5 to 2.1 volts   PPGA Celeron (PGA370) or flip-chip
                                                     PGA of Pentium III
PGA 423 (or Socket-W)   423-pin   1.0 to 1.85 volts  Pentium 4 (Willamette) with RDRAM
mPGA478B (or Socket-N)  478-pin                      Pentium 4 (Willamette, Northwood, or
                                                     Prescott) and Celeron D Processor 3xx
LGA771                  771-pin                      Intel Core 2 Extreme QX9775
LGA775 (or Socket T)    775-ball                     Pentium 4 Processor 5xx (Prescott) and
                                                     6xx; Pentium D Processor 8xx (Smithfield)
                                                     and 9xx (Presler); Celeron 4xx, Celeron
                                                     Dual-Core; Pentium Dual-Core, Core 2 Duo,
                                                     Core 2 Quad, Core 2 Extreme
Socket M                                             Mobile products; PGA/BGA; Core Duo and
                                                     Core 2 Duo processors
Socket P                                             Mobile products; PGA/BGA; Core 2 Duo
                                                     processors; introduced in May 2007 with
                                                     Santa Rosa platform for 800 MHz bus support
LGA1366 (or Socket B)                                Nehalem-based
LGA1160 (or Socket H)                                Nehalem-based

Intel Desktop Processors:


Desktop Processors
Intel Celeron Processor
Value

Intel Celeron Dual-Core Processor

Intel Pentium Dual-Core Processor

Performance

Intel Core 2 Duo Processor

Intel Core 2 Quad Processor

High Performance
Intel Core 2 Extreme Processor


Desktop Processors
This slide shows the positioning of the Intel desktop processors.

Select ThinkCentre desktops use Intel desktop processors


Industry Standards:
Intel Core 2 Processor with Viiv Technology

• Intel brand name for desktops designed for digital entertainment


in the home
• Desktops that meet specific criteria (hardware and software) receive
this branding
• Desktop platform only (not notebooks)
• Features:
- Instant on/off
- Simple navigation via remote control
- Smaller and quieter systems
• Many companies offer content services and products with Viiv logo
• Lenovo does not have any products utilizing this branding


Intel Core 2 Processor with Viiv Technology


In January 2006, Intel announced the brand of Intel Viiv technology. In January 2008, the brand
was renamed Intel Core 2 Processor with Viiv Technology. Desktop PCs that meet specific criteria
can receive this branding; the criteria include specific hardware and software requirements.
Intel Viiv technology-based PCs feature:
• Consumer electronics-like features for simplified entertainment
– Instant on/off
– Simple navigation to online services with a remote control
– Smaller and quieter systems
• Performance for high definition entertainment
– Intel's latest dual-core processors
– Support for up to 7.1 surround sound
– Support for high definition video including content downloaded from the Internet
Many companies offer content services and software that have been verified to work with this
technology.
See intel.com/products/viiv for more information.


Intel Desktop Processors:


Intel Core 2 Processor with vPro Technology

• Intel brand name for proactive management and


security of corporate desktop PCs
• Desktop PCs that meet specific criteria
(hardware and software) receive this branding
• Features:
- Isolation and recovery
- Agent presence checking
- Remote diagnosing and repair (even when PC
is powered down or the OS is inoperable)
- Remote configuration
• Select Lenovo ThinkCentre desktops are
compliant with this branding


Intel Core 2 Processor with vPro Technology


The Intel Core 2 Processor with vPro Technology identifies desktop products with proactive
management and security. Desktop PCs (does not apply to notebooks) that meet specific criteria
can receive this branding; the criteria include specific hardware and software requirements.
Desktop PCs with this brand include built-in, hardware-based capabilities that can allow remote
management, maintenance, and update of PCs that have traditionally been inaccessible from the IT
management console. A console can remotely communicate with these PCs, even if system power
is off, the operating system is inoperative, or software agents are not yet installed. The technology’s
primary feature is the latest version of Intel Active Management Technology (Intel AMT version
3).
In the fall of 2007, Intel released a second generation version of vPro with codename Weybridge.
This version of Intel vPro processor technology offers the same great hardware-based features as
before:
• Isolation and recovery
• Agent presence checking
• Ability to remotely diagnose and repair a PC (even when it’s powered down or the OS is
inoperable)


These features are also available in Intel Centrino 2 with vPro Technology for notebooks. But this
second generation Intel vPro technology also introduced the following:
• Improved system defense filters which can identify greater numbers and varieties of threats in the
network traffic flow.
• An embedded trust agent, the first certified by Cisco, providing the industry's only 802.1x
compatible manageability solution not dependent on OS-availability. This trust agent offers
Cisco's IT customers the ability to manage systems, even if powered off or the OS is down,
without lowering the security on 802.1x networks and Cisco Self-Defending Network products.
• An industry-leading foundation for Windows Vista
• Convenient remote configuration
– More convenient option for over-the-wire set-up
– Allows transfer of Intel AMT keys over the network during set-up
• Next-generation management standards (Web Services Management (WS-MAN and DASH)
– More capable, extensible and secure than Alert Standard Format (ASF)
– Standardizes management between console and PC and inside PC
– Built using draft-DASH standards. DASH (Desktop and Mobile Architecture for System
Hardware) is a web services-based management technology that enables IT professionals to
remotely manage desktop and mobile PCs from anywhere in the world, securely turn the power
on/off, query system inventory, and push firmware updates among other things, regardless of
the state of the remote PC.
• Intel Trusted Execution Technology (TXT) and Intel Virtual Technology for Directed I/O (VT-d)
– Hardware rooted security architecture and foundation for ISV solutions
– Initial enabling around trusted virtual appliances
– Significantly reduces risk of “hyper-jacking” or rootkit attacks
• Energy-efficient performance
– Performance leadership with quad-core processors and faster front side bus
– Lower CPU idle and chipset power
– Improved graphics performance
See developer.intel.com/products/vpro/index.htm for more information.


Intel Desktop Processors:


Intel Celeron Processor
• Intel 32-bit/64-bit processors for value
desktop systems
• Core micro-architecture
• Key features
- Single-core
- 512 KB L2 cache
- Intel 64 Technology
- 800 MHz system bus
• Code-name Conroe-L
• Introduced June 2007
• Used in some Lenovo desktops

Intel Celeron Processor is used in


select Lenovo ThinkCentre desktops


Intel Celeron Processor


In June 2007, Intel announced the Intel Celeron Processor which had the code-name Conroe-L.
It is based on the Core micro-architecture, so it has similar internal architectural features to the
desktop and mobile-based Core 2 Duo Processor; however, the big difference is that the Intel
Celeron Processor is a single-core processor, not a dual-core processor.
This processor is positioned for value desktop systems.
The Intel Celeron Processor uses a single core on a single die. It uses the Core micro-
architecture that is common across desktop, mobile (called Core 2 Duo mobile processor), and
server platforms. The processor maintains compatibility with IA-32 software, yet it supports
Intel 64 Technology which is an extension to the IA-32 instruction set which adds 64-bit
extensions to the x86 architecture.
It shares the same logo as the Celeron processor for mobile systems. Intel is phasing out the
Celeron D Processor by removing the “D” and using the same name for both mobile and
desktop Celeron processors.


Features
Key features of the Intel Celeron desktop processor include the following:
• Core micro-architecture.
• 0.065 micron technology (or 65 nanometers).
• Single-core.
• 64 KB L1 cache (32 KB L1 instruction cache; 32 KB L1 write-back data cache).
• On-die 512 KB L2 cache. The L2 cache runs at the core processor speed. It is 8-way set
associative using a 64 byte cache line.
• No L3 cache.
• 800 MHz system bus which transfers data four times per bus clock (4X) with a double-clocked
address bus (2X). The system bus has a 64-bit data path.
• Intel Smart Memory Access optimizes the use of the data bandwidth from the memory subsystem
to accelerate out-of-order execution.
• Intel 64 Technology support providing 64-bit operating systems and application support.
• No support for Intel Virtualization Technology.
• Intel Advanced Digital Media Boost, which accelerates execution of Streaming SIMD Extension
(SSE) instructions used in multimedia and graphics applications. The 128-bit SSE instructions are
issued at a throughput rate of one per clock cycle.
• Execute Disable Bit support.
• No Hyper-Threading Technology support.
• Four total execution units:
– 1 integer unit [often called an Arithmetic Logic Unit (ALU)]
– 1 floating point unit
– 1 load unit
– 1 store unit
• Uses a 775-land Flip-Chip Land Grid Array (FC-LGA6) package requiring an LGA775 socket
(the mobile version uses different packaging).


Intel Desktop Processors:


Intel Celeron Processor 4xx Versions (Conroe-L)

Celeron 420, 430, 440, 450 (Conroe-L)
- 1.60, 1.80, 2.00, 2.20 GHz
- Single-core
- 512 KB L2 cache
- 800 MHz system bus
- No Virtualization Technology
- Intel 64 and Execute Disable Bit
- June 2007 and after

• Intel 3 Series Express Chipset family chipset support

[Figure: Celeron desktop processor without heatsink in LGA775 socket]

© 2008 Lenovo

Celeron Processor Versions (Conroe-L)


In June 2007, Intel announced the Intel Celeron 4xx desktop processor. These processors are
single-core with 512 KB L2 cache, Intel 64 Technology, and Execute Disable Bit (XD).
These processors use a 775-land Flip-Chip Land Grid Array (FC-LGA6) package requiring an
LGA775 socket. These processors are supported by the Intel 3 Series Express chipset family,
Intel 945 and 965 Express Chipset family, and other compatible chipsets.


[Figure: Processor structure of Pentium 4 — a Pin Grid Array package connects over a 400, 533, or 800 MHz, 64-bit system bus to main memory; the on-die core holds a 256 KB, 512 KB, 1 MB, or 2 MB L2 cache on a full-speed 256-bit path, an 8 or 16 KB data cache, an execution trace cache, and FPU, load, store, and integer/MMX execution units]

[Figure: The Intel NetBurst Architecture — a bus unit links the system bus to the on-die 8-way 2nd-level cache and the low-latency 4-way 1st-level cache; the front end contains fetch/decode, the trace cache, and a microcode ROM; execution and retirement stages follow, with BTB/branch prediction updated from branch history; frequently and less frequently used paths are distinguished]


Intel Desktop Processors:


Intel Celeron Dual-Core Processor
• Intel 32-bit/64-bit processors for
desktop systems
• Core micro-architecture
• Key features
- Dual-core
- 512 KB shared L2 cache
- Intel 64 Technology
- 800 MHz system bus
• Code-name Conroe (January 2008)
• Positioned for value desktops

© 2008 Lenovo

Intel Celeron Dual-Core Processor


In January 2008, Intel announced the Intel Celeron Dual-Core Processor, code-named Conroe. It
is based on the Core micro-architecture, so it has the same internal architectural features as
the desktop and mobile Core 2 Duo Processor.
This processor is positioned for value desktop systems.
The Intel Celeron Dual-Core Processor uses two independent cores on a single die (called a
dual-core processor). It uses the Core micro-architecture that is common across desktop,
notebook (called Core 2 Duo mobile processor), and server platforms. The processor maintains
compatibility with IA-32 software, yet it supports Intel 64 Technology, an extension to the
IA-32 instruction set that adds 64-bit extensions to the x86 architecture.


Features
Key features of the Intel Celeron Dual-Core desktop processor include the following:
• Core micro-architecture.
• 0.065 micron technology (or 65 nanometers).
• Dual-core processing using two independent cores in one physical package that run at the same
frequency.
• 64 KB L1 cache per core (32 KB L1 instruction cache per core; 32 KB L1 write-back data cache
per core).
• Shared on-die 512 KB L2 cache called Intel Advanced Smart Cache. The L2 cache runs at the
core processor speed. It is 8-way set associative using a 64 byte cache line. Intel Advanced Smart
Cache enables the active execution core to access the full L2 cache when the other execution core
is idle.
• No L3 cache.
• 800 MHz system bus which transfers data four times per bus clock (4X) with a double-clocked
address bus (2X). The system bus has a 64-bit data path.
• Intel Wide Dynamic Execution improves execution speed and efficiency with each core
completing up to four full instructions simultaneously with a 14-stage pipeline. The Core 2 Duo
supports micro-op fusion when an x86 instruction is decoded into micro-ops and two adjacent,
dependent micro-ops combine into a single micro-op and execute in a single cycle. A new feature
is "Macro-op fusion" which means certain x86 instructions may also be paired into a single
instruction, then executed in a single cycle. In certain cases, five instructions can be read from the
instruction queue, then executed as if only four instructions were issued.
• Intel Smart Memory Access optimizes the use of the data bandwidth from the memory subsystem
to accelerate out-of-order execution.
• Intel 64 Technology support providing 64-bit operating systems and application support.

[Figure: Die of Intel Celeron Dual-Core Processor — two cores, shared L2 cache, and bus interface]


• No support for Intel Virtualization Technology.


• Intel Advanced Digital Media Boost which accelerates execution of Streaming SIMD Extensions
(SSE) instructions used in multimedia and graphics applications. The 128-bit SSE instructions are
issued at a throughput rate of one per clock cycle.
• Execute Disable Bit support.
• No Hyper-Threading Technology support.
• Five total execution units:
– 2 integer units [often called Arithmetic Logic Units (ALU)]
– 1 floating point unit
– 1 load unit
– 1 store unit
• Uses a 775-land Flip-Chip Land Grid Array (FC-LGA6) package requiring an LGA775 socket
(the mobile version uses different packaging).
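The quad-pumped bus figures above translate directly into peak bandwidth: a 200 MHz base bus clock transferring four times per clock over a 64-bit data path yields the 800 MHz (really 800 MT/s) rating. A back-of-the-envelope check (variable names are ours):

```python
# Peak bandwidth of a quad-pumped 64-bit front-side bus.
bus_clock_mhz = 200        # base bus clock
transfers_per_clock = 4    # "4X" quad-pumped data bus
data_path_bytes = 64 // 8  # 64-bit data path = 8 bytes per transfer

mt_per_s = bus_clock_mhz * transfers_per_clock   # 800 million transfers/s
peak_mb_per_s = mt_per_s * data_path_bytes       # 6400 MB/s peak
print(mt_per_s, peak_mb_per_s)                   # 800 6400
```

So an "800 MHz system bus" here means a theoretical peak of 6.4 GB/s between the processor and the chipset.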

[Figure: Block Diagram for Intel Celeron Dual-Core Architecture — per core: instruction fetch and predecode, instruction queue, decode with uCode ROM, rename/allocate, reorder buffer, retirement unit, and schedulers feeding FPU, ALU, load, and store units through the L1 data cache and D-TLB; both cores share the L2 cache and control logic connected to the system bus]


Intel Desktop Processors:


Intel Celeron Dual-Core Processor Versions (Conroe)

Celeron Dual-Core E1200, E1400 (Conroe)
- 1.60, 2.00 GHz
- Dual-core
- 512 KB shared L2 cache
- 800 MHz system bus
- No Virtualization Technology
- Intel 64 and Execute Disable Bit
- January 2008 and after

• Intel G31 Express family chipset support

[Figure: Celeron Dual-Core desktop processor without heatsink in LGA775 socket]

© 2008 Lenovo

Celeron Dual-Core Processor Versions (Conroe)


In January 2008, Intel announced the Intel Celeron Dual-Core desktop processor. All the
processors support shared L2 cache, Enhanced Intel SpeedStep Technology, Intel 64
Technology, and Execute Disable Bit (XD).
These processors use a 775-land Flip-Chip Land Grid Array (FC-LGA6) package requiring an
LGA775 socket. These processors are supported by the Intel G31 Express chipset family.

[Figure: Processor Structure of Intel Celeron Dual-Core Processor — two execution cores with private L1 caches share the L2 cache through the L2 cache control, connected to the Memory Controller Hub]


Intel Desktop Processors:


Intel Pentium Dual-Core Processor
• Intel 32-bit/64-bit processors for
desktop systems
• Core micro-architecture
• Key features
- Dual-core
- 1 MB or 2 MB shared L2 cache
- Intel 64 Technology
- 800 MHz system bus
• Code-name Conroe (June 2007)
and Wolfdale (August 2008)
• Positioned for value desktops
• Used in some Lenovo desktops

Intel Pentium Dual-Core Processor


is used in select Lenovo
ThinkCentre desktops
© 2008 Lenovo

Intel Pentium Dual-Core Processor


In June 2007, Intel announced the Intel Pentium Dual-Core Processor, code-named Conroe. Later
versions were announced under the code-name Wolfdale. It is based on the Core
micro-architecture, so it has the same internal architectural features as the desktop and
mobile Core 2 Duo Processor.
This processor is positioned for value desktop systems.
The Intel Pentium Dual-Core Processor uses two independent cores on a single die (called a
dual-core processor). It uses the Core micro-architecture that is common across desktop,
notebook (called Core 2 Duo mobile processor), and server platforms. The processor maintains
compatibility with IA-32 software, yet it supports Intel 64 Technology, an extension to the
IA-32 instruction set that adds 64-bit extensions to the x86 architecture.


Features
Key features of the Intel Pentium Dual-Core desktop processor include the following:
• Core micro-architecture.
• 0.065 micron technology (or 65 nanometers).
• Dual-core processing using two independent cores in one physical package that run at the same
frequency.
• 64 KB L1 cache per core (32 KB L1 instruction cache per core; 32 KB L1 write-back data cache
per core).
• Shared on-die 1 MB or 2 MB L2 cache called Intel Advanced Smart Cache. The L2 cache runs at
the core processor speed. It is 8-way set associative using a 64 byte cache line. Intel Advanced
Smart Cache enables the active execution core to access the full L2 cache when the other
execution core is idle.
• No L3 cache.
• 800 MHz system bus which transfers data four times per bus clock (4X) with a double-clocked
address bus (2X). The system bus has a 64-bit data path.
• Intel Wide Dynamic Execution improves execution speed and efficiency with each core
completing up to four full instructions simultaneously with a 14-stage pipeline. The Core 2 Duo
supports micro-op fusion when an x86 instruction is decoded into micro-ops and two adjacent,
dependent micro-ops combine into a single micro-op and execute in a single cycle. A new feature
is "Macro-op fusion" which means certain x86 instructions may also be paired into a single
instruction, then executed in a single cycle. In certain cases, five instructions can be read from the
instruction queue, then executed as if only four instructions were issued.
• Intel Smart Memory Access optimizes the use of the data bandwidth from the memory subsystem
to accelerate out-of-order execution.
• Intel 64 Technology support providing 64-bit operating systems and application support.

[Figure: Die of Intel Pentium Dual-Core Processor — two cores, shared L2 cache, and bus interface]


• No support for Intel Virtualization Technology.


• Intel Advanced Digital Media Boost which accelerates execution of Streaming SIMD Extensions
(SSE) instructions used in multimedia and graphics applications. The 128-bit SSE instructions are
issued at a throughput rate of one per clock cycle.
• Execute Disable Bit support.
• No Hyper-Threading Technology support.
• Five total execution units:
– 2 integer units [often called Arithmetic Logic Units (ALU)]
– 1 floating point unit
– 1 load unit
– 1 store unit
• Uses a 775-land Flip-Chip Land Grid Array (FC-LGA6) package requiring an LGA775 socket
(the mobile version uses different packaging).

[Figure: Block Diagram for Intel Pentium Dual-Core Architecture — per core: instruction fetch and predecode, instruction queue, decode with uCode ROM, rename/allocate, reorder buffer, retirement unit, and schedulers feeding FPU, ALU, load, and store units through the L1 data cache and D-TLB; both cores share the L2 cache and control logic connected to the system bus]


Intel Desktop Processors:


Intel Pentium Dual-Core Processor Versions (Conroe and Wolfdale)

Pentium Dual-Core E2140, E2160, E2180, E2200, E2220 (Conroe)
- 1.60, 1.80, 2.00, 2.20, 2.40 GHz
- Dual-core
- 1 MB shared L2 cache
- 800 MHz system bus
- No Virtualization Technology
- Intel 64 and Execute Disable Bit
- June 2007 and after

Pentium Dual-Core E5200 (Wolfdale)
- 2.5 GHz
- Dual-core
- 2 MB shared L2 cache
- 800 MHz system bus
- No Virtualization Technology
- Intel 64 and Execute Disable Bit
- August 2008

• Intel G31 and Intel 4 Series Express family chipset support

[Figure: Pentium Dual-Core desktop processor without heatsink in LGA775 socket]

© 2008 Lenovo

Pentium Dual-Core Processor Versions (Conroe, Wolfdale)


In June 2007, Intel announced the Intel Pentium Dual-Core desktop processor. All the
processors support shared L2 cache, Enhanced Intel SpeedStep Technology, Intel 64
Technology, and Execute Disable Bit (XD).
These processors use a 775-land Flip-Chip Land Grid Array (FC-LGA6) package requiring an
LGA775 socket. These processors are supported by the Intel G31 Express chipset and Intel 4
Series Express chipset families.

[Figure: Processor Structure of Intel Pentium Dual-Core Processor — two execution cores with private L1 caches share the L2 cache through the L2 cache control, connected to the Memory Controller Hub]


Intel Desktop Processors:


Intel Core 2 Duo Desktop Processor
• Intel 32-bit/64-bit processors for
desktop systems
• Core micro-architecture
• Key features
- Dual-core
- 2 MB, 4 MB, or 6 MB shared L2 cache
- Intel 64 Technology
- 800, 1066, or 1333 MHz system bus
- (Some) Trusted Execution Technology
• Code-name Conroe (July 2006)
and Wolfdale (January 2008)
• Used in some Lenovo desktops

Intel Core 2 Duo Desktop Processor


is used in select Lenovo
ThinkCentre desktops
© 2008 Lenovo

Intel Core 2 Duo Desktop Processor


In July 2006, Intel announced the Intel Core 2 Duo desktop processor, code-named Conroe and
built on 65 nm process technology. In January 2008, a 45 nm version code-named Wolfdale was
announced. It has the same internal architectural features as the mobile version (also named
Core 2 Duo Processor) except for various power-related features, physical packaging, and other
items.
The Core 2 Duo desktop processor uses two independent cores on a single die (called a dual-
core processor). It uses the Core micro-architecture that is common across desktop, mobile
(called Core 2 Duo mobile processor), and server platforms. The Core 2 Duo desktop processor
maintains compatibility with IA-32 software, yet it supports Intel 64 Technology, an extension
to the IA-32 instruction set that adds 64-bit extensions to the x86 architecture.
For computer users who run multiple demanding applications simultaneously, the Intel Core 2
Duo desktop processor is Intel’s preferred desktop processor. Powered by multiple cores in one
processor, it offers exceptional performance and responsiveness so that users can get the most
productivity and enjoyment from their applications.


Features
Key features of the Intel Core 2 Duo desktop processor include the following:
• Core micro-architecture.
• 65 nanometers (Conroe) or 45 nanometers (Wolfdale) process technology
• Dual-core processing using two independent cores in one physical package that run at the same
frequency.
• 64 KB L1 cache per core (32 KB L1 instruction cache per core; 32 KB L1 write-back data cache
per core).
• Shared on-die 2 MB, 4 MB, or 6 MB L2 cache called Intel Advanced Smart Cache. The L2 cache
runs at the core processor speed. It is 8-way set associative using a 64 byte cache line. Intel
Advanced Smart Cache enables the active execution core to access the full L2 cache when the
other execution core is idle.
• No L3 cache.
• 800, 1066, or 1333 MHz system bus which transfers data four times per bus clock (4X) with a
double-clocked address bus (2X). The system bus has a 64-bit data path.
• Intel Wide Dynamic Execution improves execution speed and efficiency with each core
completing up to four full instructions simultaneously with a 14-stage pipeline. The Core 2 Duo
supports micro-op fusion when an x86 instruction is decoded into micro-ops and two adjacent,
dependent micro-ops combine into a single micro-op and execute in a single cycle. A new feature
is "Macro-op fusion" which means certain x86 instructions may also be paired into a single
instruction, then executed in a single cycle. In certain cases, five instructions can be read from the
instruction queue, then executed as if only four instructions were issued.
• Intel Smart Memory Access optimizes the use of the data bandwidth from the memory subsystem
to accelerate out-of-order execution.
• Intel 64 Technology support providing 64-bit operating systems and application support.
• Some support Intel Trusted Execution Technology.

[Figure: Die of Intel Core 2 Duo Desktop Processor — two cores, shared L2 cache, and bus interface]

[Figure: Core 2 Duo desktop processor without heatsink in LGA775 socket]


• Some versions support Intel Virtualization Technology to allow hardware-based virtualization


(Intel Virtualization Technology requires a system with a processor, chipset, BIOS, virtual
machine monitor (VMM) and applications enabled for Virtualization Technology).
• Intel Advanced Digital Media Boost which accelerates execution of Streaming SIMD Extensions
(SSE) instructions used in multimedia and graphics applications. The 128-bit SSE instructions are
issued at a throughput rate of one per clock cycle.
• Execute Disable Bit support.
• No Hyper-Threading Technology support.
• Five total execution units:
– 2 integer units [often called Arithmetic Logic Units (ALU)]
– 1 floating point unit
– 1 load unit
– 1 store unit
• Uses a 775-land Flip-Chip Land Grid Array (FC-LGA6) package requiring an LGA775 socket
(the mobile version uses different packaging).
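The practical difference between a shared L2 (Intel Advanced Smart Cache) and the two independent L2 caches of earlier dual-core designs is how much cache a single busy core can use. A toy model of that idea (the function name is ours, not Intel's):

```python
# L2 cache available to one active core while the sibling core is idle.
def l2_for_active_core(total_l2_mb: float, shared: bool) -> float:
    # Shared (Advanced Smart Cache): the active core may fill the whole L2.
    # Split (two private caches): the active core is limited to its own half.
    return total_l2_mb if shared else total_l2_mb / 2

print(l2_for_active_core(4, shared=True))    # 4   MB usable (shared design)
print(l2_for_active_core(4, shared=False))   # 2.0 MB usable (split design)
```

With 4 MB total, a shared design can give one core all 4 MB when the other core is idle, while a split design caps it at 2 MB regardless of what the sibling core is doing.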

[Figure: Block Diagram for Intel Core 2 Duo Architecture — per core: instruction fetch and predecode, instruction queue, decode with uCode ROM, rename/allocate, reorder buffer, retirement unit, and schedulers feeding FPU, ALU, load, and store units through the L1 data cache and D-TLB; both cores share the L2 cache and control logic connected to the system bus]


[Figure: Processor Structure of Intel Core 2 Duo Processor — two on-die cores, each with FPU, load, store, and integer units plus 32 KB instruction and 32 KB data caches, share a full-speed L2 cache (Intel Advanced Smart Cache) over 256-bit paths; the Flip-Chip Land Grid Array package connects over an 800, 1066, or 1333 MHz, 64-bit system bus to the (Graphics) Memory Controller Hub of an Intel x96x chipset]

[Figure: Processor Structure of Intel Core 2 Duo Processor (simplified and detailed) — per-core architectural state, execution resources, L1 caches, APIC, and thermal control; shared L2 cache with L2 cache control, power management and core coordination logic, and a bus interface to the Memory Controller Hub]

[Figure: Die of Intel Core 2 Duo Processor]

The following table shows some key differences between the Pentium D Processor 8xx/9xx and the
Core 2 Duo processor.

Pentium D Processor 8xx/9xx                  Core 2 Duo Processor
NetBurst micro-architecture                  Core micro-architecture
Dual-core                                    Dual-core
Two independent 1 MB or 2 MB L2 caches       Shared 2 MB, 4 MB, or 6 MB L2 cache
Intel 64 Technology                          Intel 64 Technology
31-stage pipeline                            14-stage pipeline
Issue/retire 3 instructions per clock        Issue/retire 4 instructions per clock
Micro-ops and micro-op fusion                Micro-ops and micro-op fusion; macro-op fusion
SSE, SSE2, SSE3 (4 floating point            SSE, 128-bit SSE2/3 (8 floating point
operations per cycle)                        operations per cycle)
One SIMD instruction executed in 2 cycles    One SIMD instruction executed in 1 cycle
Some support Virtualization Technology       Some support Virtualization Technology
800 MHz system bus                           800, 1066, or 1333 MHz system bus
FC-LGA4 packaging for LGA775 socket          FC-LGA6/LGA8 packaging for LGA775 socket
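The SSE rows in the table imply a doubling of peak floating-point throughput at equal clock speed. A quick illustration of what those per-cycle figures mean (the clock speed used here is just an example value, and the function name is ours):

```python
# Peak floating-point throughput = FP operations per cycle x clock rate (GHz).
def peak_gflops(ops_per_cycle: int, ghz: float) -> float:
    return ops_per_cycle * ghz

# Per the table: Pentium D sustains 4 FP ops/cycle, Core 2 Duo sustains 8.
print(peak_gflops(4, 3.0))   # 12.0 GFLOPS per core (NetBurst)
print(peak_gflops(8, 3.0))   # 24.0 GFLOPS per core (Core)
```

At the same 3.0 GHz clock, the Core micro-architecture's single-cycle 128-bit SSE execution doubles the theoretical peak of the older design.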

[Figure: Core 2 Duo Desktop Processor with G965 chipset]


[Figures: Intel Core 2 Duo Processor (with and without heat sink); LGA775 socket for Intel Core 2 Duo Processor; LGA775 socket on Lenovo ThinkCentre systemboard]


Intel Desktop Processors:


Intel Core 2 Duo Desktop Processor Versions (Conroe)

Core 2 Duo E4300, E4400, E4500, E4600, E4700 (Conroe)
- 1.80, 2.0, 2.2, 2.4, 2.6 GHz
- Dual-core
- 2 MB shared L2 cache
- 800 MHz system bus
- No Virtualization Technology
- Intel 64 and Execute Disable Bit
- January 2007 and after

Core 2 Duo E6300, E6320, E6400, E6420, E6540, E6600, E6700 (Conroe)
- 1.86, 2.13, 2.33, 2.40, 2.66 GHz
- Dual-core
- 2 MB or 4 MB shared L2 cache
- 1066 or 1333 MHz system bus
- Virtualization Technology
- Intel 64 and Execute Disable Bit
- July 2006 and after

Core 2 Duo E6550, E6750, E6850 (Conroe)
- 2.33, 2.66, 3.00 GHz
- Dual-core
- 4 MB shared L2 cache
- 1333 MHz system bus
- Virtualization Technology
- Intel 64 and Execute Disable Bit
- Trusted Execution Technology
- July 2007

• Intel x96x Express family (Q963, Q965, G965, P965) chipset support

© 2008 Lenovo

Core 2 Duo Desktop Processor Versions (Conroe)


In July 2006, Intel announced the Intel Core 2 Duo desktop processor. All the processors
support shared L2 cache, Intel 64 Technology, and Execute Disable Bit (XD). All run at 65
watts thermal design power.
The E6300 has 2 MB L2 cache and runs at 1.86 GHz; the E6400 has 2 MB L2 cache and runs
at 2.13 GHz; the E6600 has 4 MB L2 cache and runs at 2.40 GHz; the E6700 has 4 MB L2
cache and runs at 2.66 GHz.
In April 2007, the E6320 and E6420 were announced; these are the same as the E6300 and
E6400 except the E6320 and E6420 have 4 MB L2 cache.
In July 2007, the E6540 was announced with a 1333 MHz system bus.
In July 2007, the E6550, E6750, and E6850 were announced with Intel Trusted Execution
Technology.
The E4xxx have similar features to the E6xxx except the E4xxx have 2 MB L2 cache, 800 MHz
system bus, and no Virtualization Technology support.
These processors use a 775-land Flip-Chip Land Grid Array (FC-LGA6) package requiring an
LGA775 socket. These processors are supported by the Intel x96x Express chipset family such
as the Q963, Q965, G965, and P965 Express chipsets.


Intel Desktop Processors:


Intel Core 2 Duo Desktop Processor Versions (Wolfdale)

Core 2 Duo E7200, E7300 (Wolfdale)
- 2.53, 2.66 GHz
- Dual-core
- 3 MB shared L2 cache
- 1066 MHz system bus
- No Virtualization Technology
- Intel 64 and Execute Disable Bit
- April 2008 and after

Core 2 Duo E8190 (Wolfdale)
- 2.66 GHz
- Dual-core
- 6 MB shared L2 cache
- 1333 MHz system bus
- No Virtualization Technology
- Intel 64 and Execute Disable Bit
- January 2008

Core 2 Duo E8200, E8300, E8400, E8500, E8600 (Wolfdale)
- 2.66, 3.00, 3.16 GHz
- Dual-core
- 6 MB shared L2 cache
- 1333 MHz system bus
- Virtualization Technology
- Intel 64 and Execute Disable Bit
- Trusted Execution Technology
- January 2008 and after

• Intel 3 Series Express family (G31, G33, G35, P35, Q33, Q35, X38, P31) chipset support

© 2008 Lenovo

Core 2 Duo Desktop Processor Versions (Wolfdale)


In January 2008, Intel announced the Intel Core 2 Duo desktop processor with code-name
Wolfdale. All the processors are dual-core with 3 MB or 6 MB shared L2 cache, Intel 64
Technology, Execute Disable Bit (XD), and Intel HD Boost. All run at 65 watts thermal design
power. Some support Intel Virtualization Technology and Intel Trusted Execution Technology.
These processors are supported by the Intel 3 and 4 Series Express chipset families.
These processors use a 775-land Flip-Chip Land Grid Array (FC-LGA8) package requiring an
LGA775 socket.


Intel Desktop Processors:


Intel Core 2 Quad Desktop Processor
• Intel 32-bit/64-bit processors for desktop systems
• Core micro-architecture
• Key features
- Quad-core
- Dual-die L2 cache (4, 6, 8, 12 MB total)
- Intel 64 Technology
- 1066 or 1333 MHz system bus
• Code-name Kentsfield (January 2007) and Yorkfield (March 2008)

[Figure: four execution cores with private L1 caches; each pair of cores shares an L2 cache, connected to the Memory Controller Hub]

© 2008 Lenovo

Intel Core 2 Quad Desktop Processor


In January 2007, Intel announced the Intel Core 2 Quad desktop processor which had the code-
name Kentsfield (Q6xxx). In March 2008, the Q9xxx with code-name Yorkfield was
announced. In August 2008, the Q8xxx with code-name Yorkfield was announced.
The Core 2 Quad desktop processor uses four independent cores within a single physical
package (called a quad-core processor). It uses the Core micro-architecture that is common
across desktop, mobile (called Core 2 Duo mobile processor), and server platforms. The Core 2
Quad desktop processor maintains compatibility with IA-32 software, yet it supports Intel 64
Technology, an extension to the IA-32 instruction set that adds 64-bit extensions to the x86
architecture.
With four execution cores, the Intel Core 2 Quad processor moves quickly through processor-
intensive tasks in demanding multitasking environments and makes the most of highly threaded
applications. The processor is optimized for creating multimedia, running gaming applications,
and running compute-intensive applications at the same time.


Features
Key features of the Intel Core 2 Quad desktop processor include the following:
• Core micro-architecture
• Q6xxx: 65 nanometer; Q8xxx/Q9xxx: 45 nanometer process technology
• Quad-core processing using four independent cores in one physical package that run at the same
frequency.
• 32 KB L1 cache per core (so 4 x 32 KB L1 cache)
• 4, 6, 8, or 12 MB total L2 cache via two shared L2 caches (2 x 2, 3, 4, or 6 MB). The processor
is actually two dual-core dies in one package, so each pair of cores shares a single L2 cache.
The L2 cache runs at the core processor speed and is called Intel Advanced Smart Cache. It is
8-way set associative using a 64 byte cache line.
• No L3 cache
• 1066 or 1333 MHz system bus which transfers data four times per bus clock (4X) with a double-
clocked address bus (2X). The system bus has a 64-bit data path.
• Intel Wide Dynamic Execution improves execution speed and efficiency with each core
completing up to four full instructions simultaneously with a 14-stage pipeline. The processor
supports micro-op fusion when an x86 instruction is decoded into micro-ops and two adjacent,
dependent micro-ops combine into a single micro-op and execute in a single cycle. It supports
"Macro-op fusion" which means certain x86 instructions may also be paired into a single
instruction, then executed in a single cycle. In certain cases, five instructions can be read from the
instruction queue, then executed as if only four instructions were issued.
• Intel Smart Memory Access optimizes the use of the data bandwidth from the memory subsystem
to accelerate out-of-order execution.
• Intel 64 Technology support providing 64-bit operating systems and application support.
• Q6xxx/Q9xxx: Supports Intel Virtualization Technology to allow hardware-assisted
virtualization (Intel Virtualization Technology requires a system with a processor, chipset, BIOS,
virtual machine monitor (VMM) and applications enabled for Virtualization Technology).
• Intel Advanced Digital Media Boost which accelerates execution of Streaming SIMD Extensions
(SSE) instructions used in multimedia and graphics applications. The 128-bit SSE instructions are
issued at a throughput rate of one per clock cycle.
• Execute Disable Bit support.
• Q9xxx: Intel Trusted Execution Technology which enables more secure platforms from software-
based attacks with appropriate software.


• No Hyper-Threading Technology support.


• Five total execution units per core:
– 2 integer units [often called Arithmetic Logic Units (ALU)]
– 1 floating point unit
– 1 load unit
– 1 store unit
• Uses a 775-land Flip-Chip Land Grid Array (Q6xxx: FC-LGA6; Q8xxx/Q9xxx: FC-LGA8)
package requiring an LGA775 socket.
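Features such as Intel 64, the Execute Disable Bit, and Virtualization Technology are reported to software through CPUID feature bits. The sketch below decodes the relevant flags from raw register values; the register contents passed in are made-up illustrative inputs, while the bit positions (EDX bit 20 for XD/NX and bit 29 for Intel 64 in leaf 0x80000001, ECX bit 5 for VMX in leaf 1) follow Intel's CPUID definition:

```python
# Decode a few CPUID feature bits relevant to this topic.
def bit(value: int, n: int) -> bool:
    """True if bit n of value is set."""
    return (value >> n) & 1 == 1

def decode_features(leaf1_ecx: int, ext_leaf1_edx: int) -> dict:
    return {
        "vmx (Virtualization Technology)": bit(leaf1_ecx, 5),
        "xd/nx (Execute Disable Bit)": bit(ext_leaf1_edx, 20),
        "intel64 (long mode)": bit(ext_leaf1_edx, 29),
    }

# Hypothetical register contents: XD and Intel 64 set, VMX clear --
# the profile this section describes for a part without Virtualization Technology.
features = decode_features(leaf1_ecx=0x0, ext_leaf1_edx=(1 << 20) | (1 << 29))
print(features)
```

On a real machine these register values would come from executing the CPUID instruction (or, on Linux, from /proc/cpuinfo), which pure Python cannot do directly.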

[Figure: Processor Structure of Intel Core 2 Quad Processor (dual-die with two L2 caches) — four execution cores with private L1 caches; each die pairs two cores sharing one L2 cache; both dies connect to the Memory Controller Hub]


Intel Desktop Processors:


Intel Core 2 Quad Desktop Processor Versions (Kentsfield/Yorkfield)
Core 2 Quad Q6600, Q6700 (Kentsfield)
- 2.40, 2.66 GHz
- Quad-core
- 8 MB L2 cache (2 x 4 MB)
- Virtualization Technology
- No TXT
- January 2007 and after

Core 2 Quad Q8200 (Yorkfield)
- 2.33 GHz
- Quad-core
- 4 MB L2 cache (2 x 2 MB)
- No Virtualization Technology
- No TXT
- August 2008

Core 2 Quad Q9300, Q9400, Q9450, Q9550, Q9650 (Yorkfield)
- 2.50, 2.66, 2.83, 3.00 GHz
- Quad-core
- 6 or 12 MB L2 cache (2 x 3/6 MB)
- Virtualization Technology
- TXT
- March 2008 and after

• Intel 3 Series and 4 Series chipset support

[Figure: Core 2 Quad desktop processor; Core 2 Quad desktop processor without heatsink in LGA775 socket]
© 2008 Lenovo

Core 2 Quad Desktop Processor Versions (Kentsfield)


In January 2007, Intel announced the Intel Core 2 Quad desktop processor. The Q6600 is
clocked at 2.40 GHz and has a total of 8 MB L2 cache via a 2 x 4 MB shared L2 cache. The
processor supports Virtualization Technology, Intel 64 Technology, and Execute Disable Bit
(XD). The processor uses a 775-land Flip-Chip Land Grid Array (FC-LGA6) package requiring
an LGA775 socket. The processor is supported by the Intel P965 and 975X Express chipset
family. In July 2007, the Intel Core 2 Quad Q6700 at 2.66 GHz was announced.
In March 2008, Intel announced the processor with code-name Yorkfield. It has similar features
to the Kentsfield version but adds support for Intel Trusted Execution Technology, FC-LGA8
packaging, and Intel HD Boost (SSE4).
In August 2008, Intel announced the Q8200 processor with code-name Yorkfield. It is similar
to the Q9xxx except it has 4 MB L2 cache and no support for Virtualization Technology or
Trusted Execution Technology.


Intel Notebook Processors:


Notebook Processors

Value — Balanced level of mobile processor technology and value:
- Intel Celeron processor (Socket-P): single-core, launched June 2007
- Intel Pentium Dual-Core processor (Merom): dual-core, launched January 2007

Performance — Energy efficient family of processors that enable breakthrough multi-core
performance and quick responsiveness (when with dual-core):
- Intel Core 2 Solo processor (Merom): single-core, launched September 2007
- Intel Core 2 Duo processor (Merom/Penryn): dual-core, launched September 2007
- Intel Core 2 Quad processor (Penryn): quad-core, launched August 2008

High Performance — Intel's highest performance mobile processor:
- Intel Core 2 Extreme processor (Merom/Penryn): dual-core or quad-core, launched July 2007

© 2008 Lenovo

Notebook Processors
This slide shows positioning of the Intel mobile processors.

Lenovo ThinkPad notebooks use Intel processors.


Intel Notebook Processors:


Intel Centrino Processor Technology Family

• Intel brand name for the best mobile technologies in notebook PCs
- Performance
- Battery life
- Wireless
- Form factor
- Manageability

• Used in select Lenovo IdeaPad and ThinkPad notebooks

[Figure: Select Lenovo IdeaPad and ThinkPad notebooks utilize Intel Centrino branding]

© 2008 Lenovo

Intel Centrino Processor Technology Family


Centrino is a platform-marketing initiative from Intel. It covers a specific combination of
processor, chipset, wireless, and additional components in notebook systems. Containing Intel’s
best mobile technologies, Intel Centrino features the latest breakthroughs on all five vectors of
mobility: performance, battery life, wireless, form factor, and manageability.
Lenovo markets select IdeaPad and ThinkPad notebooks that utilize Intel Centrino branding.


Intel Notebook Processors:


Intel Centrino Processor Technology (January 2008)

• Intel brand name for integrated wireless and manageability in notebook PCs
• Requires three chips to receive Centrino branding
- Intel processor, Intel chipset, Intel wireless chip
• Also requires additional features for vPro branding

Intel Centrino Processor Technology
Freedom and flexibility to work, learn, play, and enjoy on-the-go with revolutionary dual-core
- Intel processor (Core 2 only)
- Intel chipset (e.g., GM965, PM965)
- Intel wireless chip (WLAN)

Intel Centrino with vPro Technology
Intel's best business notebook technology delivering professional capabilities providing an edge over standard commercial PCs
- The three items above, plus:
- ICH8M Enhanced
- Intel ethernet
- Intel Active Management Technology (IAMT)
- Intel AMT firmware


Intel Centrino Processor Technology (January 2008)


In January 2008, Intel revised the logo and naming to these two brands:
• Intel Centrino Processor Technology (requires Intel processor [Core 2 only, not Celeron or
Pentium], Intel chipset, Intel wireless chip)
• Intel Centrino with vPro Technology (requires three components above plus an Enhanced I/O
Controller Hub (ICH), Intel ethernet, Intel Active Management Technology, and Intel AMT-
supported firmware)

Select Lenovo ThinkPad notebooks are based on Intel Centrino 2 with vPro Technology.


Intel Notebook Processors:


Intel Centrino Processor Technology Family (Jan and August 2008)

• Intel Centrino transition


• Four Centrino brands as of August 2008

Intel Centrino name (2007 / 1/1/08 / August 2008) and platforms:
- Intel Centrino with vPro Technology: Santa Rosa
- Intel Centrino 2 with vPro Technology: Montevina
- Intel Centrino Processor Technology: Santa Rosa and Santa Rosa Refresh, Napa and Napa Refresh, Sonoma
- Intel Centrino 2 Processor Technology: Montevina


Intel Centrino Processor Technology Family (January 2008 and August 2008)

- Intel Centrino 2 with vPro technology (dual-core, July 2008): Intel's best notebook technology, delivering improved capabilities to work, learn, play, and enjoy on the go.
- Intel Centrino 2 processor technology (dual-core, July 2008)
- Intel Centrino with vPro technology (dual-core, May 2007): freedom and flexibility for work or play with revolutionary dual-core or single-core processors.
- Intel Centrino processor technology (dual-core or single-core, May 2007)


Intel Notebook Processors:


Intel Centrino 2 Processor Technology (August 2008)

• Requirements for Intel Centrino 2 branding introduced August 2008


Intel Centrino 2 Processor Technology
- Processor: Intel Core 2 Duo with 1066 MHz bus and ≥3 MB L2 cache (Penryn only), or Intel Core 2 Duo ULV with 800 MHz bus and ≥3 MB L2 cache (Penryn only)
- Chipset: Intel GM/PM45 Express Chipset (Cantiga) or Intel GM47 Express Chipset (Cantiga); requires ICH9M Base or ICH9M Enhanced
- Wireless: Intel WiMAX / WiFi Link 5350 (3x3) or 5150 (1x2) (Echo Peak), or Intel WiFi Link 5300AGN (3x3) or 5100AGN (1x2) (Shirley Peak)
- LAN / Trusted Platform Module: N/A
- Intel Active Management Technology: N/A
- BIOS: N/A

Intel Centrino 2 with vPro Technology
- Processor: same processor choices; both require Intel Virtualization Technology and Intel Trusted Execution Technology
- Chipset: same chipset choices; both require ICH9M Enhanced
- Wireless: same wireless choices
- LAN / Trusted Platform Module: Intel 82567LM/LF Gigabit Network Connection (Boazman) and a Trusted Platform Module (discrete or integrated, complying with TPM 1.2)
- Intel Active Management Technology: Version 4.0 for Wired and Wireless LAN
- BIOS: capable of VT-x, VT-d, AMT 4.0, TXT, and TPM 1.2


Intel Centrino 2 Processor Technology (August 2008)


A notebook could have Intel Active Management Technology and not receive the vPro
branding.
Intel Core 2 Extreme processors (both dual-core and quad-core) do not receive the vPro branding
because some security features in the processor are unlocked.


Intel Notebook Processors:


Notebook Processor Numbers

• Intel processor numbers are designed to quickly differentiate processors in a


product family
• Newly introduced 'P' prefix for the Power Optimized Performance segment
• Newly introduced 'S' prefix for the Small Form Factor segment

Processor
Description TDP Range
Class

QX Mobile or Desktop Quad-Core Extreme Performance >40W

X Mobile or Desktop Dual-Core Extreme Performance >40W

T Mobile Highly Energy Efficient 30-39W

P Mobile Power Optimized Energy Efficient higher performance 20-29W

L Mobile Highly Energy Efficient 12-19W

U Mobile Ultra High Energy Efficient ≤11.9W

SP Mobile Small package Power Optimized – Energy Efficient higher performance 20-29W

SL Mobile Small package Highly Energy Efficient 12-19W

SU Mobile Small package Ultra High Energy Efficient ≤11.9W


Notebook Processor Numbers


The Intel processor numbers are designed to quickly differentiate processors in a product
family.
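Because the prefix-to-TDP mapping above is mechanical, it can be captured in a few lines. A minimal sketch (the table data is transcribed from the slide; the function name `tdp_class` is our own):

```python
# Hypothetical helper mapping the 2008 mobile processor-number prefixes
# from the table above to their TDP class. Values are transcribed from
# the slide; the function name is our own.
TDP_CLASSES = {
    "QX": "Quad-Core Extreme Performance, >40 W",
    "X":  "Dual-Core Extreme Performance, >40 W",
    "T":  "Highly Energy Efficient, 30-39 W",
    "P":  "Power Optimized, 20-29 W",
    "L":  "Highly Energy Efficient, 12-19 W",
    "U":  "Ultra High Energy Efficient, <=11.9 W",
    "SP": "Small package, Power Optimized, 20-29 W",
    "SL": "Small package, Highly Energy Efficient, 12-19 W",
    "SU": "Small package, Ultra High Energy Efficient, <=11.9 W",
}

def tdp_class(model: str) -> str:
    """Return the TDP class for a model string such as 'T9400' or 'SU9300'."""
    # Try the longest prefixes first so 'SU9300' matches 'SU', not a shorter prefix.
    for prefix in sorted(TDP_CLASSES, key=len, reverse=True):
        if model.upper().startswith(prefix):
            return TDP_CLASSES[prefix]
    raise ValueError(f"unknown prefix in {model!r}")

print(tdp_class("T9400"))   # Highly Energy Efficient, 30-39 W
print(tdp_class("SU9300"))  # Small package, Ultra High Energy Efficient, <=11.9 W
```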


Intel Notebook Processors:


Intel Celeron Processor

• Intel 32-bit/64-bit processors for value mobile systems
• Core micro-architecture
• Key features
- Single-core
- 533, 667, or 800 MHz system bus
- 1 MB L2 cache
- Intel 64 Technology
- No SpeedStep technology
- No Virtualization Technology
• Introduced June 2007
• Used in some Lenovo notebooks

Intel Celeron Processor



Intel Celeron Processor


In June 2007, Intel announced the Intel Celeron Processor, which had the code-name Merom;
later versions had the code-name Penryn. It is based on the Core micro-architecture, so it has
similar internal architectural features to the desktop and mobile-based Core 2 Duo Processor;
the big difference is that the Intel Celeron Processor is a single-core processor, not a
dual-core processor.
This processor is positioned for value mobile systems.
The Intel Celeron Processor uses a single core on a single die. It uses the Core micro-
architecture that is common across desktop, mobile (called Core 2 Duo mobile processor), and
server platforms. The processor maintains compatibility with IA-32 software, yet it supports
Intel 64 Technology which is an extension to the IA-32 instruction set which adds 64-bit
extensions to the x86 architecture.
It shares the same logo as the Celeron processor for desktop systems. Intel is phasing out the
Celeron M Processor by removing the “M” and using the same name for both mobile and
desktop Celeron processors.


Features
Key features of the Intel Celeron mobile processor include the following:
• Core micro-architecture.
• 0.065 micron technology (or 65 nanometers).
• Single-core.
• 64 KB L1 cache (32 KB L1 instruction cache; 32 KB L1 write-back data cache).
• On-die 1 MB L2 cache. The L2 cache runs at the core processor speed. It is 8-way set associative
using a 64 byte cache line.
• No L3 cache.
• 533, 667, or 800 MHz system bus which transfers data four times per bus clock (4X) with a
double-clocked address bus (2X). The system bus has a 64-bit data path.
• Intel 64 Technology support providing 64-bit operating systems and application support.
• Execute Disable Bit support.
• No support for Intel Virtualization Technology.
• No Hyper-Threading Technology support.
• Four total execution units:
– 1 integer unit [often called an Arithmetic Logic Unit (ALU)]
– 1 floating point unit
– 1 load unit
– 1 store unit
• 5xx: uses Socket P and a Micro Flip-Chip Pin Grid Array (Micro-FCPGA) package requiring an
mPGA779M socket
• 7xx: uses Socket P and a Micro Flip-Chip Ball Grid Array (Micro-FCBGA) package
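The cache parameters quoted above fully determine the L2 geometry. As a worked example (our own arithmetic, not from the source), a 1 MB, 8-way set associative cache with 64-byte lines has 2048 sets, so an address splits into a 6-bit line offset and an 11-bit set index:

```python
# Derive the set count and address-bit split for the Celeron's L2 cache
# from the figures quoted above (1 MB, 8-way set associative, 64-byte lines).
cache_bytes = 1 * 1024 * 1024   # 1 MB L2
line_bytes  = 64                # cache line size
ways        = 8                 # associativity

lines = cache_bytes // line_bytes          # 16384 lines total
sets  = lines // ways                      # 2048 sets
offset_bits = line_bytes.bit_length() - 1  # 6 bits select a byte within a line
index_bits  = sets.bit_length() - 1        # 11 bits select a set

print(sets)                      # 2048
print(offset_bits, index_bits)   # 6 11
```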


[Figure: Processor Structure of the Intel Celeron Mobile Processor: a single core with 32 KB data and 32 KB instruction caches plus FPU, load, store, and integer units, a full-speed 1 MB L2 cache over a 256-bit interface, and a 64-bit, 533 MHz bus to main memory; packaged in a Micro Flip-Chip Pin Grid Array]

[Photos: Intel Celeron Mobile Processor and 965 chipset]


Intel Notebook Processors:


Intel Celeron Processor 5xx Versions (Merom)

Celeron 530, 540, 550, 560, 570, 575, 585 (Merom)
- Standard Voltage
- 1.73, 1.86, 2.00, 2.13, 2.16, 2.26 GHz
- Single-core
- 1 MB L2 cache
- 533 or 667 MHz system bus
- No Virtualization Technology
- Intel 64 and Execute Disable Bit
- June 2007 and after

• Intel 965 Express Chipset family support

Intel Celeron M Processors are used in select Lenovo ThinkPad notebooks.



Intel Celeron Processor 5xx Versions (Merom)


In June 2007, Intel announced the Intel Celeron mobile processor. The processor supports the
Intel 965 Express Chipset family and other compatible chipsets.

Celeron Processor Voltage Thermal Design Power

530 at 1.73 GHz 0.95-1.3 volts 27-31 watts

540 at 1.86 GHz 0.95-1.3 volts 27-31 watts

550 at 2.00 GHz 0.95-1.3 volts 27-31 watts

560 at 2.13 GHz 0.95-1.3 volts 27-31 watts

570 at 2.26 GHz 0.95-1.3 volts 27-31 watts

575 at 2.00 GHz 0.95-1.3 volts 27-31 watts

585 at 2.16 GHz 0.95-1.3 volts 27-31 watts


Intel Notebook Processors:


Intel Celeron Processor 7xx Versions (Penryn)

Celeron 723 (Penryn)
- Ultra Low Voltage
- 1.20 GHz
- Single-core
- 1 MB L2 cache
- 800 MHz system bus
- No Virtualization Technology
- Intel 64 and Execute Disable Bit
- August 2008

• Mobile Intel 4 Series Express Chipset family support

Intel Celeron M Processors are used in select Lenovo ThinkPad notebooks.



Intel Celeron Processor 7xx Versions (Penryn)


In August 2008, Intel announced the Intel Celeron mobile processor with code-name Penryn.
The processor supports the Mobile Intel 4 Series Express Chipset family and other compatible
chipsets.

Celeron Processor Thermal Design Power

723 at 1.20 GHz 10 watts


Intel Notebook Processors:


Intel Pentium Dual-Core Processor
• Intel 32-bit/64-bit processors for
notebook systems
• Core micro-architecture
• Key features
- Dual-core
- 1 MB shared L2 cache
- (Some) Intel 64 Technology
- 533 MHz system bus
• Code-name Yonah and Merom
• Positioned for value notebooks
• Used in some Lenovo notebooks

Intel Pentium Dual-Core Processor is used in select Lenovo notebooks (IdeaPad Y510 pictured).

Intel Pentium Dual-Core Processor


In January 2007, Intel announced the Intel Pentium Dual-Core Processor for notebooks, which
had the code-names Yonah and Merom. It is based on the Core micro-architecture, so it has the
same internal architectural features as the desktop and mobile-based Core 2 Duo Processor.
This processor is positioned for value notebook systems.
The Intel Pentium Dual-Core Processor uses two independent cores on a single die (called a
dual-core processor). It uses a new Core micro-architecture that is common across desktop,
mobile (called Core 2 Duo mobile processor), and server platforms. The processor maintains
compatibility with IA-32 software, yet it supports Intel 64 Technology which is an extension to
the IA-32 instruction set which adds 64-bit extensions to the x86 architecture.


Features
Key features of the Intel Pentium Dual-Core notebook processor include the following:
• Core micro-architecture.
• 0.065 micron technology (or 65 nanometers).
• Dual-core processing using two independent cores in one physical package that run at the same
frequency.
• 64 KB L1 cache per core (32 KB L1 instruction cache per core; 32 KB L1 write-back data cache
per core).
• Shared on-die 1 MB L2 cache called Intel Advanced Smart Cache. The L2 cache runs at the core
processor speed. It is 8-way set associative using a 64 byte cache line. Intel Advanced Smart
Cache enables the active execution core to access the full L2 cache when the other execution core
is idle.
• No L3 cache.
• 533 MHz system bus which transfers data four times per bus clock (4X) with a double-clocked
address bus (2X). The system bus has a 64-bit data path.
• Intel Wide Dynamic Execution improves execution speed and efficiency with each core
completing up to four full instructions simultaneously with a 14-stage pipeline. The Core 2 Duo
supports micro-op fusion when an x86 instruction is decoded into micro-ops and two adjacent,
dependent micro-ops combine into a single micro-op and execute in a single cycle. A new feature
is "Macro-op fusion" which means certain x86 instructions may also be paired into a single
instruction, then executed in a single cycle. In certain cases, five instructions can be read from the
instruction queue, then executed as if only four instructions were issued.
• Intel Smart Memory Access optimizes the use of the data bandwidth from the memory subsystem
to accelerate out-of-order execution.
• Intel 64 Technology support providing 64-bit operating systems and application support.

[Die photo: Intel Pentium Dual-Core Processor, showing the two cores, shared L2 cache, and bus interface]


• No support for Intel Virtualization Technology.


• Intel Advanced Digital Media Boost which accelerates execution of Streaming SIMD Extensions
(SSE) instructions used in multimedia and graphics applications. The 128-bit SSE instructions are
issued at a throughput rate of one per clock cycle.
• Execute Disable Bit support.
• No Hyper-Threading Technology support.
• Five total execution units:
– 2 integer units [often called Arithmetic Logic Units (ALU)]
– 1 floating point unit
– 1 load unit
– 1 store unit
• The Merom versions use Socket P. This is a Micro Flip-Chip Pin Grid Array (Micro-FCPGA)
requiring 479-pin surface mount Zero Insertion Force (ZIF) socket (mPGA479M socket) or
Micro Flip-Chip Ball Grid Array (Micro-FCBGA) for surface mount (479-ball) [the desktop
version uses different packaging].

[Figure: Block Diagram for the Intel Pentium Dual-Core Architecture: per-core instruction fetch/predecode, instruction queue, uCode ROM, decode, rename/alloc, reorder buffer, retirement unit, and schedulers feeding the FPU, two ALUs, and load/store units with L1 D-cache and D-TLB; both cores share the L2 cache and the system bus interface]


Intel Notebook Processors:


Intel Pentium Dual-Core Processor Versions (Merom)

Pentium Dual-Core T2310, T2330, T2370, T2390 (Merom)
- 1.46, 1.60, 1.73, 1.86 GHz
- Dual-core
- 1 MB shared L2 cache
- 533 MHz system bus
- No Virtualization Technology
- Intel 64 and Execute Disable Bit
- July 2007 and after

• Intel 965 Express family chipset support

Intel Pentium Dual-Core Processor Versions (Merom)


In July 2007, Intel announced the Merom-based Intel Pentium Dual-Core notebook processor.
All the processors support shared L2 cache, Enhanced Intel SpeedStep Technology, Intel 64
Technology, and Execute Disable Bit (XD).

[Figure: Processor Structure of the Intel Pentium Dual-Core Processor: two execution cores with private L1 caches, a shared L2 cache and L2 cache control, and a connection to the Memory Controller Hub]


Intel Notebook Processors:


Intel Core 2 Solo Processor
• Intel 32-bit/64-bit processors for
notebook systems
• Core micro-architecture
• Key features
- Single-core
- Ultra Low Voltage
- Intel 64 Technology
- 533 or 800 MHz system bus
• Code-name Merom (Sept 2007)
or Penryn (August 2008)


Intel Core 2 Solo Processor


In September 2007, Intel announced the Intel Core 2 Solo mobile processor, which had the code-
name Merom. It has the same internal architectural features as the Merom-based notebook Core
2 Duo processor, except that it is single-core with a 533 MHz system bus. It is only available in
Ultra Low Voltage.
In August 2008, Intel announced additional Intel Core 2 Solo mobile processors with the code-
name Penryn. These processors had features common to the Penryn family of processors,
including an 800 MHz system bus, Intel HD Boost, Trusted Execution technology, and 3 MB
L2 cache.


Intel Notebook Processors:


Intel Core 2 Solo Processor Versions (Merom and Penryn)

Core 2 Solo U2100, U2200 (Merom)
- 1.06, 1.20 GHz
- Ultra Low Voltage
- Single-core
- 1 MB L2 cache
- 533 MHz bus
- Intel 64 Technology
- Socket M
- September 2007

Core 2 Solo SU3300, SU3500 (Penryn)
- 1.20, 1.30 GHz
- Ultra Low Voltage
- Single-core
- 3 MB L2 cache
- 800 MHz bus
- Intel 64 Technology
- Socket P
- August 2008

• U2xxx: Mobile Intel 945 Express chipset family (945GM, 945GMS) support
• SU3xxx: Mobile Intel 4 Series Express chipset family support


Intel Core 2 Solo Processor Versions (Merom and Penryn)


In September 2007, Intel announced the Socket M-based Intel Core 2 Solo mobile processor.
The U2100 and U2200 processors have a single-core, 1 MB L2 cache, 533 MHz system bus,
Intel 64 Technology, Intel Enhanced SpeedStep Technology, and Virtualization Technology.
These processors are supported by the Mobile Intel 945 Express chipset family particularly the
945GM and 945GMS Express chipsets.
In August 2008, Intel announced the Socket P-based Intel Core 2 Solo mobile processor. The
SU3300 and SU3500 have a single-core, 3 MB L2 cache, 800 MHz system bus, Intel 64
Technology, Intel Enhanced SpeedStep Technology, Virtualization technology, Intel HD Boost,
and Trusted Execution Technology.


Intel Notebook Processors:


Intel Core 2 Duo Processor
• Intel 32-bit/64-bit processors for mobile
systems
• Core micro-architecture
• Key features
- Dual-core
- Shared L2 cache
- Intel 64 Technology
- (Some) Intel Dynamic Acceleration
- 533, 667, 800, or 1066 MHz system bus
• Code-name Merom (July 2006) and Penryn
(January 2008)
• Used in some Lenovo notebooks

Intel Core 2 Duo Mobile Processor is used in select Lenovo ThinkPad notebooks.

Intel Core 2 Duo Processor


In July 2006, Intel announced the Intel Core 2 Duo mobile processor which had the code-name
Merom and 65nm process technology. In January 2008, a 45nm version with code-name Penryn
was announced. It has the same internal architectural features as the desktop-based version (with
the same name of Core 2 Duo Processor) except for various power-related features, physical
packaging, and other items.
The Core 2 Duo mobile processor uses two independent cores on a single die (called a dual-core
processor). It uses the Core micro-architecture that is common across mobile, desktop (called
Core 2 Duo desktop processor), and server platforms. The Core 2 Duo mobile processor
maintains compatibility with IA-32 software, yet it supports Intel 64 Technology which is an
extension to the IA-32 instruction set which adds 64-bit extensions to the x86 architecture.
Powered by two cores in one physical package, the Intel Core 2 Duo mobile processor is Intel's
preferred mobile processor for users who run multiple demanding applications simultaneously,
offering the performance and responsiveness to get the most productivity and enjoyment from
those applications.


Features
Key features of the Intel Core 2 Duo mobile processor include the following:
• Core micro-architecture.
• 65 nanometers (Merom) or 45 nanometers (Penryn) process technology
• Dual-core processing using two independent cores in one physical package that run at the same
frequency.
• 64 KB L1 cache per core (32 KB L1 instruction cache per core; 32 KB L1 write-back data cache
per core).
• Shared on-die 2 MB, 3 MB, 4 MB, or 6 MB L2 cache called Intel Advanced Smart Cache. The
L2 cache runs at the core processor speed. It is 8-way set associative using a 64 byte cache line.
Intel Advanced Smart Cache enables the active execution core to access the full L2 cache when
the other execution core is idle.
• No L3 cache.
• 533, 667, 800, or 1066 MHz system bus which transfers data four times per bus clock (4X) with a
double-clocked address bus (2X). The system bus has a 64-bit data path.
• Intel Wide Dynamic Execution improves execution speed and efficiency with each core
completing up to four full instructions simultaneously with a 14-stage pipeline. The Core 2 Duo
supports micro-op fusion when an x86 instruction is decoded into micro-ops and two adjacent,
dependent micro-ops combine into a single micro-op and execute in a single cycle. A new feature
is "Macro-op fusion" which means certain x86 instructions may also be paired into a single
instruction, then executed in a single cycle. In certain cases, five instructions can be read from the
instruction queue, then executed as if only four instructions were issued.
• Intel Smart Memory Access optimizes the use of the data bandwidth from the memory subsystem
to accelerate out-of-order execution.
• Intel 64 Technology support providing 64-bit operating systems and application support.

[Die photo: Intel Core 2 Duo Mobile Processor, showing the two cores, shared L2 cache, and bus interface]


• Some support Intel Virtualization Technology to allow hardware-based virtualization (Intel


Virtualization Technology requires a system with a processor, chipset, BIOS, virtual machine
monitor (VMM) and applications enabled for Virtualization Technology).
• Intel Advanced Digital Media Boost which accelerates execution of Streaming SIMD Extensions
(SSE) instructions used in multimedia and graphics applications. The 128-bit SSE instructions are
issued at a throughput rate of one per clock cycle.
• Execute Disable Bit support.
• No Hyper-Threading Technology support.
• Five total execution units:
– 2 integer units [often called Arithmetic Logic Units (ALU)]
– 1 floating point unit
– 1 load unit
– 1 store unit
• Select versions use a Micro Flip-Chip Pin Grid Array (Micro-FCPGA) requiring 479-pin surface
mount Zero Insertion Force (ZIF) socket (mPGA479M socket) or a Micro Flip-Chip Ball Grid
Array (Micro-FCBGA) for surface mount (479-ball) [the desktop version uses different
packaging].

[Figure: Block Diagram for the Intel Core 2 Duo Architecture: per-core instruction fetch/predecode, instruction queue, uCode ROM, decode, rename/alloc, reorder buffer, retirement unit, and schedulers feeding the FPU, two ALUs, and load/store units with L1 D-cache and D-TLB; both cores share the L2 cache and the system bus interface]


Intel Core 2 Duo mobile processors include these features that are not found in the desktop
processor:
• Intel Dynamic Power Coordination – Coordinates Enhanced Intel SpeedStep Technology and idle
power-management state (C-states) transitions independently per core to help save power.
• Intel Dynamic Bus Parking – Enables platform power savings and improved battery life by
allowing the chipset to power down with the processor in low-frequency mode.
• Enhanced Intel Deeper Sleep with Dynamic Cache Sizing – Saves power by flushing cache data
to system memory during periods of inactivity to lower CPU voltage.
• (Some) Intel Dynamic Acceleration which allows one core to deliver extra performance when
other core is idle.
Socket P versions have this feature:
• Dynamic Front Side Bus Frequency Switching – Changes the bus clock frequency, allowing a
reduction in core voltage and enabling a lower-power active state called Super LFM. This allows
the bus to run at its full 800 MHz or at 50% of that frequency (400 MHz).
Penryn-based processors include these features:
• Intel HD Boost which is Streaming SIMD Extensions 4 (SSE4) and faster Super Shuffle Engine.
• Intel Deep Power Down Technology which is a low-power state that allows both cores and L2
cache to be powered down when the processor is idle.
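Whether a given part actually exposes these features can be checked at runtime. On Linux, the kernel publishes the CPUID-derived feature flags in /proc/cpuinfo (for example `est` for Enhanced SpeedStep, `vmx` for Virtualization Technology, `sse4_1` for the SSE4 part of HD Boost); a minimal sketch that parses that format:

```python
# Minimal sketch: parse kernel-reported CPU feature flags and check a
# few of the features discussed above. Flag names follow Linux's
# /proc/cpuinfo conventions (est = Enhanced SpeedStep, vmx = VT-x).
def cpu_flags(cpuinfo_text: str) -> set:
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

# In practice the text would come from open("/proc/cpuinfo").read();
# a hard-coded sample keeps the sketch self-contained.
sample = "processor : 0\nflags\t\t: fpu est vmx ssse3 sse4_1 lm\n"
flags = cpu_flags(sample)
print("vmx" in flags, "sse4_1" in flags)  # True True
```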


[Figure: Processor Structure of the Intel Core 2 Duo Processor: two cores, each with 32 KB data and 32 KB instruction caches plus FPU, load, store, and integer units, sharing a full-speed L2 cache (Intel Advanced Smart Cache) over 256-bit interfaces, connected by a 64-bit, 533/667/800/1066 MHz bus to the (Graphics) Memory Controller Hub chipset]

[Figures: simplified and detailed processor structures of the Intel Core 2 Duo Processor, showing per-core architectural state, APICs, thermal control, and the shared power management and core coordination logic above the bus interface]

[Die photo: Intel Core 2 Duo Processor]


The following shows some key differences between the mobile-based Core Duo Processor and the
Core 2 Duo Processor (Merom):
- Announced: January 2006 vs. July 2006
- Micro-architecture: Enhanced Pentium M vs. Core
- Cores: dual-core vs. dual-core
- L2 cache: shared 2 MB vs. shared 2 MB or 4 MB
- Intel 64 Technology: no vs. yes
- Pipeline: 12-stage vs. 14-stage
- Instructions issued/retired per clock: 3 vs. 4
- Instruction fusion: micro-ops and micro-op fusion vs. micro-ops, micro-op fusion, and macro-op fusion
- SSE: SSE, SSE2, SSE3 (4 floating point operations per cycle) vs. SSE, 128-bit SSE2/3 (8 floating point operations per cycle)
- SIMD execution: one SIMD instruction in 2 cycles vs. one SIMD instruction in 1 cycle
- Intel Dynamic Acceleration: none vs. some models
- Virtualization Technology: most models vs. most models
- System bus: 533 or 667 MHz vs. 533, 667, or 800 MHz
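The "8 floating point operations per cycle" figure translates directly into a peak single-precision rate: one 4-wide SSE add plus one 4-wide SSE multiply per cycle. A worked example of the resulting per-core peak (our own arithmetic, not from the source):

```python
# Peak single-precision throughput implied by the comparison above:
# the Core micro-architecture can sustain eight SP floating point
# operations per cycle (a 4-wide SSE add plus a 4-wide SSE multiply).
def peak_sp_gflops(clock_ghz: float, flops_per_cycle: int = 8) -> float:
    return clock_ghz * flops_per_cycle

print(peak_sp_gflops(2.4))   # a 2.4 GHz core peaks at 19.2 GFLOPS
```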


[Photos: Intel Core 2 Duo Processor wafer and packaged processor]

Intel Core 2 Duo Processor is used in select ThinkPad notebooks.


Intel Notebook Processors:


Intel Core 2 Duo Processor Versions [Socket M] (Merom)

Core 2 Duo U7500, U7600, U7700 (Merom)
- 1.06, 1.20, 1.33 GHz; Ultra Low Voltage 10W
- Dual-core; 2 MB shared L2 cache; 533 MHz bus
- Virtualization Technology; Intel 64 Technology
- April 2007 and after

Core 2 Duo L7200, L7400 (Merom)
- 1.33, 1.5 GHz; Low Voltage 17W
- Dual-core; 4 MB shared L2 cache; 667 MHz bus
- Virtualization Technology; Intel 64 Technology
- January 2007

Core 2 Duo T5200, T5300, T5500, T5600 (Merom)
- 1.60, 1.66, 1.73, 1.83 GHz; Standard Voltage 35W
- Dual-core; 2 MB shared L2 cache; 533 or 667 MHz bus
- Virtualization Technology (only T5600); Intel 64 Technology
- July 2006 and after

Core 2 Duo T7200, T7400, T7600 (Merom)
- 2.00, 2.16, 2.33 GHz; Standard Voltage 35W
- Dual-core; 4 MB shared L2 cache; 667 MHz bus
- Virtualization Technology; Intel 64 Technology
- July 2006

• Mobile Intel 945 Express family (945GM, 945GMS, 945PM) chipset support
• Socket M support


Intel Core 2 Duo Mobile Processor Versions (Merom)


In July 2006, Intel announced the Socket M-based Intel Core 2 Duo mobile processor. All the
processors support shared L2 cache, 533 or 667 MHz system bus, Intel 64 Technology, Execute
Disable Bit (XD), and Intel Enhanced SpeedStep Technology.
These processors are supported by the Mobile Intel 945 Express chipset family such as the
945GM, 945GMS, and 945PM Express chipsets.
For the numbering of the processor, the first letter represents the following:
• U = <14 watts (Ultra Low Voltage)
• L = 15 to 24 watts (Low Voltage)
• T = 25 to 49 watts
• E = >50 watts

Intel Core 2 Duo Processor is used in select Lenovo ThinkPad notebooks.


Intel Notebook Processors:


Intel Core 2 Duo Processor Versions [Socket P] (Merom)

Core 2 Duo U7500, U7600, U7700 (Merom)
- 1.06, 1.20, 1.33 GHz; Ultra Low Voltage 10W
- Dual-core; 2 MB shared L2 cache; 533 MHz bus
- Virtualization Technology; Intel 64 Technology; Intel Dynamic Acceleration
- June 2007 and after

Core 2 Duo L7300, L7500, L7700 (Merom)
- 1.4, 1.6, 1.8 GHz; Low Voltage 17W
- Dual-core; 4 MB shared L2 cache; 800 MHz bus
- Virtualization Technology; Intel 64 Technology; Intel Dynamic Acceleration
- May 2007 and after

Core 2 Duo T7100, T7250, T7300, T7500, T7700, T7800 (Merom)
- 1.8, 2.0, 2.2, 2.4, 2.6 GHz; Standard Voltage 35W
- Dual-core; 2 MB or 4 MB shared L2 cache; 800 MHz bus
- Virtualization Technology; Intel 64 Technology; Intel Dynamic Acceleration
- May 2007 and after

• Mobile Intel 965 Express family (GM965, PM965) chipset support
• Socket P support


Intel Core 2 Duo Processor Versions (Merom)


In May 2007, Intel announced the Socket P-based Intel Core 2 Duo mobile processor. All the
processors support shared L2 cache, 800 MHz system bus, Intel 64 Technology, Execute
Disable Bit (XD), Enhanced Intel SpeedStep Technology, Intel Dynamic Acceleration, and
Dynamic Front Side Bus Frequency Switching. These processors all have 4 MB shared L2
cache except the T7100 and T7250 which have 2 MB shared L2 cache.
These processors are supported by the Mobile Intel 965 Express chipset family such as the
GM965 and PM965 Express chipsets.
For the numbering of the processor, the first letter represents the following:
• U = <14 watts (Ultra Low Voltage)
• L = 15 to 24 watts (Low Voltage)
• T = 25 to 49 watts
• E = >50 watts

Intel Core 2 Duo Processor is used in select Lenovo ThinkPad notebooks.


Intel Notebook Processors:


Intel Core 2 Duo Processor Versions [Socket P] (Penryn)

Core 2 Duo T8100, T8300 (Penryn)
- 2.1, 2.4 GHz; Standard Voltage 35W
- Dual-core; 3 MB L2 cache; 800 MHz bus
- HD Boost; Intel Dynamic Acceleration
- January 2008

Core 2 Duo T9300, T9500 (Penryn)
- 2.5, 2.6 GHz; Standard Voltage 35W
- Dual-core; 6 MB L2 cache; 800 MHz bus
- HD Boost; Intel Dynamic Acceleration
- January 2008

Core 2 Duo P8400, P8600, P9500 (Penryn)
- 2.26, 2.4, 2.53 GHz; Standard Voltage 25W
- Dual-core; 3 MB or 6 MB L2 cache; 1066 MHz bus
- HD Boost; Trusted Execution Technology; Intel Dynamic Acceleration
- July 2008

Core 2 Duo T9400, T9600 (Penryn)
- 2.53, 2.8 GHz; Standard Voltage 35W
- Dual-core; 6 MB L2 cache; 1066 MHz bus
- HD Boost; Trusted Execution Technology; Intel Dynamic Acceleration
- July 2008

• Mobile Intel 965 Express family (GM965, PM965) and Mobile Intel 4 Series
chipset support
• Socket P support

Intel Core 2 Duo Processor Versions (Penryn)


In January 2008, Intel announced the Intel Core 2 Duo mobile processor with code-name
Penryn that uses Socket P. All the processors support shared L2 cache, 800 MHz system bus,
Intel 64 Technology, Execute Disable Bit (XD), Enhanced Intel SpeedStep Technology, Intel
Dynamic Acceleration, Dynamic Front Side Bus Frequency Switching, and HD Boost.
In July 2008, new processors were announced, including 25 W versions (P8400, P8600, and
P9500). All of the new processors support a 1066 MHz system bus and Intel Trusted Execution
Technology.
These processors are supported by the Mobile Intel 965 Express chipset family such as the
GM965 and PM965 Express chipsets and the Mobile Intel 4 Series chipset.

Intel Core 2 Duo Processor is used in select Lenovo ThinkPad notebooks


Intel Notebook Processors:


Intel Core 2 Duo Processor Versions [SFF] (Penryn)

SU9300, SU9400 (Penryn): 1.20, 1.40 GHz; Ultra Low Voltage 10W; dual-core;
3 MB L2 cache; 800 MHz bus; HD Boost; Trusted Execution Tech; Intel
Dynamic Acceleration; August 2008

SL9300, SL9400 (Penryn): 1.60, 1.86 GHz; Low Voltage 17W; dual-core;
6 MB L2 cache; 1066 MHz bus; HD Boost; Trusted Execution Tech; Intel
Dynamic Acceleration; August 2008

SP9300, SP9400 (Penryn): 2.26, 2.40 GHz; Standard Voltage 25W; dual-core;
6 MB L2 cache; 1066 MHz bus; HD Boost; Trusted Execution Tech; Intel
Dynamic Acceleration; August 2008

• Small Form Factor (SFF) notebook processors


• Mobile Intel 4 Series chipset support
• Socket P support

Intel Core 2 Duo Processor Versions [SFF] (Penryn)


In August 2008, Intel announced the Intel Core 2 Duo mobile processor with code-name Penryn
for Small Form Factor (SFF) notebooks. All the processors support Socket P, 800 or 1066 MHz
system bus, Intel 64 Technology, Enhanced Intel SpeedStep Technology, Intel Dynamic
Acceleration, Trusted Execution Technology, and HD Boost.
These processors are supported by the Mobile Intel 4 Series chipset.

Intel Core 2 Duo Processor (SFF) is used in select Lenovo ThinkPad notebooks
such as the ThinkPad X301


Intel Notebook Processors:


Intel Core 2 Quad Processor

• Intel 32-bit/64-bit processors for notebook systems


• Core micro-architecture
• Key features
- Quad-core
- 12 MB L2 cache
- 1066 MHz system bus
• Code-name Penryn (August 2008)


Intel Core 2 Quad Processor


The Intel Core 2 Quad Processor for notebooks brings quad-core processing to mobile systems,
combining four cores with 12 MB of L2 cache and a 1066 MHz system bus.


Intel Notebook Processors:


Intel Core 2 Quad Processor Versions

Core 2 Quad
Q9100
2.26 GHz
Standard Voltage
Quad-core
12 MB L2 cache
1066 MHz bus
HD Boost
August 2008

• Mobile Intel 4 Series chipset support


• Socket P support


Intel Core 2 Quad Processor Versions


In August 2008, Intel announced the Socket P-based Intel Core 2 Quad mobile processor. The
processor has quad-core, 12 MB shared L2 cache, 1066 MHz system bus, Intel 64 Technology,
Execute Disable bit, Intel Enhanced SpeedStep Technology, Virtualization Technology, Intel
HD Boost, and Intel Dynamic Acceleration.


Intel Notebook Processors:


Intel Core 2 Extreme Processor
• Intel 32-bit/64-bit processors for
notebook systems
• Intel’s highest performance notebook
processor
• Core micro-architecture
• Key features
- Dual-core or quad-core
- 4 MB, 6 MB, or 12 MB shared
L2 cache
- Intel 64 Technology
- 800 or 1066 MHz system bus
• Code-name Merom (July 2007) and
Penryn (January 2008)


Intel Core 2 Extreme Processor


In July 2007, Intel announced the Intel Core 2 Extreme mobile processor under the code-name
Merom; later versions carry the code-name Penryn. It has the same internal architectural
features as the notebook Core 2 Duo processor of the same code-name.

Intel Core 2 Extreme is used in select Lenovo ThinkPad W700 notebooks


Intel Notebook Processors:


Intel Core 2 Extreme Processor Versions

X7800, X7900 (Merom): 2.60, 2.80 GHz; Standard Voltage; dual-core;
4 MB L2 cache; 800 MHz bus; no HD Boost; Intel 64 Technology;
July 2007 and after

X9000, X9100 (Penryn): 2.80, 3.06 GHz; Standard Voltage; dual-core;
6 MB L2 cache; 800 MHz bus; HD Boost; Intel 64 Technology;
January 2008 and after

QX9300 (Penryn): 2.53 GHz; Standard Voltage; quad-core; 12 MB L2 cache;
1066 MHz bus; HD Boost; Intel 64 Technology; August 2008

• Mobile Intel 945 Express family (945GM, 945GMS, 945PM)
and Mobile Intel 4 Series chipset support
• Socket P support


Intel Core 2 Extreme Processor Versions


In July 2007, Intel first announced the Socket P-based Intel Core 2 Extreme mobile processor.
The processors have two or four cores; 4 MB, 6 MB, or 12 MB of shared L2 cache; an 800 or
1066 MHz system bus; Intel 64 Technology; Execute Disable Bit (XD); Intel Enhanced SpeedStep
Technology; and Virtualization Technology. Some versions have Intel HD Boost, a set of 47 new
multimedia instructions named Streaming SIMD Extensions 4 (SSE4). Some versions also have
Intel Dynamic Acceleration and Trusted Execution Technology (TXT).
These processors are supported by the Mobile Intel 945 Express chipset family and the Mobile
Intel 4 Series Express chipset family.


Summary:
Processor Architecture

• Processors may include Hyper-Threading Technology, Intel 64 Technology,


HD Boost, multiple cores, and virtualization technology.
• Processors have different physical packaging.
• The current Intel desktop processors include the Celeron, Celeron
Dual-Core, Pentium Dual-Core, Core 2 Duo, Core 2 Quad, and
Core 2 Extreme processors.
• The current Intel notebook processors include the Celeron, Pentium
Dual-Core, Core 2 Solo, Core 2 Duo, Core 2 Quad, and Core 2
Extreme processors.

Intel Core 2 Duo Processor

ThinkPad and ThinkCentre systems use a range of Intel processors.


Summary: Processor Architecture

The Semiconductor Industry Association's (SIA) road map projects that chips will continue
increasing in speed, density, and power dissipation along their historic paths, aided by
shrinking feature sizes.

1997 1999 2001 2003 2012


Feature size (micron) 0.25 0.18 0.15 0.13 0.05
Wafer size (mm) 200 300 300 300 450
Minimum operating voltage 1.8-2.5 1.5-1.8 1.2-1.5 1.2-1.5 0.5-0.6
Maximum power, desktop 70 90 110 130 175
Maximum power, mobile 1.2 1.4 1.7 2.0 3.2
On-chip frequency (MHz) 750 1,250 1,500 2,100 10,000
DRAM capacity 256 MB 1 GB 1 GB 4 GB 256 GB

Source: Semiconductor Industry Association


Review Quiz

Objective 1

1. What is the purpose of a math coprocessor?


a. Accelerates the calculation speed of numeric floating point operations
b. Reduces the electrical power required by the processor
c. Conserves battery life by reducing power when the processor is inactive
d. Provides systems management capabilities to the processor

2. Why does a processor have a slower system bus speed than internal core speed?
a. The slower system bus speed is utilized by the math coprocessor.
b. The L1 cache gets a higher hit ratio with the slower system bus speed.
c. The execution units vary between the two speeds to conserve power.
d. Engineering design costs are reduced with a slower external system bus speed.

3. What best describes MMX, SSE, SSE2, SSE3, SSE4?


a. Features of a processor to allow Linux support
b. Instructions in Intel processors that multimedia-related applications can write to for
performance improvement
c. Types of L3 cache on the die of the processor
d. The packaging variations of the Itanium processor

4. What feature in Intel processors helps protect memory data areas from malicious software
execution?
a. Intel 64 Technology
b. Execute Disable Bit
c. Renaming registers
d. Dual core

5. What are instruction set extensions to IA-32 to allow Intel processors to run 64-bit operating
systems and applications?
a. XA-32
b. Intel 64 Technology
c. Execute Disable Bit
d. Land Grid Array

6. What is a dual-core processor?


a. Two execution units share a single math coprocessor
b. Two system buses to support dual-channel memory
c. Two independent processor cores on a single physical die
d. Two logical processors that support Hyper-Threading Technology


7. What technology allows a single system to run multiple operating systems in independent
partitions?
a. Execute Disable Bit
b. Intel 64 Technology
c. Dual-core processing
d. Virtualization

Objective 2

8. Desktop processors may implement all of the following packaging except which one?
a. Pin Grid Array (PGA)
b. Voltage Regulator Module (VRM)
c. Flip-Chip PGA
d. Land Grid Array

9. Notebook systems will typically use what type of processor packaging?


a. Pin Grid Array
b. Single Edge Contact
c. Mobile Module and Pin Grid Array
d. Micro BGA and Micro PGA

Objective 3

10. What is the Intel processor for value desktops?


a. Core 2 Quad processor
b. Celeron processor
c. Core 2 Extreme processor
d. Core 2 Duo processor

11. Which Intel processor with four cores supports Virtualization Technology, Intel 64 Technology,
and 8 MB L2 cache?
a. Core Processor
b. Core 2 Duo Processor
c. Core 2 Quad Processor
d. Core 4 Processor


Objective 4

12. How many cores are included in the Intel Core 2 Solo processor?
a. None
b. One
c. Two
d. Four

13. What is an important difference between earlier versions and later versions of the Intel Core 2
Duo Mobile Processor?
a. Earlier versions use Socket M while newer versions use Socket P for 800 MHz bus support
b. Later versions support Intel 64 Technology
c. Earlier versions support Intel Centrino Pro processor technology
d. There is no difference


Answer Key
1. A
2. D
3. B
4. B
5. B
6. C
7. D
8. B
9. D
10. B
11. C
12. B
13. A



Topic 3 - Memory Architecture

PC Architecture (TXW102)
Topic 3:
Memory Architecture



Objectives:
Memory Architecture

Upon completion of this section, you will be able to:

1. Differentiate between L1 cache, L2 cache, and main memory


2. Define various memory terminology and packaging
3. Describe the types of DRAM memory used for main memory
4. Recognize how memory capacity increases system performance
5. Define the type of error-correcting memory used in notebook and desktop
systems
6. Recognize the features and performance impact of memory cache



L1 Cache, L2 Cache, and Memory

[Block diagram: the processor with its L1 and L2 cache connects over the system bus to the
Memory Controller Hub (MCH, or GMCH with an optional integrated graphics controller), which
attaches main memory, the PCI Express x16 graphics slot, and PCI Express slots. The Direct
Media Interface links the MCH to the I/O Controller Hub (ICH), which provides the PCIe, PCI,
SATA, IDE, and USB controllers (four SATA disks, USB 2.0), plus the Super I/O, the firmware
hub or Low Pin Count interface, and AC '97 codecs or High Definition Audio.]

• L1 and L2 cache (memory cache) is within the processor.


• Memory (main memory) holds active data and instructions.


L1 and L2 Cache
Processors (such as the Intel Core 2 Duo Processor) have caches to hold recently used data and
instructions. The caches are within the processor and are at different levels. The processor will first
look at L1 (level 1); if the information is not found, the processor will look in L2 (level 2), and so
forth. Some processors also have a third level or L3 cache. L1 and L2 cache is often referred to as
memory cache.
Dual-core processors have two cores, while quad-core processors have four cores. Each core has its
own dedicated L1 cache (so the L1 cache is not shared). For the L2 cache of dual-core or quad-core
processors, earlier implementations had each core use dedicated L2 cache while current
implementations share a single L2 cache among the cores.

Reasons for Cache


Research has shown that when a system uses data once, the system is likely to use that data again.
The faster the access to this data occurs, the faster the overall machine will operate. Cache is a
memory buffer that acts as temporary storage for instructions and data obtained from slower main
memory. For a processor, there are multiple levels of cache such as L1, L2, and sometimes L3
cache. L1, L2, and L3 cache use static RAM (SRAM), which is significantly faster than the
dynamic RAM (DRAM) used for system memory (typically five to ten times faster). However,
SRAM is more expensive, requires more power, and thus is not used for main memory.


Benefits of Cache
Caches reduce the number of clock cycles required for a memory access because they are
implemented with fast SRAMs. Whenever the processor performs an external memory read, the
cache controller pre-fetches extra bytes and loads them into the L1, L2, and L3 caches. When
the processor needs the next piece of data, that data is likely to be in the caches already.
If so, processor performance is enhanced; if not, the penalty is minimal.
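The lookup order described above (L1 first, then L2, then main memory) can be sketched in a few lines of Python. The latencies and structure here are illustrative placeholders chosen for this document, not measured figures from any real processor.

```python
# Minimal sketch of the L1 -> L2 -> main-memory lookup order described above.
# Cycle counts are arbitrary illustrative values, not real measurements.

L1, L2, MEMORY = {}, {}, {}
LATENCY = {"L1": 1, "L2": 10, "memory": 100}

def read(address):
    """Search each cache level in order; on a miss, fetch and fill the caches."""
    for name, level in (("L1", L1), ("L2", L2)):
        if address in level:
            return level[address], LATENCY[name]
    value = MEMORY.get(address, 0)        # fetch from slow main memory
    L1[address] = L2[address] = value     # cache it for the next access
    return value, LATENCY["memory"]

MEMORY[0x1000] = 42
print(read(0x1000))   # (42, 100)  first access misses both caches
print(read(0x1000))   # (42, 1)    second access hits L1
```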

Basic Facts about Memory


• Main memory is often referred to simply as "memory", but it is also called system memory,
system RAM (random access memory), or DRAM (dynamic random access memory).
• Memory holds data and instructions that the processor is using, enabling it to access data and
instructions quickly (compared to getting them from disk or CD-ROM).
• The contents of memory and of the processor's caches are lost when the computer is powered
off, because the data is stored electrically. When electricity to the computer is shut down,
the cells lose their charge.
• Memory is located on the local bus (sometimes called host processor bus, memory bus, or system
bus).
• Memory is controlled by a memory controller, which is a chip that handles all movement in and
out of memory.
• Memory performance is often measured by latency (the time from the start of an access to the
beginning of a burst transfer) and bandwidth (the peak or average data rate).

The Cost Factor


A basic tradeoff that all system designers must face is that as the access time goes down,
manufacturing cost goes up. Processor architectures allow for a certain number of clock cycles in
order to read and/or write information to system memory. If for some reason the operation is not
completed in the given number of clocks, the processor must wait by inserting additional states into
the basic operation. These are called wait states and are integer multiples of clock cycles. The
challenge is that as each new generation of processors is clocked faster, it becomes more expensive
to incorporate memory devices that have access times allowing zero wait designs.
For example, DRAM has a typical access time of about 60 ns. Access time is the amount of time
that passes between the instant the processor issues an instruction to the memory to read data from
an address and the moment it receives it. A 60 ns DRAM is not fast enough to permit a zero wait
state design with a newer processor. Static RAM, or SRAM, has an access time of less than 10 ns.
A 10 ns SRAM design would allow for zero waits at current processor speeds but would be
prohibitively expensive to implement as main memory.
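The wait-state arithmetic described above can be made concrete with a short calculation. This is a back-of-the-envelope sketch: the `allowed_cycles` figure and the bus speeds are illustrative assumptions, not values from any specific datasheet.

```python
import math

# Back-of-the-envelope sketch of wait states: if a memory access does not
# complete within the allowed number of bus clocks, the processor inserts
# extra (wait) cycles. Numbers are illustrative assumptions.

def wait_states(access_time_ns, bus_mhz, allowed_cycles=2):
    period_ns = 1000.0 / bus_mhz                     # one bus clock in ns
    needed = math.ceil(access_time_ns / period_ns)   # clocks the access takes
    return max(0, needed - allowed_cycles)

print(wait_states(60, 100))  # 60 ns DRAM on a 100 MHz bus -> 4 wait states
print(wait_states(10, 100))  # 10 ns SRAM -> 0 wait states (zero-wait design)
```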


Memory Terminology and Packaging

• Read-only memory (ROM)


• Random access memory (RAM)
- Reads and writes
• Virtual memory
• Memory packaging is independent
of its architecture
- Packaging types: DIP, SIP, ZIP,
SIMM, DIMM, SODIMM, IC, SOJ,
TSOP, FBGA

DIMM used in Lenovo desktops

SODIMM used in Lenovo notebooks


Memory Terminology and Packaging


Below are some types of memory and how they are used. Note that a type of memory is different
from memory packaging: memory packaging is the physical container for a type of memory.
Memory is either volatile or non-volatile. Volatile memory requires a constant electric charge to
hold information and includes RAM, DRAM, and SRAM. Non-volatile memory holds the contents
of memory without a constant electric charge and includes ROM and flash memory.
ROM (read-only memory) is implemented on adapter cards and other components. ROM is used to
store boot code, power-on diagnostics, and device-specific code on adapters and is known for its
slow performance. ROM cannot be used for writes. ROM can be randomly accessed, so technically
speaking, ROM is a type of RAM.
RAM (random access memory) is the main memory in a PC. The term random comes from the
retrieval of data from any individual location or address within the memory (nonlinear) from the
processor. RAM can be used for reads and for writes. DRAM (dynamic RAM) and SRAM (static
RAM) are two common types of RAM.
Virtual memory uses hard disk as memory. This function is partly in hardware (all current
processors) and partly in the operating system. It allows addresses to take values up to the
maximum allowed by the logic (e.g., 4 GB) even though the machine does not have this amount of
memory installed. The memory manager pages programs and data in and out of memory from a
disk file called the Swapper or Swap File. All major operating systems provide virtual memory.


CMOS (complementary metal oxide semiconductor) is often used to store the configuration
parameters of a system or an adapter. It requires little power but loses its contents if power is
removed. A battery is often used with CMOS so that the contents will remain once power is
removed from the system.

Memory Packaging
The following computer chips are usually implemented in various subsystems such as adapter cards
or systemboard components. Packaging for main memory is discussed later. These kinds of chips
would not be used for main memory today.
• Dual in-line package (DIP) – DIPs are traditional
"buglike" computer chips with 8, 14, 24, 40,
or more metal legs evenly divided between
right and left sides. DIPs are installed in holes
extending into the surface of a printed circuit
board. DIPs can be installed in sockets or
soldered in place.
DIP

• Single in-line package (SIP) – SIPs are a type of housing for electronic components in which
single-package arrays of computer chip logic are arranged in such a way that all connecting legs
are in a straight line.

SIP

• Zig-zag in-line package (ZIP) – ZIPs are similar to SIPs; every other pin connection is slightly
offset in a zigzag pattern.
• Single in-line memory module (SIMM) and dual in-line memory module (DIMM) – SIMMs and
DIMMs are individual logic devices installed on small circuit boards. The physical arrangement
facilitates easy installation and replacement.
• Integrated circuit (IC) – ICs are memory in a PCMCIA-similar card that conforms to JEDEC
standards.
• Small outline J-lead (SOJ) – An SOJ is a common form of surface-mount DRAM packaging that
is a rectangular package with J-shaped leads on the two longest sides.

SOJ


• Thin small outline package (TSOP) – A TSOP or TSOP2 is a DRAM packaging with gull-shaped
leads on both sides. A TSOP package is one-third the thickness of an SOJ package. TSOPs are
mounted directly on the surface of a printed circuit board and are typically used in SODIMMs
and PC Cards.

TSOP

• Fine-pitch ball grid array (FBGA) – An FBGA is a chip scale package for flash memories. It
physically encapsulates the Flash die.

FBGA

Importance of Memory
Any subsystem that needs to hold data needs memory. This table shows types of memory and their
architectures.

Subsystem Type Architecture


L2 cache SRAM Asynchronous or synchronous
Main memory DRAM DDR2, DDR3
Graphics DRAM DDR2, DDR3, proprietary
Disk DRAM/Flash DDR2, DDR3, NAND
BIOS Flash NOR


Memory Terminology and Packaging:


Flash Memory
• Traps the charge so contents not lost when power is removed
• No moving parts so handles vibration and dropping
• Very small and thin, but very expensive (compared to disk-based
storage)
• Used in subsystems (BIOS, disks, adapters, etc.)
• Also used in digital music players, flash memory keys, cell
phones, and other electronic devices
• Newest uses
- Solid State Drives (SSD)
for disks
- Intel Turbo Memory for
ReadyBoost and ReadyDrive
• Technology: NOR or NAND Lenovo USB 2.0 Memory Key
uses flash memory


Flash Memory
Flash memory is a form of non-volatile memory that allows electronic reads and writes. Non-
volatile means it does not lose its contents when power is turned off. Flash memory uses no moving
parts, so it is very useful for data storage; however, it is very expensive compared to disk-based
storage. With no moving parts, flash memory can handle vibration and dropping with no damage.
Flash memory is also very small and thin compared to disk-based storage.
Flash memory often is used to store BIOS on systemboards and for firmware on adapters. It is also
used for removable or integrated storage in digital cameras and handheld devices. Flash memory
gets its name because the microchip is organized so that a section of memory cells are erased in a
single action or "flash."

Flash Memory Disadvantages


Flash memory has three disadvantages. First, it is expensive relative to DRAM memory. Second, it
tends to degrade over time as numerous writes are executed. Third, it is slow for writes compared
to both DRAM and even hard disks.


Flash Memory Technology


Flash memory traps the electrical charge, so memory retains its contents when the power is turned
off. Data can be erased only in blocks (such as 64 KB blocks) when it needs to be modified. It is
limited to between 100,000 and one million writes (depending on the design of the cell and certain
manufacturing variables). Flash memory has a limited life for writes and an unlimited life for reads.
Reads are fast (44 to 200 ns), but writes to flash memory are slower than writing to a hard disk.
Typical flash memory stores a single bit per cell. Each cell (or transistor) is characterized
by a specific threshold voltage, which is controlled by the amount of electrical charge
programmed or stored on the cell's floating gate. If the amount of charge on the floating
gate is above a certain reference level, the cell is read as one logic state; below that
level, it is read as the other.
EEPROM (electrically erasable programmable read-only memory) is closely related to flash
memory; flash is essentially a block-erasable form of EEPROM. Traditional EEPROM is erased
and rewritten at the byte level (not the block level), so it is slower than flash memory for
large updates.
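The block-erase behavior and limited write endurance described above can be sketched as follows. The block size and erase limit come from the figures quoted in this section; the class and method names are this document's own illustration, not a real flash controller interface.

```python
# Conceptual sketch of flash behavior described above: erasure happens only
# in whole blocks, and each block tolerates a limited number of erase cycles.

BLOCK_SIZE = 64 * 1024       # 64 KB erase blocks, as in the example above
ERASE_LIMIT = 100_000        # lower bound of the endurance range quoted above

class Block:
    def __init__(self):
        self.erase_count = 0
        self.data = bytearray(BLOCK_SIZE)

    def erase(self):
        if self.erase_count >= ERASE_LIMIT:
            raise RuntimeError("block worn out")
        self.erase_count += 1
        self.data = bytearray(b"\xff" * BLOCK_SIZE)  # erased flash reads as 1s

    def rewrite(self, offset, payload):
        # Modifying even one byte means erasing the whole block first.
        saved = bytes(self.data)
        self.erase()
        self.data[:] = saved
        self.data[offset:offset + len(payload)] = payload

b = Block()
b.rewrite(0, b"BIOS")
print(b.erase_count, bytes(b.data[:4]))   # 1 b'BIOS'
```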


NOR and NAND Flash Memory


NOR and NAND are the two main types of flash technology. The names, NOR and NAND, refer
to the type of logic gate, a fundamental building block of a digital circuit, used in each storage cell.
Basically, NAND stands for "Not AND", while NOR stands for "Not OR". The two chips work
differently. Because NOR runs software quickly, it is ideal for mobile phones. But NAND beats
NOR in storage capacity, which is why NAND is perfect for devices such as digital music players.
Other kinds of chips do not match up well against NAND either. DRAM (dynamic RAM), for
example, could meet NAND's storage ability, but DRAM does not retain data after the power is cut
off.

                 NOR                                    NAND
Capacity         1 MB-32 MB                             16 MB-1 GB
XIP capabilities Yes                                    None
(code execution)

Performance      Very slow erase (5 sec)                Fast erase (3 msec)
                 Slow write                             Fast write
                 Fast read                              Fast read

Reliability      Standard; bit-flipping issues          Low; requires 1-4 bit EDC/ECC
                 reported                               due to bit-flipping issue

Erase Cycles     10,000 - 100,000                       100,000 - 1,000,000

Life Span        Less than 10% of the life span         Over 10 times more than NOR
                 of NAND
Access Method    Random                                 Sequential
Ease-of-use      Easy                                   Complicated
(Hardware)
Ideal Usage      Code storage - limited capacity        Data storage only - due to
                 due to price in high capacity.         complicated flash management.
                 May save limited data as well.         Code will usually not be
                 Examples:                              stored in raw NAND flash.
                 Simple home appliances                 Examples:
                 Embedded designs                       PC Cards
                 Low-end set top boxes                  Compact Flash
                 Low-end mobile handsets                Secure Digital
                 PC BIOS chips                          MP3 players (music storage)
                                                        Digital Cameras (image storage)

Price            High                                   Low


High Speed NAND


In February 2008, Intel and Micron unveiled a high speed NAND flash memory technology. The
new technology—developed jointly by Intel and Micron and manufactured by the companies'
NAND flash joint venture, IM Flash Technologies (IMFT)—is five times faster than conventional
NAND, allowing data to be transferred in a fraction of the time for computing, video, photography,
and other computing applications.
The new high speed NAND can reach speeds up to 200 MB/s for reading data and 100 MB/s for
writing data, achieved by leveraging the new ONFI 2.0 specification and a four-plane architecture
with higher clock speeds. In comparison, conventional single level cell NAND is limited to 40
MB/s for reading data and less than 20 MB/s for writing data.
The applications and opportunities for the technology can be found on Micron's Web site at
www.micron.com/highspeednand.
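The throughput figures quoted above translate into large differences in transfer time. The quick arithmetic below uses the peak rates from this section; real transfers involve controller and interface overhead, so these are best-case illustrations.

```python
# Quick arithmetic on the NAND throughput figures quoted above: how long
# to read a 4 GB volume at each peak rate. Best-case, illustrative only.

def seconds_to_read(size_mb, rate_mb_per_s):
    return size_mb / rate_mb_per_s

size = 4 * 1024  # 4 GB expressed in MB
print(round(seconds_to_read(size, 40), 1))    # conventional SLC NAND: 102.4 s
print(round(seconds_to_read(size, 200), 1))   # high speed NAND: 20.5 s
```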

SLC and MLC


For flash memory, particularly that used for Solid State Drives, the technology is either
single-level cell (SLC) or multi-level cell (MLC). MLC stores more than one bit per cell,
giving higher capacity at lower cost per gigabyte, while SLC offers faster write speeds and
better endurance.


Secure Digital Card


The Secure Digital (SD) Card is a universal flash memory storage device designed to meet the
converging security, capacity, ergonomic and performance requirements of emerging audio, video,
data and multimedia consumer electronics markets. The SD Memory Card was jointly developed
by Matsushita Electric (best known for its Panasonic brand name products), SanDisk and Toshiba,
market leaders of consumer electronics and flash memory data storage products. The SD Memory
Card is a modified, highly secured and significantly improved version of the industry-leading
MultiMediaCard, introduced in November 5, 1997 by SanDisk and Siemens. There are several SD
formats, including miniSD and microSD cards.

SD Memory Card

Secure Digital Card Slot in Lenovo ThinkPad X60

• USB memory keys are flash memory that plug into a USB connector. They are an ideal
replacement for a diskette and are recognized as a removable drive under Windows.

Lenovo USB 2.0 Memory Key Memory Key in Lenovo ThinkPad notebook


Memory Terminology and Packaging:


Intel Turbo Memory
• Flash memory (NAND) in a Mini PCI Express slot in notebooks
• Primarily a disk-cache accelerator
• Accelerated application loading and running via intelligent file caching
• Benefits: power savings, faster resume, works with any SATA disk
• Example of 2 GB implementation:
- 1408 MB for ReadyBoost
- 640 MB for ReadyDrive
• 2 GB supports user application
pinning
• Used in select ThinkPad notebooks

Flash memory on a Mini PCI Express adapter (Intel Turbo Memory)


Intel Turbo Memory


Introduced in May 2007, Intel Turbo Memory is Intel's implementation of flash memory on a Mini
PCI Express adapter in notebooks as a disk-cache accelerator. Intel uses NAND-based flash
memory which keeps the contents when powered off and provides extremely fast performance
compared to rotating platters of a disk.
Intel Turbo Memory works with any Serial ATA disk. It provides power savings in notebooks (by
reducing hard disk spin) and faster resume times from standby or hibernate modes. It works
with Windows Vista ReadyDrive (write cache) and ReadyBoost (read cache). Drivers exist only
for Vista, so Intel Turbo Memory will not operate on other operating systems.
Intel Turbo Memory does occupy a Mini PCI Express slot, which prevents the slot's use for
another purpose (such as a WWAN adapter). It also requires an Intel control ASIC chip and a
software driver for operation.
Lenovo uses this technology in select ThinkPad notebooks.
The 1 GB Intel Turbo Memory introduced in 2007 allocates space as follows. These splits are
defined in firmware and are not customer configurable.
• 512 MB for ReadyBoost
• 384 MB for ReadyDrive
• 128 MB for firmware


The 2 GB Intel Turbo Memory introduced in 2008 allocates space as follows. These splits are
defined in firmware and are not customer configurable.
• 1408 MB for ReadyBoost
• 640 MB for ReadyDrive
The 2 GB Intel Turbo Memory supports user application pinning if the hardware vendor supports
it. This feature allows a user to load an application into the Turbo Memory to speed its
performance. If user pinning is used, it takes the space normally allocated to ReadyBoost. It
is also RAID compatible.
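The firmware-defined splits quoted above can be sanity-checked with simple addition: the 2 GB part's ReadyBoost and ReadyDrive regions account for the full 2048 MB, while the 1 GB part reserves an explicit firmware region. The variable names below are this document's own illustration.

```python
# Sanity-check of the Intel Turbo Memory allocation splits quoted above (MB).

allocation_2gb = {"ReadyBoost": 1408, "ReadyDrive": 640}
allocation_1gb = {"ReadyBoost": 512, "ReadyDrive": 384, "firmware": 128}

print(sum(allocation_2gb.values()))   # 2048 MB = 2 GB
print(sum(allocation_1gb.values()))   # 1024 MB = 1 GB
```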

Intel Turbo Memory Configuration Screen

Intel Turbo Memory in either Half-Mini PCI Express Adapter or Full-Mini PCI Express Adapter


Performance Improvements
The 1 GB Intel Turbo Memory provides significant performance benefits:
• Performance benefit of up to +45%
• Performance benefit of up to +40% with a standard 7200 rpm HDD + 1 GB Turbo Memory
• Up to 23% faster boot-up time

                                    BitLocker Disabled        BitLocker Enabled
HDD                                 w/ Reboot   w/o Reboot    w/ Reboot   w/o Reboot
5400 rpm HDD (Baseline)             100% (ref.)    n/a        86%            n/a
5400 rpm HDD
  + Intel Turbo Memory 1 GB         114%        145%          102%        127%
7200 rpm HDD                        109%           n/a        101%           n/a
7200 rpm HDD
  + Intel Turbo Memory 1 GB         127%        164%          108%        140%


Memory Packaging:
DIMMs
• Main memory is implemented with DIMM packaging.
- Dual in-line memory module (DIMM)
• Independent signals used on each side
• 64-bit data path
• Lenovo desktops
- 240 pins (DDR2 or DDR3)
• Lenovo notebooks use smaller SODIMMs (Small Outline DIMMs)
- 200 pins (DDR2) or 204 pins (DDR3)

DIMM SODIMM
(for desktops) (for notebooks)


Memory Packaging: DIMMs


Main memory is implemented with DIMM (dual in-line memory module) packaging. A DIMM
houses memory components on a printed circuit board. DIMMs began to replace SIMMs (single in-
line memory modules) as the main type of memory packaging when Intel's Pentium processors
began shipping.
A DIMM has metal contacts (called tabs, pins, or contacts) on the bottom that plug into a socket.
These contacts originally were gold-plated, but today the contacts are normally tin/lead. It is
important to match the type of metal in the socket to the metal connector in the DIMM. If gold is
connected to tin/lead, corrosion will occur over time and will destroy the connector (and the
systemboard).
The tabs on both sides of a DIMM module are independent, so they have different signals. A 240-
pin DIMM has 240 total tabs, 120 on each side. Because the tabs are independent, there are a total
of 240 independent signals, as opposed to the 72 independent signals on the 72-pin SIMM. The
240-pin DIMMs are a convenient way to support 64-bit processors and/or 64-bit memory transfers
into the processor. Sometimes they are called 8-byte DIMMs, because eight bytes equals 64 bits.
Most connectors in PCs are DIMMs. PCI adapters, for example, use DIMM-like connectors (i.e.,
individual connectors on either side of the board).


A DIMM has buffers for critical signals in the module that provide faster memory access and better
signal quality. Buffers on the module reduce the need for buffers on the systemboard, so the cost of
buffers is incremental as memory is added. Maximum memory efficiency is obtained by adding
memory buffers on the systemboard. Most DIMMs (except the 168-pin DIMMs) have memory
buffers. The eight-byte DIMMs can be parity, nonparity, or ECC and can support 3.3-volt
technology and 64 Mb and 256 Mb DRAMs.
The 72-pin older SIMM (single in-line memory module) and the DIMM are incompatible; they will
not fit in each other's socket or connector.
The DIMMs that are used on Lenovo notebooks are known as JEDEC SODIMMs (“SO” stands for
small outline).
While a DIMM typically has 240 pins, 64 pins are used for data, 8 may be used for ECC
information, about 36 are used for addressing, and the remainder are used for groundings and other
signal functions.

DIMM Sockets on Lenovo ThinkCentre systemboard


DIMM ranking
The number of ranks on a DIMM is the number of independent sets of DRAMs that can be
accessed simultaneously to drive the full data bit-width of the DIMM on the bus. The physical
layout of the DRAM chips on the DIMM does not necessarily indicate the number of ranks. A
layout with all DRAM on one side of the DIMM versus both sides is sometimes called
"single-sided" versus "double-sided," but these terms are misleading because they do not
necessarily relate to how the DIMMs are logically organized or accessed.
On a 64-bit (non-ECC) DIMM made with two ranks, there would be two sets of DRAM that could
be accessed at different times. Only one of the ranks can be accessed at a time, since the DRAM
data bits are tied together for two loads on the DIMM. Ranks are accessed through chip selects
(CS). Thus for a two rank module, the two DRAMs with data bits tied together may be accessed by
a CS per DRAM (e.g., CS0 goes to one DRAM chip and CS1 goes to the other). DIMMs are
currently being commonly manufactured with up to four ranks per module.
A rank of memory is the collection of DRAMs connected to a chip select signal from the memory
controller, abbreviated CS. Older controllers only provided two CS signals for every memory slot
and a maximum of two memory slots, thereby limiting capacity per memory channel to 4 ranks.
Consumer DIMM vendors have recently begun to distinguish between single- and dual-ranked
DIMMs. JEDEC decided that the terms "dual-sided," "double-sided," or "dual-banked" were not
correct when applied to registered DIMMs.
The term "rank" evolved from the need to distinguish the number of memory banks on a module as
opposed to the number of memory banks on a component. So, "rank" is used when referring to
modules, and "bank" is used when referring to components. The most commonly used modules
have either a single-rank of memory or a double-rank of memory.
Many factors influence the use of single-rank or double-rank modules. These include available
component densities and pricing, system memory requirements, the number of slots on a board, and
memory controller specifications. In the case of a 1 GB DIMM, it can be built as single-rank or
double-rank depending on the components used.


Module   Module         Ranks  Device   Device         Rank 1                    Rank 2                    Number
Density  Configuration         Density  Configuration                                                      of Chips
1 GB     128 MB x 72    2      512 Mb   64 MB x 8      64 MB x 72 (9 devices)    64 MB x 72 (9 devices)    18
1 GB     128 MB x 72    1      512 Mb   128 MB x 4     128 MB x 72 (18 devices)  -                         18
1 GB     128 MB x 72    2      256 Mb   64 MB x 4      64 MB x 72 (18 devices)   64 MB x 72 (18 devices)   36
                               (stacked)

Example of 1 GB DIMM rank configurations

[Diagram] 2-rank modules: the memory controller drives shared address and data buses, with a
chip select per rank. At 18 DRAMs/rank x 512 Mb/DRAM, 2 GB/rank; 2 ranks/slot x 4 slots =
16 GB capacity.

[Diagram] 4-rank modules: the same buses and per-rank chip selects. At 18 DRAMs/rank x
512 Mb/DRAM, 2 GB/rank; 4 ranks/slot x 4 slots = 32 GB capacity.

Examples of DIMM rank configurations
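The capacity arithmetic in the diagrams above can be reproduced in a few lines (a minimal sketch; the function name is ours, and the per-rank capacity is taken as an input rather than derived from the chip count):

```python
def channel_capacity_gb(gb_per_rank, ranks_per_module, slots):
    # One chip select per rank: total addressable memory on a channel
    # is the per-rank capacity times the total number of ranks.
    return gb_per_rank * ranks_per_module * slots

# Values from the diagrams: 2 GB/rank modules in 4 slots
print(channel_capacity_gb(2, 2, 4))  # 16 GB with 2-rank modules
print(channel_capacity_gb(2, 4, 4))  # 32 GB with 4-rank modules
```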


Memory Terminology:
Dynamic Random Access Memory (DRAM)

• Used for main memory, graphics memory, disk controllers, and disks
  (PCs have 1 GB to 8 GB main memory)
• Can only be accessed when it is not being refreshed
• DRAM must be refreshed frequently
• DRAM access times are 50, 60, 70, and 80 nanoseconds

[Diagram] Normal memory access over time: pre-charge, access, post-charge

DRAM
DRAM is memory that must be refreshed constantly. This requirement imposes delays between
accesses so that the charges can build up and stabilize.
Refreshing is continuously charging a device that cannot hold its content without an electrical
charge. Memory is refreshed with an electrical current.
An important performance measurement for memory is access time, which is the amount of time
that passes between the instant the processor issues an instruction to the memory to read data from
an address and the moment it receives it.

Rows and Columns


In order for information to be accessed – stored and read from memory – on a PC, the processor
must know where the information is located. The information must have an address. Memory
controllers break up memory addresses (often in a 32-bit binary format) into RAS (row access
strobe) and CAS (column address strobe) addresses. The RAS and CAS addresses are used to
obtain the memory location, which is set up like an array or grid (table). When a burst (single
address followed by multiple data transfers) memory transaction occurs, its memory controller
actually does RAS-CAS, CAS, CAS, CAS addressing.
The row and column organization of a memory chip varies with its configuration. The following
tables give information on the various chips. x1, x4, etc., refer to the number of data bits (DQs)
per DRAM; the most popular is x4 (a DRAM with four data inputs/outputs).
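As a toy illustration of RAS/CAS addressing, a controller can be thought of as splitting one flat address into a row part and a column part (a sketch under our own naming; real controllers also fold in bank and rank bits):

```python
def split_address(addr, row_bits, col_bits):
    # Row bits are presented first with RAS, then column bits with CAS,
    # so the DRAM needs only max(row_bits, col_bits) address pins.
    col = addr & ((1 << col_bits) - 1)
    row = (addr >> col_bits) & ((1 << row_bits) - 1)
    return row, col

# A 64 Mb x4 part organized as 13 row / 11 column address bits
addr = (5461 << 11) | 682
print(split_address(addr, 13, 11))  # (5461, 682)
```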


The address is provided in two successive addresses – RAS and CAS – reducing pin count and
complexity of the memory packaging in exchange for a trivial increase in controller cost.
The CAS is a control pin on a DRAM used to latch and activate a column address. The column
selected on a DRAM is determined by the data present at the address pins when CAS becomes
active.
The RAS is a control pin on a DRAM used to latch and activate a row address. The row selected on
a DRAM is determined by the data present at the address pins when RAS becomes active.
The following memory chips are addressed as follows:

Memory Chip  x1 Rows/Columns  x4 Rows/Columns   x8 Rows/Columns
1 Mb         10/10            9/9               NA
4 Mb         11/11            10/10             10/9
16 Mb        12/12            12/10 or 11/11    12/9 or 11/10
64 Mb        13/13            13/11 or 12/12    13/10 or 12/11
256 Mb       NA               14/12 or 13/13    14/11 or 13/12

Memory Chip  x16 Rows/Columns          x32 Rows/Columns
1 Mb         NA                        NA
4 Mb         9/9                       NA
16 Mb        12/8 or 10/10             10/9
64 Mb        13/9 or 12/10             11/10
256 Mb       14/10 or 13/11 or 12/12   14/9

DRAM Chips
A DRAM chip contains a matrix of memory cells, each cell holding a single bit. A cell is made up
of a capacitor (a tiny device that can hold an electrical charge) and a transistor, which acts like a
switch. To read a cell, the transistor is activated, which triggers the capacitor. A discharge of
current signifies a 1 (after which the capacitor must be recharged); no discharge represents a 0.
To select a bit to read, a computer uses a grid of wires connected to the memory cell matrix. First, a
small voltage is applied to the selected row, thus selecting all bits in that row. Next, the appropriate
column wire is triggered. The combination is enough to activate the transistor at that row and
column. This process takes time, however, and much of the latency involved in memory accesses
comes from the time it takes to select the desired bits.


Buffered and Unbuffered Memory


There are two types of memory currently on the market: buffered (or registered) and unbuffered (or
unregistered). They cannot be mixed together in a system.
Memory that operates extremely fast sometimes must be buffered or registered (on the memory
module or controller). Buffering reduces the dependence of the systemboard timing on the module
loading. Buffered modules require additional components, printed circuit board area, routing, and
system power, and add one or two clock cycles to every memory access, depending on the extent
of the buffering.
With unbuffered DIMMs (or UDIMMS), the memory controller communicates directly with the
DRAMs. They are therefore typically one access clock faster than buffered DIMMs. The
disadvantage of unbuffered DIMMs is that they have a greater electrical loading factor; thus, fewer
DIMMs are typically supported in a system. While this limitation is acceptable for desktop systems,
servers usually require large amounts of memory, and most servers are being standardized on
buffered DIMMs.
Buffered DIMMs, on the other hand, use registers to isolate the memory controller from the
DRAMs, leading to a lighter electrical load and therefore the support of more DIMMs and
availability of large memory capacities. Buffered DIMMs are typically one clock tick slower for the
register operation. Because the DRAMs are isolated from the controller, memory manufacturers
can implement DIMMs with various DRAM chips.

Serial Presence Detect (SPD)


Most memory has Serial Presence Detect (SPD) which includes the parameters for the memory
stored in EEPROM. Parameters include speed and access time, number and organization of chips,
and special features such as fast random access. The EEPROM is programmed by the module
vendor. Systems use SPD to configure the memory at boot time. Without SPD, systems must use
the most conservative timings.


Memory Timing
Memory is organized on a chip in rows and columns, and it is accessed by repeated pulses of
electricity (called strobing) to reach each location. When memory is accessed, each strobing
cycle takes a fixed amount of time consisting of these four elements:
• tCL – Column address strobe (CAS) latency; the number of clock cycles required to access
a specific column of data. (The initial t refers to time.)
• tRCD – Row address strobe (RAS)-to-CAS delay; the number of clock cycles needed
between a row address strobe and a column address strobe.
• tRP – RAS precharge; the number of clock cycles needed to close one row of memory and
open another.
• tRAS – The number of clock cycles needed to access a specific row of data in RAM.
If a DRAM label says DDR2-800 5-5-5-15, here is the breakdown:
• 800 is the effective clock speed in megahertz. It is the actual clock speed times data per
clock cycle (200 MHz [for DDR2-800] x 4 [4 samples per clock cycle for DDR2]).
• "5-5-5-15" refers to a tCL of 5, tRCD of 5, tRP of 5, and tRAS of 15.
Latency is the delay that occurs during memory access. Since latency is measured in clock
cycles, a smaller number is better because less time is required for memory accesses. The time is
measured in nanoseconds (ns), with a typical system performing millions of memory accesses each
second. Memory speed and latency move in opposite directions: as the memory clock gets faster,
the latency in clock cycles gets larger. For example, the same DDR2-667 memory module can run
at 333 MHz with 5-5-5-13 latencies or at DDR2-533 speed (266 MHz) with 4-4-4-11 latencies.
Since higher clock frequencies represent smaller time intervals, the total access time is practically
the same for both settings.
Some memory vendors offer premium memory which runs at high clock speeds and lower
latencies. Specialized applications are sensitive to memory performance, so premium memory
results in better performance for games, media transcoding, and 3D rendering because they are
all sensitive to memory latencies. Mainstream applications like Web browsing, office
applications, and streaming media are less sensitive.
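That "practically the same" trade-off is easy to verify with a short calculation (a sketch; the helper name is ours — tCL counts I/O bus clocks, which run at half the effective data rate for DDR memory):

```python
def cas_latency_ns(data_rate_mt_s, tcl_cycles):
    # DDR moves two data samples per I/O clock, so the I/O clock
    # in MHz is half the effective data rate in MT/s.
    clock_mhz = data_rate_mt_s / 2
    return tcl_cycles * 1000.0 / clock_mhz  # ns per cycle = 1000 / MHz

print(round(cas_latency_ns(667, 5), 1))  # 15.0 ns at DDR2-667 CL5
print(round(cas_latency_ns(533, 4), 1))  # 15.0 ns at DDR2-533 CL4
```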


Types of DRAM:
Used for Main Memory

• DDR2
- Second generation of DDR
- Higher bandwidth than DDR1
- Introduced in 2004 for desktops and in
2005 for notebooks
• DDR3
- Third generation of DDR
- Higher bandwidth than DDR2
- Lower power
- Introduced in 2007 for desktops and in
2008 for notebooks


Types of DRAM
Main memory in computers was implemented with different types of memory as follows:
• Prior to 1995: conventional DRAM
• 1996: Fast Page Mode (FPM)
• 1997: Extended Data Out (EDO)
• 1998-1999: Synchronous DRAM (SDRAM)
• 2000-2001: Rambus RDRAM and SDRAM
• 2002: SDRAM, DDR1 (or DDR-SDRAM), and Rambus RDRAM
• 2003: DDR1
• 2004-2005: DDR1 and DDR2
• 2006-2008: DDR2
• 2007-present: DDR3
These memory types can also be implemented on any device or subsystem that has memory, such
as a graphics controller, disk cache, buffer, adapter memory, etc.


Synchronous DRAM (SDRAM) – 1998 to 2002


Synchronous DRAM (SDRAM) was common in computers around 1998 to 2002. SDRAM has all
control, address, and data signals synchronized to a single clock – the same clock as the system, or
local, bus. DRAM itself (not the memory controller) provides the address of the next memory
location, and includes an on-chip burst counter to increment the column address. SDRAM provided
enhanced performance over the earlier FPM and EDO memory. With FPM and EDO memory,
signals are routed through a controller chip, and the DRAM often must wait for the controller to
catch up. However, since SDRAM is tied to the speed of the system bus, this synchronicity
eliminates wait states, makes the implementation of control interfaces easier, and makes column
(but not row) access time quicker.
The clock for SDRAM is coordinated with the processor clock so that the memory and
microprocessor are synchronized, allowing the processor to perform other operations without
waiting for the memory to locate the address and read or write the data. Effectively,
synchronization reduces the time it takes to execute commands and transmit data. So SDRAM is
synchronized to the frontside bus speed of the processor.
Although SDRAM access time is usually 60 ns, a 100 MHz bus will support a 10 ns page cycle
time once the pipeline has been filled. A 66 MHz bus will support a 15 ns page cycle time. A 133
MHz bus will support a 7.5 ns page cycle time. SDRAM is typically 3.3 volts. SDRAM is available
as 60, 66, 83, 100, 125, and 133 MHz.
The two specification names for SDRAM are PC100 and PC133.
SDRAM speeds are measured by their fastest clock cycle time. For example, -12 means 12 ns
(83 MHz), -10 means 10 ns (100 MHz), -8 means 8 ns (125 MHz), and -7.5 means 7.5 ns
(133 MHz); these are pronounced "dash 12," and so on. CAS latency forces system designers to
decide whether to use faster memory or not. The -10 parts can run at 100 MHz with a CAS latency
of 3 or at 66 MHz with a CAS latency of 2.
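The speed-grade arithmetic is just the reciprocal relation between cycle time and frequency (a small sketch; the function name is ours):

```python
def cycle_time_ns(bus_mhz):
    # One clock period in nanoseconds: 1000 / frequency in MHz
    return 1000.0 / bus_mhz

print(round(cycle_time_ns(100), 1))  # 10.0 -> the "-10" grade
print(round(cycle_time_ns(133), 1))  # 7.5  -> the "-7.5" grade
```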


DDR1 (DDR-SDRAM) – 2003 to 2005


• First-generation DDR-SDRAM
• Similar architecture to SDRAM
• Double data rate (DDR)
– Data transferred on both rising and falling edges of clock
– Doubles throughput at same memory speed

[Diagram] DDR clocking: one data packet is transferred on the odd (rising) edge and one on the
even (falling) edge of each clock cycle.

Marketing  Naming      Speed   Data Rate               Single-Channel  Dual-Channel
Name       Convention  Grades                          Bandwidth       Bandwidth
DDR200     PC1600      PC200   200 MT/s (100 MHz x 2)  1.6 GB/s        3.2 GB/s
DDR266     PC2100      PC266   266 MT/s (133 MHz x 2)  2.1 GB/s        4.2 GB/s
DDR333     PC2700      PC333   333 MT/s (166 MHz x 2)  2.7 GB/s        5.4 GB/s
DDR400     PC3200      PC400   400 MT/s (200 MHz x 2)  3.2 GB/s        6.4 GB/s

Double data rate (DDR) synchronous DRAM (SDRAM) (also called double-speed DRAM or
DDR-1) provides additional throughput and features over standard SDRAM. DDR1 was introduced
in systems after chipset support in various memory controllers became available in late 2001.
DDR1 allows data to latch on both the rising and falling edge of the clock. Thus, for the same
memory speed as SDRAM, and with no increase in clock frequency, throughput can be doubled.
With the original SDRAM, a 100 MHz SDRAM chip handles a single memory operation per clock
cycle; the original SDRAM’s data rate, then, is effectively 100 MHz x 1, or 100 MHz. A PC133
SDRAM chip has a data rate of 133 MHz. PC100 and PC133 are in effect single data rate (SDR)
SDRAM. DDR memory chips can perform two operations during a single clock cycle. A 100 MHz
DDR memory chip’s data rate is thus 100 MHz x 2 or 200 MHz, a 133 MHz DDR memory chip
has a data rate of 133 MHz x 2 or 266 MHz, and so forth.
The DDR1 memory bus runs at memory-bus clock rate of 100 MHz for PC1600, 133 MHz for
PC2100, 166 MHz for PC2700, and 200 MHz for PC3200. However each DDR1 memory module
and memory chip runs at an effective (data) rate of 200 MHz, 266 MHz, 333 MHz, or 400 MHz.
The computer industry has adopted a practical convention of referring to the data rate as the DDR1
DIMM speed. So, PC1600 DIMMs are said to run at 200 MHz, PC2100 at 266 MHz, PC2700
DIMMs at 333 MHz, and PC3200 DIMMs at 400 MHz.


DDR-SDRAM (DDR1) is architecturally similar to SDRAM but has two main differences:
• DDR1 has more advanced synchronization circuitry than SDRAM.
• DDR1 uses a delay-locked loop (DLL) to provide a DataStrobe signal as data becomes valid on
the SDRAM pins. The controller uses the DataStrobe signal (one for every 16 outputs) to locate
data more accurately and resynchronize incoming data from different DIMMs.
The specifications for DDR1 memory modules are developed and approved by JEDEC. JEDEC is
the semiconductor standardization body of the Electronic Industries Alliance (EIA). About 350
member companies representing every segment of the industry actively participate to develop
standards to meet the industry needs.

DDR1 Specifications for JEDEC 200/266/333/400 MHz


• 184-pin DIMM, ECC or non-ECC
• 200-pin SODIMM, ECC or non-ECC
• 172-pin Micro-DIMM, non-ECC
• 2.5 volts
• SSTL-2 I/O interface
• CAS latencies: 2 or 2.5 for PC1600/2100/2700; 2.5 or 3.0 for PC3200
• Serial presence detect (SPD) support
• Support for memory chip stacking

DDR1 Naming Convention


• A memory chip is referred to by its native speed:
– For example, 200 MHz DDR1 SDRAM memory chips are called DDR200 chips, 266 MHz
DDR1 SDRAM chips are called DDR266 chips, 333 MHz DDR SDRAM memory chips are
called DDR333 chips, and 400 MHz DDR1 SDRAM memory chips are called DDR400 chips.
• A DDR1 module is named after its peak bandwidth, which is the maximum amount of data that
can be delivered per second:
– A 200 MHz DDR1 DIMM is called a PC1600 DIMM, with 1.6 GB/s bandwidth (8 bytes [64-
bit data path] x 100 MHz x 2 [DDR] = 1600 MB/s or 1.6 GB/s).
– A 266 MHz DDR1 DIMM is called a PC2100 DIMM with 2.1 GB/s bandwidth (8 bytes [64-
bit data path] x 133 MHz x 2 [DDR] = 2100 MB/s or 2.1 GB/s).
– A 333 MHz DDR1 DIMM is called a PC2700 DIMM with 2.7 GB/s bandwidth (8 bytes [64-
bit data path] x 166 MHz x 2 [DDR] = 2700 MB/s or 2.7 GB/s).
– A 400 MHz DDR1 DIMM is called a PC3200 DIMM with 3.2 GB/s bandwidth (8 bytes [64-
bit data path] x 200 MHz x 2 [DDR] = 3200 MB/s or 3.2 GB/s).


SDRAM and DDR1 Memory Standards

Module    Module     Chip    Clock Speed  Cycles     Bus Speed  Bus Width  Transfer Rate
Standard  Format     Type    (MHz)        per Clock  (MT/s)     (Bytes)    (MB/s)
PC66      SDR DIMM   10 ns   67           1          66         8          533
PC100     SDR DIMM   8 ns    100          1          100        8          800
PC133     SDR DIMM   7.5 ns  133          1          133        8          1,066
PC1600    DDR1 DIMM  DDR200  100          2          200        8          1,600
PC2100    DDR1 DIMM  DDR266  133          2          266        8          2,133
PC2400    DDR1 DIMM  DDR300  150          2          300        8          2,400
PC2700    DDR1 DIMM  DDR333  167          2          333        8          2,666
PC3000    DDR1 DIMM  DDR366  183          2          366        8          2,933
PC3200    DDR1 DIMM  DDR400  200          2          400        8          3,200
PC3600    DDR1 DIMM  DDR444  222          2          444        8          3,555
PC4000    DDR1 DIMM  DDR500  250          2          500        8          4,000
PC4300    DDR1 DIMM  DDR533  267          2          533        8          4,266


Types of DRAM:
DDR2

• Successor to DDR1 (DDR-SDRAM) with higher speed and


lower power
• Chipset support required (Intel 915 or later for desktop and
mobile)
• Introduced in ThinkCentre desktops in summer 2004;
ThinkPad notebooks in January 2005
• Better compatibility than DDR1 systems (DDR2 systems
work with any DDR2 memory speed)

DDR1 (2002 to 2004)             DDR2 (2004 and after)
100, 133, 166, 200 MHz          400, 533, 667, 800 MHz
184-pin DIMM or                 240-pin DIMM or
200-pin SODIMM                  200-pin SODIMM
2.5 volts                       1.8 volts


DDR2
The second generation of double data rate memory for desktop and notebook systems is Double
Data Rate 2 (DDR2). DDR2 is a type of synchronous DRAM that provides higher performance and
lower power consumption compared to DDR-SDRAM or DDR1. The specification for DDR2 was
formally released by JEDEC in late 2003 at www.jedec.org.
DDR2 supports four speeds of 400, 533, 667, and 800 MHz (each of these speeds is twice the
external clock frequency). Data is latched on both the rising and falling edge of the clock so
throughput is effectively doubled at the base clock rate.
DDR2 requires a memory controller in a chipset for support. Various chipsets support DDR2 such
as the Intel Grantsdale 915 (for desktops) and Intel Alviso 915 (for notebooks). A system cannot
support a mixed DDR1 and the newer DDR2 simultaneously; so a system will support all DDR1 or
all DDR2.
Systems with DDR2 memory controllers usually support any speed of DDR2 memory. If you mix
different-speed DDR2 DIMMs in a system, the system operates at the slowest DIMM speed; if the
memory controller does not support the faster DIMM speeds, the system operates at the memory
controller's slower speed. Whereas systems with DDR1 memory controllers did not support every
speed of DDR1 memory, systems with DDR2 memory controllers offer more flexibility and
compatibility.


Types of DRAM:
DDR2 Benefits

• Operates at a lower voltage (1.8 volts) than DDR1 (2.5 volts) to


reduce power consumption
• Uses a different packaging which provides better electrical
performance
• Provides on-die termination to improve signal quality
• Reduces memory page sizes to lower the energy required for
activating pages
• Increases the prefetch (from 2 to 4 bits) to improve performance
[Diagram] DDR clocking: one data packet is transferred on each odd and even clock edge.

[Chart] Peak system bandwidth: DDR266 4.2 GB/s (2002), DDR333 5.3 GB/s (2003),
DDR400 6.4 GB/s (2004), DDR2 8.5 GB/s (2005).

DDR2 Benefits
DDR2 operates at 1.8 volts compared to 2.5 volts for DDR1. This lower power consumption at
comparable operation frequencies gives headroom for operation at higher frequencies, and extends
battery life in mobile applications.
DDR2 used as main memory for desktops is implemented with 240-pin DIMMs; a desktop with
DDR2 support will not support the older DDR1 184-pin DIMM. DDR2 used as main memory for
notebooks will use a 200-pin SODIMM; while older notebook DDR1 also uses a 200-pin
SODIMM, a DDR2-based notebook only supports DDR2 SODIMMs.
DDR2 is available in Ball Grid Array (BGA) packaging including Fine-pitch Ball Grid Array
(FBGA). BGA packaging allows DDR2 to support higher operation frequencies compared to the
Thin Small Outline Package (TSOP) packaging used with DDR1.


Posted CAS and Additive Latency (AL)


DDR2 supports posted CAS and additive latency (AL). In a posted column address strobe (CAS), a
CAS signal (read/write command) can be input to the next clock after a row address strobe (RAS)
signal (active command) input. The CAS command is held by the DRAM side and executed after
the additive latency (0, 1, 2, 3, and 4). This provides easier controller design by avoiding collisions
on the command bus, improves command and data bus efficiency because of the simple command
order, and improves memory bandwidth.
DDR2 latencies are a minimum of CAS4-5. The lower memory core frequency means longer
latency time (the time it takes to set up the request for data transfer). If the data bus speed is 400
MHz, the external clock frequency is 200 MHz; if the DRAM has a CAS3, then because the
DRAM core frequency is half that of the external clock frequency, this configuration translates to a
CAS6 relative to the data bus speed.
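The posted-CAS scheme means the effective read latency is the sum of the two components (per the DDR2 definition of read latency, RL = AL + CL; the helper name is ours):

```python
def read_latency_cycles(cl, al):
    # The CAS command is registered right after RAS but held on the
    # DRAM for AL clocks before executing, so RL = AL + CL.
    return al + cl

print(read_latency_cycles(4, 0))  # 4 cycles with no additive latency
print(read_latency_cycles(4, 3))  # 7 cycles with AL = 3
```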

4-bit Prefetch
DDR2 increases the prefetch (from 2 bits with DDR1 to 4 bits with DDR2) to improve
performance. DDR2 SDRAM achieves high-speed operation through a 4-bit prefetch architecture,
which delivers twice the external bandwidth of DDR1 for the same internal core frequency. With
4-bit prefetch, DDR2 reads/writes four times the amount of data on the external bus from/to the
memory cell array for every core clock, so the data bus runs four times faster than the internal
core frequency.
• External clock frequency = two times faster than internal core operating frequency
• Data bus speed throughput = two times faster than external clock frequency
A comparison between SDR SDRAM, DDR1 SDRAM, and DDR2 SDRAM with a DRAM core
operating frequency of 100 MHz is shown below.

Item SDR SDRAM DDR1 DDR2

Prefetch 1-bit 2-bit 4-bit

Internal bus core frequency 100 MHz 100 MHz 100 MHz

External clock frequency 100 MHz 100 MHz 200 MHz

Data bus speed throughput 100 Mb/s 200 Mb/s 400 Mb/s
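The throughput row of the table is simply the core frequency multiplied by the prefetch depth (a per-data-pin sketch; the function name is ours):

```python
def data_rate_mb_per_pin(core_mhz, prefetch_bits):
    # The external bus moves `prefetch_bits` of data per internal core clock.
    return core_mhz * prefetch_bits

print(data_rate_mb_per_pin(100, 1))  # 100 -> SDR SDRAM
print(data_rate_mb_per_pin(100, 2))  # 200 -> DDR1
print(data_rate_mb_per_pin(100, 4))  # 400 -> DDR2
```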


[Diagram] SDR SDRAM: 100 MHz DRAM core frequency, 100 MHz external clock frequency,
100 Mb/s data bus speed (memory cell array through I/O buffer).

[Diagram] DDR SDRAM (DDR1): 100 MHz DRAM core, 100 MHz external clock, 200 Mb/s
data bus (2-bit prefetch).

[Diagram] DDR2 SDRAM: 100 MHz DRAM core, 200 MHz external clock, 400 Mb/s data bus
(4-bit prefetch).

Comparisons of SDRAM memory (all using the same 100 MHz core frequency)


DDR1 vs. DDR2 Core Frequency Speed Differences


With DDR1 and a 400 MHz data bus speed, the DRAM core frequency and external clock
frequency is 200 MHz.
With DDR2, a 400 MHz data bus speed means a 100 MHz DRAM core frequency and a 200 MHz
external clock frequency. So with DDR2, the DRAM core frequency is half that of the external
clock frequency. The memory core of DDR2 operates at only half the frequency of DDR1. This is
why statements such as “DDR2 devices operate at twice the speed of DDR1 devices” and “DDR2
delivers twice the external bandwidth of DDR1 for the same DRAM core frequency” are made.
So while a DDR1 at 400 MHz and DDR2 at 400 MHz both yield a 3.2 GB/s bandwidth, the DDR2
DRAM core frequency is running at 100 MHz while the DDR1 DRAM core frequency is running
at 200 MHz.
DDR2 operates at half the DRAM core frequency of DDR1 (yet obtains the same bandwidth)
because DDR2 uses a 4-bit data prefetch so four sets of data are read from and written to the
memory core; DDR1 uses 2-bit data prefetch where two sets of data are read from and written to
the memory core.
With DDR2 having double the size of DDR1's data prefetch, DDR2 can transfer twice as much
data as DDR1 per clock cycle. However, because DDR2 runs at a slower internal clock speed than
DDR1, the result is the same throughput for both.

[Diagram] DDR1 at 400 MHz: a 200 MHz memory core feeds 200 MHz I/O buffers for a
400 MHz data rate, on a 200 MHz x 4 = 800 MHz system bus.

[Diagram] DDR2 at 400 MHz: a 100 MHz memory core (half the speed of DDR1's) with double
the number of prefetches feeds 200 MHz I/O buffers for the same 400 MHz data rate, on the
same 800 MHz system bus.

DDR1 at 400 MHz vs. DDR2 at 400 MHz


ODT (On Die Termination)


In DDR2 SDRAM, the termination resistor conventionally mounted on the systemboard is
incorporated inside the DRAM chip. The DRAM controller can switch the termination resistor for
each signal (data I/O, differential data strobe, and write data mask) on and off.
• Improves signal integrity by controlling reflected noise on the transfer line
• Reduces parts costs by reducing the parts count on the systemboard
• Simplifies system design by eliminating the complicated placement and routing for the
termination resistor
[Diagram] With DDR1, the termination resistors sit on the systemboard at VTT, so reflections
occur on the DQ bus between the controller and the active and standby DRAMs. With DDR2, the
terminator is on the DRAM die: it is switched off at the active DRAM and on at the standby
DRAM, suppressing reflections.

Systemboard termination of DDR1 vs. on-die termination of DDR2

OCD (Off-Chip Driver) Calibration


DDR2 SDRAM improves signal integrity by Off-Chip Driver (OCD) calibration. In OCD
calibration, the I/O driver resistance is set to adjust the voltage to equalize the pull-up/pull-down
resistance.
• Improves signal integrity by minimizing the DQ-DQS skew.
• Improves signal quality by controlling the overshoot and undershoot.
• Absorbs process variations from each DRAM supplier by I/O driver voltage calibration.


Comparisons of DDR1 and DDR2 memory:

• Package — DDR1: TSOP (66 pins); DDR2: FBGA only. Advantage: enables better electrical
performance and speed.
• Voltage — DDR1: 2.5V core and I/O; DDR2: 1.8V core and I/O. Advantage: reduces memory
system power demand; the voltage regulator on the systemboard is different for DDR1 and DDR2.
• Densities — DDR1: 128 Mb to 1 Gb; DDR2: 256 Mb to 4 Gb (supports x4, x8, x16
configurations). Advantage: high-density components enable large memory subsystems.
• Internal banks — DDR1: 4; DDR2: 4 and 8. Advantage: 1 Gb and higher DDR2 devices will
have eight banks for better performance.
• Prefetch (minimum write burst) — DDR1: 2-bit; DDR2: 4-bit. Advantage: provides reduced
core speed dependency for better yields.
• Speed (data pin) — DDR1: 200, 266, 333, 400 MHz; DDR2: 400, 533, 667, 800 MHz.
Advantage: migration to higher-speed I/O.
• Read latency — DDR1: 2, 2.5, 3 CLK; DDR2: CL + AL, with CL = 3, 4, or 5. Advantage:
eliminating a half clock cycle helps speed internal DRAM logic and improves yields.
• Additive Latency (AL) (Posted CAS) — DDR1: N/A; DDR2: AL options of 0, 1, 2, 3, 4.
Advantage: mainly used in server applications to improve command bus efficiency.
• Write latency — DDR1: 1 clock; DDR2: read latency - 1. Advantage: improves command bus
efficiency.
• Termination — DDR1: systemboard parallel to VTT; DDR2: DRAM on-die termination (ODT),
optional on-systemboard termination. Advantage: ODT for both memory and controller improves
signaling and reduces system cost; the systemboard for DDR1 requires additional termination
resistors.
• Data strobes — DDR1: single-ended; DDR2: differential or single-ended. Advantage: improves
system timing margin by reducing strobe crosstalk.
• Modules — DDR1: 184-pin unbuffered or registered DIMM, 200-pin SODIMM, 172-pin
MicroDIMM; DDR2: 240-pin unbuffered or registered DIMM, 200-pin SODIMM, 214-pin
MicroDIMM, 244-pin MiniDIMM. Advantage: modules are the same length, with added pins;
DDR2 SODIMMs have same pinouts as DDR1 devices; the size and shape of power distribution
to memory is different for DDR1 and DDR2.
• Off-Chip Driver (OCD) Calibration — DDR1: no; DDR2: memory controller configured.
Advantage: enables the system to align pull-up/pull-down drive strengths to nominal conditions;
the feature is not expected to be widely used.


Types of DRAM:
DDR2 Speeds

Marketing  Naming      Data Rate               Single-Channel  Dual-Channel
Name       Convention                          Bandwidth       Bandwidth
DDR2-400   PC2-3200    400 MT/s (200 MHz x 2)  3.2 GB/s        6.4 GB/s
DDR2-533   PC2-4200    533 MT/s (266 MHz x 2)  4.2 GB/s        8.4 GB/s
DDR2-667   PC2-5300    667 MT/s (333 MHz x 2)  5.3 GB/s        10.6 GB/s
DDR2-800   PC2-6400    800 MT/s (400 MHz x 2)  6.4 GB/s        12.8 GB/s

[Diagram] The processor connects through the (G)MCH to DDR2 DIMMs running at
400/533/667/800 MHz, giving 3.2 to 6.4 GB/s per channel.


DDR2 Speeds
A DDR2 module is named after its peak bandwidth, which is the maximum amount of data that can
be delivered per second:
• A 400 MHz DDR2 DIMM is called a PC2-3200 DIMM, with 3.2 GB/s bandwidth
(8 bytes [64-bit data path] x 200 MHz x 2 [DDR] = 3200 MB/s or 3.2 GB/s).
• A 533 MHz DDR2 DIMM is called a PC2-4200 DIMM with 4.2 GB/s bandwidth
(8 bytes [64-bit data path] x 266 MHz x 2 [DDR] = 4200 MB/s or 4.2 GB/s).
• A 667 MHz DDR2 DIMM is called a PC2-5300 DIMM with 5.3 GB/s bandwidth
(8 bytes [64-bit data path] x 333 MHz x 2 [DDR] = 5300 MB/s or 5.3 GB/s).
• An 800 MHz DDR2 DIMM is called a PC2-6400 DIMM with 6.4 GB/s bandwidth
(8 bytes [64-bit data path] x 400 MHz x 2 [DDR] = 6400 MB/s or 6.4 GB/s).
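These peak-bandwidth figures fall straight out of the arithmetic in the bullets; the following is an illustrative sketch (not part of the course materials) that also covers the dual-channel column of the table above:

```python
def peak_bandwidth_gb_s(base_clock_mhz, channels=1):
    """Peak DDR2 transfer rate: 8 bytes per transfer (64-bit data path)
    x base clock x 2 (double data rate) x number of channels, in GB/s
    (using 1 GB/s = 1000 MB/s, as the module names do)."""
    return 8 * base_clock_mhz * 2 * channels / 1000.0

# DDR2-800 (400 MHz base clock) -> PC2-6400: 6.4 GB/s single channel
print(peak_bandwidth_gb_s(400))              # 6.4
# The same DIMMs in a dual-channel configuration double the bandwidth
print(peak_bandwidth_gb_s(400, channels=2))  # 12.8
```

Note that the marketing names round the raw arithmetic (8 x 266 x 2 = 4256 MB/s is sold as PC2-4200), so the function reports bandwidth rather than module names.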

DDR2 Memory Sockets on Lenovo ThinkCentre systemboard


Types of DRAM:
DDR3

• Successor to DDR2 with higher bandwidth and lower power
• Chipset support required
  - Desktop: Intel 3 and 4 Series
  - Notebook: Intel GM45, GM47, PM45
• Introduced in the latest ThinkPad notebooks in 2008

[Photo: DDR3 SODIMM (for notebooks)]

Generation | Years | Speeds | Modules | Slots | Voltage
DDR1 | 2002 to 2005 | 100, 133, 167, 200 MHz | 184-pin DIMM or 200-pin SODIMM | Unique slot and modules | 2.5 volts
DDR2 | 2004 to 2008 | 200, 267, 333, 400 MHz | 240-pin DIMM or 200-pin SODIMM | Unique slot and modules | 1.8 volts
DDR3 | 2007 and after | 400, 533, 667, 800 MHz | 240-pin DIMM (desktop) or 204-pin SODIMM (notebook) | Unique slot and modules | 1.5 volts

DDR3
The third generation of double data rate memory for desktop and notebook systems is Double Data
Rate 3 (DDR3). DDR3 is a type of synchronous DRAM that provides higher performance and
lower power consumption compared to DDR1 or DDR2. The specification for DDR3 was formally
released by the JEDEC Solid State Technology Association in 2007 at www.jedec.org.
DDR3 supports clock frequencies of 400, 533, 667, and 800 MHz. Data is latched on both the rising
and falling edge of the clock, so throughput is effectively doubled at the base clock rate.
DDR3 requires a memory controller in a chipset for support. Various chipsets support DDR3, such
as some Intel desktop chipsets (the Intel 3 and 4 Series) and, for notebooks, the GM45, GM47, and
PM45 (code-named Cantiga). A system cannot use DDR2 and DDR3 simultaneously; it must be
populated with all DDR2 or all DDR3. The DIMMs have different keys to prevent insertion into the
incorrect slot.
Systems with DDR3 memory controllers usually support any speed of DDR3 memory. If you mix
different-speed DDR3 DIMMs in a system, the system operates at the speed of the slowest DIMM; if
the DIMMs are faster than the memory controller supports, the system operates at the memory
controller's speed. As with DDR2, this gives DDR3 systems excellent flexibility and compatibility.

DDR Memory Transition

[Chart: DDR memory transition over time — Source: Micron Marketing]

Types of DRAM:
DDR3 Benefits

• Increased bandwidth
  - Higher bandwidth due to increased clock frequency
  - Doubles the prefetch (from 4 to 8 bits)
• Lower power consumption
  - Operates at a lower voltage (1.5 volts) than DDR2 (1.8 volts)
• Application performance similar to DDR2
  - Higher DDR3 frequencies offset by slower latencies

[Photo: DDR3 DIMM (top) and DDR2 DIMM (bottom) — the different key notch prevents installing a module in the wrong memory slot]

DDR3 Benefits
DDR3 operates at 1.5 volts compared to 1.8 volts for DDR2. This lower power consumption at
comparable operation frequencies gives headroom for operation at higher frequencies, and extends
battery life in mobile applications.
DDR3 used as main memory for desktops is implemented with 240-pin DIMMs. A notch 48 pins
from the left of one side separates the DIMM so that DDR3 is not accidentally placed in DDR1 or
DDR2 memory slots.

[Diagram: Double Data Rate (DDR) memory transfers on both the rising and falling edge of the clock]

The Ball Grid Array (BGA) chip packaging used for DDR3 has more contact pins than DDR2. This
simplifies chip mounting, increases the mechanical robustness of the finished module, and improves
signal quality at high frequencies.
The DDR3 SDRAM signal protocol is also improved over DDR2, because the memory bus frequency
increased significantly. DDR3 uses a fly-by topology with on-module signal termination to carry
address, command, and control signals. This means the signals reach the chips on a memory module
one by one, rather than all at the same time.
Read/write handling has also changed from DDR2: the DDR3 controller must recognize and
compensate for the timing skew between chips that the fly-by command routing introduces. This
technique is known as read/write leveling.

Feature | DDR1 | DDR2 | DDR3
Data Rate | 200 to 400 MT/s | 400 to 800 MT/s | 800 to 1600 MT/s
Effective clock speeds | 200, 266, 333, 400 MHz | 400, 533, 667, 800 MHz | 800, 1066, 1333, 1600 MHz
System Assumption | 4 slots (8 loads) | 2 slots (4 loads) | 2 slots (4 loads)
Voltage (Vdd/Vddq) | 2.5 V ± 0.2 V | 1.8 V ± 0.1 V | 1.5 V ± 0.075 V
Interface | SSTL_2 | SSTL_18 | SSTL_15
Package | 66 TSOP2 or 60 BGA | 60 BGA for x4/x8, 84 BGA for x16 | 78 BGA for x4/x8, 96 BGA for x16
Source sync. | Bi-directional DQS (single-ended default) | Bi-directional DQS (single/differential option) | Bi-directional DQS (differential default)
Burst Length | BL = 2, 4, 8 (2-bit prefetch) | BL = 4, 8 (4-bit prefetch) | BL = 4, 8 (8-bit prefetch)
Banks | 4 banks | 512 Mb: 4 banks; 1 Gb: 8 banks | 512 Mb/1 Gb: 8 banks; 2 Gb/4 Gb/8 Gb: tbd
tCL/tRCD/tRP/tRAS | ~4-4-4-12 ns | ~5-5-5-15 ns | ~7-7-7-15 ns
Reset | No | No | Yes
ODT | No | Yes | Yes
Driver Calibration | No | Off-chip Driver Calibration | Self-calibration with ZQ pin
Leveling | No | No | Yes

Comparisons of DDR1, DDR2, and DDR3 memory

[Diagram: DDR2 vs DDR3 desktop DIMMs — both 240-pin modules measure 5.25" x 1.18", but the
key notch sits at different positions (2.48"/2.17" from the ends on DDR2; 1.85"/2.80" on DDR3),
so the modules share dimensions but use different keys]

Types of DRAM:
DDR3 Bandwidth

• DDR3 uses 8-bit prefetch compared to DDR2 4-bit prefetch
• Provides higher bandwidth, but increases memory latency, so application performance is similar to DDR2

[Diagram: DDR2 vs DDR3 SDRAM interfaces — in each case the memory core feeds I/O buffers;
DDR2 buffers run at a 4x rate (4n bits), DDR3 buffers at an 8x rate (8n bits).
DDR2: 400/533/667/800 MHz clock speeds, 3.2/4.2/5.3/6.4 GB/s, 5-5-5-15 ns latency.
DDR3: 800/1066/1333/1600 MHz clock speeds, 6.4/8.5/10.7/12.8 GB/s, 7-7-7-15 ns latency.]

DDR3 Bandwidth
While DDR2 uses a 4-bit prefetch, DDR3 uses an 8-bit prefetch, also known as 8n-prefetch. This
means DDR3 doubles the internal bus width between the actual DRAM core and the input/output
buffers. The increase in data transfer rate provided by DDR3 does not require faster operation of
the memory core; only the external buffers work faster. The core frequency of the memory chips is
one-eighth the data rate of the external memory bus and DDR3 buffers.
DDR3 can therefore reach higher frequencies than DDR2 almost immediately, without major
improvements in the semiconductor manufacturing process. The disadvantage is that this approach
increases not only memory bandwidth but also memory latency, so DDR3 does not provide
significantly better application performance than DDR2 even when it operates at higher
frequencies.
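The core-versus-buffer frequency relationship can be sketched numerically (an illustrative model only, using a fixed 100 MHz core for comparison):

```python
# With the DRAM core held at a fixed frequency, each DDR generation
# doubles the prefetch depth, and the per-pin data rate scales with it:
# data_rate = core_frequency x prefetch_bits.
CORE_MHZ = 100

for gen, prefetch in (("SDR", 1), ("DDR1", 2), ("DDR2", 4), ("DDR3", 8)):
    data_rate = CORE_MHZ * prefetch  # Mb/s per data pin
    # For the DDR generations the I/O bus is clocked at half the data
    # rate, since data moves on both clock edges.
    bus_clock = data_rate if prefetch == 1 else data_rate // 2
    print("%s: %d MHz bus clock, %d Mb/s per pin" % (gen, bus_clock, data_rate))
```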


Comparisons of SDRAM memory


(all using same 100 MHz core frequency)

• SDR SDRAM — DRAM core frequency 100 MHz, external clock frequency 100 MHz, data bus speed 100 Mb/s.
• DDR SDRAM (DDR1), 2-bit prefetch — core 100 MHz, external clock 100 MHz, data bus 200 Mb/s.
• DDR2 SDRAM, 4-bit prefetch — core 100 MHz, external clock 200 MHz, data bus 400 Mb/s.
  DDR2 clocked the memory bus at twice the memory cell speed to trade off an increase in memory
  cell latency against an overall increase in memory throughput.
• DDR3 SDRAM, 8-bit prefetch — core 100 MHz, external clock 400 MHz, data bus 800 Mb/s.
  DDR3 clocked the memory bus at four times the memory cell speed to trade off an increase in
  memory cell latency against an overall increase in memory throughput.


Types of DRAM:
DDR3 Speeds
Marketing Name | Naming Convention | Data Rate | Single Channel Bandwidth
DDR3-800 | PC3-6400 | 800 MT/s (400 MHz x 2) | 6.4 GB/s
DDR3-1066 | PC3-8500 | 1066 MT/s (533 MHz x 2) | 8.5 GB/s
DDR3-1333 | PC3-10600 | 1333 MT/s (667 MHz x 2) | 10.6 GB/s
DDR3-1600 | PC3-12800 | 1600 MT/s (800 MHz x 2) | 12.8 GB/s

[Diagram: Core 2 Duo processor connected through the G35 (G)MCH to 400/533/667/800 MHz DDR3
memory at 6.4 to 12.8 GB/s; photos of a DDR3 DIMM (for desktops) and a DDR3 SODIMM (for notebooks)]

DDR3 Speeds
A DDR3 module is named after its peak bandwidth, which is the maximum amount of data that can
be delivered per second:
• A 400 MHz DDR3 DIMM is called a PC3-6400 DIMM, with 6.4 GB/s bandwidth
(8 bytes [64-bit data path] x 400 MHz x 2 [DDR] = 6400 MB/s or 6.4 GB/s).
• A 533 MHz DDR3 DIMM is called a PC3-8500 DIMM with 8.5 GB/s bandwidth
(8 bytes [64-bit data path] x 533 MHz x 2 [DDR] = 8533 MB/s or 8.5 GB/s).
• A 667 MHz DDR3 DIMM is called a PC3-10600 DIMM with 10.6 GB/s bandwidth
(8 bytes [64-bit data path] x 667 MHz x 2 [DDR] = 10667 MB/s or 10.6 GB/s).
• An 800 MHz DDR3 DIMM is called a PC3-12800 DIMM with 12.8 GB/s bandwidth
(8 bytes [64-bit data path] x 800 MHz x 2 [DDR] = 12800 MB/s or 12.8 GB/s).

DDR3 DIMMs
These types of DDR3 DIMMs are available:
• 240-pin unbuffered DIMMs for desktops
• 240-pin registered (fully buffered) DIMMs for desktops
• 204-pin SODIMMs for notebooks
• Fully Buffered DIMM for servers
• Very Low Profile DIMM for networking


Classification | CAS Latency Timing | Dual-channel Bandwidth | Latency (tRCD) | Year
DDR3-800 MHz | 6-6-6 | 12.8 GB/s | 15.0 ns | 2007
DDR3-800 MHz | 5-5-5 | 12.8 GB/s | 12.5 ns | 2007
DDR3-1066 MHz | 8-8-8 | 17.1 GB/s | 15.0 ns | 2007
DDR3-1066 MHz | 7-7-7 | 17.1 GB/s | 13.1 ns | 2007
DDR3-1066 MHz | 6-6-6 | 17.1 GB/s | 11.2 ns | 2007-2008
DDR3-1333 MHz | 9-9-9 | 21.3 GB/s | 13.5 ns | 2008
DDR3-1333 MHz | 8-8-8 | 21.3 GB/s | 12.0 ns | 2008
DDR3-1333 MHz | 7-7-7 | 21.3 GB/s | 10.5 ns | 2008-2009
DDR3-1600 MHz | 10-10-10 | 25.6 GB/s | 12.5 ns | 2009
DDR3-1600 MHz | 9-9-9 | 25.6 GB/s | 11.3 ns | 2009-2010
DDR3-1600 MHz | 8-8-8 | 25.6 GB/s | 10.0 ns | 2009-2010

[Photo: Four DDR3 slots on a desktop systemboard]


Types of DRAM:
DDR Channels

• Single-channel
  - One path to memory for 64-bit data transfer

  [Diagram: Memory Controller Hub with one 6.4 GB/s link to a PC3-6400 DIMM — 6.4 GB/s bandwidth]

• Dual-channel
  - Two paths to memory operate in lock step for 128-bit data transfer
  - Double the bandwidth of single channel
  - Supported in most of the latest Intel desktop and mobile chipsets
  - Called Intel Flex Memory Technology with Intel chipsets

  [Diagram: Memory Controller Hub with two 6.4 GB/s links to PC3-6400 DIMMs — 12.8 GB/s bandwidth]

DDR Channels
DDR1, DDR2, and DDR3 memory can be implemented with a single channel or with dual channels;
the memory controller of the chipset determines how many channels are supported.
A single channel means there is a single link to the memory DIMM(s), providing a 64-bit data
path.
A system with dual channels has two independent links to the memory DIMMs. Both channels can
operate in lock step to transfer data simultaneously, so with each bus 64 bits wide, this
configuration provides a 128-bit data transfer with twice the bandwidth of a single channel.


Intel Flex Memory Technology


Many Intel chipsets support Intel Flex Memory Technology to support single-channel or dual-
channel memory configurations and two modes of operation.
Depending upon how the DIMMs are populated on each system memory channel, a number of
different configurations can exist for DDR memory:
• Single Channel – (1) Only one channel of memory is routed and populated or (2) two channels of
memory are routed, but only one channel is populated. In either case, the DIMM(s) can be
populated in either channel A or channel B.
• Dual Channel Asymmetric – Both channels are populated, but each channel has a different
amount of total memory.
• Dual Channel Symmetric – Both channels are populated where each channel has the same
amount of total memory.

Single-Channel
The system will enter single-channel mode when only one channel of memory is routed and
populated on the systemboard. It will also enter single-channel mode if two channels of memory
are routed, but only one channel is populated. In this configuration, all memory cycles are directed
to a single channel.

[Diagram: Single Channel Memory Mode — example configurations with DIMMs populated in only one
channel, such as 1 GB on channel A with channel B empty, or channel A empty with 1.5 GB
(512 MB + 1 GB) on channel B]

Dual-Channel Asymmetric
This mode is entered when both memory channels are routed and populated with different amounts
of total memory. This configuration allows addresses to be accessed in series across the channels
starting in channel A until the end of its highest rank, then continue from the bottom of channel B
to the top of the rank. Real-world applications are unlikely to make requests that alternate between
addresses that sit on opposite channels with this memory organization, so in most cases, bandwidth
will be limited to that of a single channel.


[Diagram: Dual-Channel Asymmetric Memory Mode — example configurations with both channels
populated but different totals, such as 1 GB on channel A with 512 MB on channel B, or 2 GB
(1 GB + 1 GB) on channel A with 1 GB on channel B]

Dual-Channel Symmetric
This mode allows the end user to achieve maximum performance on real applications by utilizing
the full 64-bit dual-channel memory interface in parallel across the channels with the aid of Intel
Flex Memory Technology. The key advantage this technology brings is that the end user is only
required to populate both channels with the same amount of total memory to achieve this mode.
The DRAM component technology, device width, device ranks, and page size may vary from one
channel to another.
Addresses are ping-ponged between the channels, and the switch happens after each cache line (64-
byte boundary). If two consecutive cache lines are requested, both may be retrieved simultaneously,
since they are guaranteed to be on opposite channels.

[Diagram: Dual-Channel Symmetric Memory Mode — example configurations with equal totals per
channel, such as one 1 GB DIMM in each channel, or two 512 MB DIMMs on channel A and one
1 GB DIMM on channel B]
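The cache-line ping-pong can be modeled in a few lines. This is a simplified illustration of the address-to-channel mapping, not the controller's actual logic (real controllers also fold in rank and page interleaving):

```python
CACHE_LINE_BYTES = 64  # the controller switches channels at each 64-byte boundary

def channel_for_address(addr):
    """Simplified symmetric-mode mapping: even-numbered cache lines go
    to channel A (0), odd-numbered cache lines to channel B (1)."""
    return (addr // CACHE_LINE_BYTES) % 2

# Two consecutive cache lines are guaranteed to sit on opposite
# channels, so both can be fetched simultaneously.
assert channel_for_address(0x1000) != channel_for_address(0x1040)
# Addresses within the same cache line stay on one channel.
assert channel_for_address(0x1000) == channel_for_address(0x103F)
```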


Types of DRAM:
FB-DIMMs
• Fully Buffered DIMMs are a different way to access DDR2 DRAM chips with a new memory controller
• Serial connection to each DIMM on the channel
• Interface at each DIMM called the Advanced Memory Buffer

[Diagram: Fully Buffered Memory DIMMs — the FB-DIMM memory controller connects in a serial
point-to-point topology to a chain of DIMMs, each with its own buffer]

Fully Buffered DIMMs


FB-DIMM technology replaces the shared parallel memory channel that is used in traditional
DDR1 and DDR2 memory controllers and uses a serial connection to each DIMM on the
channel. The first DIMM in the channel is connected to the memory controller. Subsequent
DIMMs on the channel connect to the one before it. The interface at each DIMM is a buffer
known as the Advanced Memory Buffer (AMB).
As processor speeds increase, memory access must keep up so as to reduce the potential for
bottlenecks in the memory subsystem. With the DDR2 parallel memory bus design, all DIMMs
on a channel are connected to the memory controller. The problem is that as the speed of the
memory channel increases, the number of DIMMs that can be connected decreases due to
electrical loading. One solution is to add more channels, but that requires a significantly more
complex circuit board design and larger board surface area for the additional wires.

[Photos: Fully Buffered DIMM (top) and DDR2 DIMM (bottom) — Fully Buffered DIMMs have a different key than DDR2 DIMMs]


This serial interface results in fewer active connections to the DIMMs (approximately 69 per
channel vs. 240 for DDR2 DIMMs) and less complex wiring. Each DIMM is 240 pins. This serial
signaling is relatively similar to PCI Express. The interface between the buffer and DRAM chips is
the same as DDR2 DIMMs. The DRAM chips are also the same as DDR2 DIMMs.

Signal group | Differential Signals | Pins
Data Path to DIMMs | 10 | 20
Data Path from DIMMs | 14 | 28
Total High-Speed Signals | | 48
Power | | 6
Ground | | 12
Shared pins (clocks, calibration, PLL power, test) | | ~3
Total Pins | | ~69

With this serial point-to-point connectivity, a built-in latency is associated with any memory
request. In addition, the design of FB-DIMM is such that even if the request is fulfilled by the first
DIMM nearest to the memory controller, the address request must still travel the full length of the
serial bus. As a consequence, the more DIMMs per channel, the longer the latency.
FB-DIMMs are not the next generation of DRAM; they are a new way of accessing the same DDR2
DRAMs from a new memory controller.
An FB-DIMM memory controller can support up to 6 channels with up to 8 DIMMs per channel, and
single- or dual-rank DIMMs. An FB-DIMM operates at 1.2 volts versus 1.8 volts for DDR2 DIMMs,
reducing power consumption. FB-DIMMs allow greater memory capacity in a system.
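The latency cost of the daisy chain can be modeled crudely. The delay values below are invented for illustration (they are not FB-DIMM specification numbers); the point is only that latency grows with the number of DIMMs on the channel, not with the position of the DIMM being read:

```python
DRAM_ACCESS_NS = 50.0  # baseline DRAM access time (illustrative value)
AMB_HOP_NS = 2.0       # pass-through delay per AMB (illustrative value)

def fbdimm_read_latency_ns(dimms_on_channel):
    """Idle read latency in a toy FB-DIMM model: the request traverses
    the full southbound chain and the reply the full northbound chain,
    regardless of which DIMM actually holds the data."""
    return DRAM_ACCESS_NS + 2 * AMB_HOP_NS * dimms_on_channel

print(fbdimm_read_latency_ns(1))  # 54.0 ns with one DIMM on the channel
print(fbdimm_read_latency_ns(8))  # 82.0 ns with eight DIMMs
```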

[Diagram: Fully Buffered DIMM Architecture — the FB-DIMM memory controller uses a new serial
interface (10 bits southbound to the DIMMs, 14 bits northbound from the DIMMs) to the Advanced
Memory Buffer on each DIMM, with up to six channels and up to eight DIMMs per channel; a
standard DDR2 interface connects each buffer to its industry-standard commodity DRAMs]


The Advanced Memory Buffer


An FB-DIMM uses a buffer known as the Advanced Memory Buffer (AMB). The AMB is a
memory interface that connects an array of DRAM chips to the memory controller. The AMB
handles channel and memory requests to and from the local FB-DIMM and forwards requests to
other AMBs in other FB-DIMMs on the channel.
The AMB performs these functions:
• Performs channel initialization to align the clocks and to verify channel connectivity. It is a
synchronization of all the DIMMs on a channel so that they are all communicating at the same
time.
• Supports the forwarding of southbound frames (writing to memory) and northbound frames
(reading from memory) while servicing requests directed to a specific AMB and merging the
return data into the northbound frames.
• Detects errors on the channel and reports them to the memory controller.
• Acts as a DRAM memory buffer for all read, write, and configuration access addressed to a
specific AMB.
• Provides a read and write buffer FIFO.
• Supports an SMBus protocol interface for access to the AMB configuration registers.
• Provides a register interface for the thermal sensor and status indicator.
• Functions as a repeater to extend the maximum length of FB-DIMM links.

Memory Technology Roadmap

[Chart: DDR2 400/533 introduced in 2004; DDR2 667/800 and FB-DIMM in 2005; DDR3 in 2007]

FB-DIMM Performance
With a less complicated circuit board design and lower power consumption, systems can utilize
memory controllers with more channels. The use of more memory channels results in better
throughput.
The use of a serial connection adds latency to memory access, but the greater throughput offered by
FB-DIMMs results in lower average latency when under load, thereby improving performance.
Higher throughput is reached with the use of more channels. At low throughput levels, the latency
of the serial link is significant. However, because that latency remains constant regardless of the
load, FB-DIMM performance is significantly better than DDR2 as throughput increases.
[Chart: Latency vs. throughput for Fully Buffered DIMM and DDR2 — memory read latency plotted
against theoretical peak memory throughput (GBps) for two DDR2-800 channels (2 DIMMs per
channel, 2 ranks per channel) and four FB-DIMM 800 MHz channels (1 DIMM per channel, 1 rank
per channel)]


Memory Performance:
Memory Capacity Affects Performance

• Memory access speed is slow relative to the processor and L2 cache.
• Adding memory is usually the most cost-effective way to increase total system performance.
• For 32-bit operating systems (Windows XP and Vista)
  - 1 GB to 2 GB yields 21% performance improvement
  - 2 GB to 4 GB yields 5% performance improvement

Memory Capacity Affects Performance


Increasing main memory capacity increases total application throughput. Not having enough memory
results in excessive swapping to disk. However, increasing memory beyond a reasonable amount
yields little further performance gain.


Memory Performance:
Windows Vista Memory Requirement
• Windows Vista Home Basic
- Requires minimum of 512 MB memory; 1 GB recommended
• Windows Vista Home Premium, Business, Ultimate
- Requires minimum of 1 GB memory; 2 GB recommended
• Windows Vista 64-bit OSes
- Recommend minimum of 4 GB memory
• SuperFetch
- Learns what applications are used most and preloads these into
memory for faster performance
• ReadyBoost
  - Vista can use a USB flash memory key as a read cache for memory data

Windows Vista Memory Requirement


Following are the memory requirements for Windows Vista:
• Windows Vista Home Basic requires a minimum of 512 MB of main memory, but 1 GB or more
is recommended.
• Windows Vista Home Premium, Business, and Ultimate require a minimum of 1 GB of main
memory, but 2 GB or more is recommended.
• Windows Aero requires a minimum of 1 GB main memory and has other additional
requirements.
• Windows Vista versions that are 64-bit require a minimum of 4 GB of main memory.
Users running Windows Vista may want more than the minimum amount of main memory. Vista runs
well with 1 GB, but is noticeably more responsive with 2 GB of memory. When running 32-bit Vista,
4 GB is even better. Be aware that installing 4 GB of memory into a 32-bit Vista or XP system
only provides about 3 GB of usable memory, because the system reserves approximately 1 GB of
address space for the system BIOS, I/O, and other functions.
Windows XP Professional 64-bit and Windows Vista 64-bit can use all the memory a system can
support. Most PCs max out at 8 GB of memory because of limitations in the number of memory slots
and the number of memory banks the system can support electrically. Also, 64-bit operating
systems use more memory in general, so for a 64-bit OS a minimum of 4 GB is recommended.


When the installed memory is 4 GB, it is expected that all of the 4 GB memory can be used as the
system memory. However, the actual usable memory is less than 4 GB on 32-bit operating systems
that handle up to 4 GB of memory.
Due to an architecture limitation, a number of blocks of memory space need to be allocated under 4
GB memory space: BIOS ROM space, graphics memory space, PCI and PCI Express space,
chipset memory mapped registers, and system management memory. These memory spaces do not
actually use the memory (except BIOS ROM copied to memory and graphics memory for the
integrated memory), but use the address space.
Total usable memory is 3 GB in integrated graphics mode and 2.5 GB in the discrete graphics mode
when 512 MB graphics memory is used.
Intel chipsets have a remapping capability that can move system memory above the 4 GB address
space occupied by these devices; this capability is useful when the operating system can address
more than 4 GB of memory.
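The usable-memory arithmetic can be written down directly; the reserved-space sizes below are illustrative round numbers matching the examples in the text:

```python
MB_PER_GB = 1024

def usable_memory_mb(installed_mb, reserved_mb):
    """Usable memory on a 32-bit OS without remapping: only 4 GB of
    address space exists, and reserved ranges below 4 GB shadow the
    RAM that sits underneath them."""
    addressable = min(installed_mb, 4 * MB_PER_GB)
    return addressable - reserved_mb

# ~1 GB claimed by BIOS ROM, PCI/PCI Express, chipset registers, etc.
print(usable_memory_mb(4 * MB_PER_GB, 1 * MB_PER_GB))        # 3072 MB (~3 GB)
# A discrete graphics card with a 512 MB aperture costs another 512 MB
print(usable_memory_mb(4 * MB_PER_GB, 1 * MB_PER_GB + 512))  # 2560 MB (~2.5 GB)
```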

[Figure: Installed memory vs. usable memory — with 4 GB installed, the top portion of the 4 GB
memory map is non-usable reserved address space, leaving roughly 3 GB usable]

SuperFetch
Windows SuperFetch understands which applications you use most and preloads these applications
into memory, so your system is more responsive. SuperFetch uses an intelligent prioritization
scheme that understands which applications you use most often, and can even differentiate which
applications you are likely to use at different times (for example, on the weekend versus during the
week), so that your computer is ready to do what you want it to do. Windows Vista can also
prioritize your applications over background tasks, so that when you return to your machine after
leaving it idle, it is still responsive.

ReadyBoost
An optional Windows Vista feature, ReadyBoost, is a read cache that allows Vista to cache
memory data onto flash memory (USB drive, Secure Digital Card, Compact Flash, or other
memory form factor) that will not fit into main memory. Because the flash device could be
removed at any time, unique data cannot be stored on it, and data is encrypted for security reasons.
Microsoft recommends that the USB memory key be about the same size as system memory, but
not larger than 4 GB.


Memory Errors:
Parity, Nonparity, ECC
Methods of dealing with possible memory errors:
• Parity
- Common prior to 1995; detected single bit errors
• Nonparity
- Common with notebooks and desktops today
- Memory today rarely generates errors
• Error checking and correcting (ECC)
- Detects and corrects 1-bit to 4-bit errors on the fly
- Detects 2-bit to 8-bit errors
- Implemented in server and workstation systems
- Used in Lenovo ThinkStation workstations

[Photo: ThinkStation S10 (left) and ThinkStation D10 (right)]

Nonparity and Parity Memory


To prevent data errors in memory, parity or ECC memory can be implemented. Nonparity memory
provides no error detection or correction.
In recent years, maturing manufacturing processes for DRAM devices have resulted in a substantial
improvement in DRAM reliability, thereby eliminating the requirement for parity memory.
Between the 16 Kb DRAM generation of the early 1980s and the current 16 Mb generation,
DRAM has undergone a 10,000-fold improvement in reliability.
Ever since the original IBM PC, many PC-compatible machines have used memory with parity.
The use of parity in these early PCs was critical because of the high soft error rate (SER) of
semiconductor DRAM memories of the time. These memory subsystems could experience a SER
hit as often as every 60 to 70 hours of operation, compared to one SER hit every 12 to 25 years for
memory systems based upon high-quality DRAMs of today.

Implementing Parity
Parity is implemented by adding one additional bit of DRAM memory for every eight bits (one
byte) of DRAM memory. For today's 64-bit processors, this means an additional eight bits of
memory and a 72-bit data path, increasing cost and board space (DRAM cost alone increases by 10
to 15 percent). When the processor writes a byte of information to DRAM, the parity
logic checks for an even number of one bits in the data byte. If even, the parity logic places a zero
on the parity line going to the extra parity bit. If odd, the parity logic places a one on the parity line
going to the extra parity bit.
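The write-side parity generation and read-side check just described amount to a population count; a small illustrative sketch:

```python
def parity_bit(byte):
    """Write side: 0 if the byte already holds an even number of one
    bits, 1 if odd, so the stored 9 bits always have even parity."""
    return bin(byte).count("1") % 2

def parity_ok(byte, stored_parity):
    """Read side: all nine bits together must have even parity."""
    return (bin(byte).count("1") + stored_parity) % 2 == 0

data = 0b10110010            # four one bits -> parity bit of 0
p = parity_bit(data)
assert p == 0 and parity_ok(data, p)
# Any single flipped bit leaves an odd count and is detected...
assert not parity_ok(data ^ 0b00001000, p)
# ...but a double-bit error cancels out and goes unnoticed.
assert parity_ok(data ^ 0b00001100, p)
```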


When the data is later read back from memory, the parity checking logic checks the nine bits of
information (per byte of data) for an even number of one bits. If it does not contain an even number
of one bits, an error has occurred, and the parity check logic then generates a parity error. Upon
detecting a parity error, the system logic generates a non-maskable interrupt (NMI) request to the
microprocessor. The microprocessor is then forced to jump to the NMI interrupt service routine
(ISR). This ISR, however, is able to do little more than determine that a parity error has occurred.
The NMI ISR reports a DRAM parity error on the screen and locks up the system, causing all the
data in the volatile system memory to be lost.
Average undetected error probability for nonparity memory:
• 32 MB of memory at 100 hours per month, one undetected error every 39 years
• 128 MB of memory at 300 hours per month, one undetected error every eight years
• 512 MB of memory at 600 hours per month, one undetected error every three years
Soft errors are temporary; a bit of data is lost, but the memory cell functions, and rewriting the data
in the cell corrects the error. Soft errors are intermittent errors that occur as a result of the passage
of ionizing radiation through the memory cells of semiconductor devices. The most common source
of this radiation is alpha particles generated as a result of the decay of thorium and uranium, which
are found in trace amounts in the packaging materials of all plastic and ceramic encapsulated
devices. Another source is cosmic radiation from space; a dying star, for example, can project a
shower of protons and neutrons that eventually penetrates the atmosphere and reaches memory.
A hard error is usually caused by contaminants that lodge themselves in the gate oxide of a DRAM
cell during manufacturing. The higher the altitude, the more numerous the soft errors, because there
are fewer air molecules at higher altitudes to absorb cosmic rays.

Amount of Memory | System Failures with Parity in Five Years | System Failures with ECC in Five Years
16 MB | 0.315 | 0.00016
64 MB | 1.26 | 0.00064
128 MB | 2.52 | 0.00128

The following shows the number of parity bits and ECC bits required for a particular word size (or
data bus transfer):

Word size | # of parity bits | # of ECC bits
16 bits | 2 | 6
32 bits | 4 | 7
64 bits | 8 | 8
128 bits | 16 | 9
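The ECC column follows from the bound for Hamming single-error-correct, double-error-detect (SEC-DED) codes: k check bits suffice for m data bits when 2^(k-1) ≥ m + k. A quick illustrative check:

```python
def secded_check_bits(data_bits):
    """Smallest k with 2**(k-1) >= data_bits + k, i.e. a Hamming code
    over the data and check bits plus one overall parity bit for
    double-error detection."""
    k = 1
    while 2 ** (k - 1) < data_bits + k:
        k += 1
    return k

for m in (16, 32, 64, 128):
    print(m, secded_check_bits(m))  # 6, 7, 8, 9 -- matching the table
```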

Of all memory errors, single-bit errors occur 98 percent of the time, as compared to 2-, 3-, or
4-bit errors. ECC memory can detect all and correct some double-bit errors. The most prevalent
errors are single-bit soft errors, then single-bit hard errors, then multibit hard errors (fixed
by Chipkill), then multibit soft errors.


ECC: Single-Error Correct


Error checking and correcting (ECC) means the memory subsystem can check for memory errors
and correct the errors dynamically while the system is running without any loss of data or the
system crashing. ECC memory detects and corrects 1-bit errors on the fly. It can also detect 2-bit
errors, although the system will crash if 2-bit errors are detected.
ECC can be implemented in one of three ways: ECC, ECC-P, or EOS. ECC-P and EOS were used
in the mid-1990’s and are not covered in this course.
ECC has the following characteristics:
• The error checking and correcting takes place in the SIMM/DIMM itself in combination with an
ECC-enabled memory controller.
• The ECC function causes a 3-percent performance degradation of the memory subsystem (not
application throughput).
• It automatically corrects any single-bit errors (98 percent of all memory errors are single bit).
• 2-bit errors will halt the system.
• 3- and 4-bit errors halt the system when found.
• Single-bit errors are logged with optional software (such as IBM Director client code).
• Multiple-bit errors are sometimes logged in NVRAM (can sometimes be viewed while in Setup
Utility).


ECC: How Single-Error Correct Works


ECC works like parity by generating extra check bits with the data as it is stored in memory.
However, although parity uses only one check bit per byte of data, ECC uses seven check bits for a
32-bit word and eight bits for a 64-bit word. These extra check bits, along with a special hardware
algorithm, allow for single-bit errors to be detected and corrected in real time as the data is read
from memory.
The standard ECC algorithm requires 8 bits of checksum space on a DIMM to protect 64 bits of
data space. Thus, a 64-bit ECC DIMM is typically constructed of 18 DRAM chips. Each DRAM
chip is 4 bits wide, yielding 72 bits of space per 64-bits of data.
In the schematic below, each dark block (the DRAM chip) is 4 bits wide and x million bits deep, to
yield a given number of megabits per DRAM. This is then multiplied by 8 to produce a DIMM
yield (per side) of the same number of megabytes.
For example, if the DRAM chip is 64 megabits deep and 4 bits wide, the DRAM chip yields 256
megabits, so the 8 data chips on one side yield 256 megabytes. Because the DIMM is double sided,
the total data capacity of the DIMM is 512 megabytes. The ECC checksum data is stored on the two
additional DRAM chips, which are identical in construction to the data DRAM chips.
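Working that arithmetic through for the example part (the 64-Mbit-deep, 4-bit-wide chip geometry is taken from the text above):

```python
MEGABIT = 1_000_000

chip_bits = 64 * MEGABIT * 4             # 64 Mbit deep x 4 bits wide = 256 Mbit per DRAM
data_chips = 16                          # 8 data chips per side, double sided
ecc_chips = 2                            # two checksum chips

data_megabytes = chip_bits * data_chips // 8 // MEGABIT
dimm_width = (data_chips + ecc_chips) * 4

print(data_megabytes)                    # 512 -> a 512 MB ECC DIMM
print(dimm_width)                        # 72 bits of width: 64 data + 8 check
```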

(Figure: 4-bit DRAM chip construction with 18 DRAM chips on the DIMM, giving 72 data bit
locations on the DIMM)


ECC: Single DIMM Single-bit ECC Recovery


In a single DIMM single-bit ECC recovery, the best recoverable state is a single-bit error on one
DRAM chip.

0-3    4-7    8-11   12-15  16-19  20-23  24-27  28-31  |  C0-C3
32-35  36-39  40-43  44-47  48-51  52-55  56-59  60-63  |  C4-C7

(Figure: bit 7 is lost, and the checksum bits compensate for the loss)

The above example shows one possible transfer scheme for a single DIMM that is enabled for
ECC. For a 64-bit write, 4 bits are written to each of 16 DRAM chips and 4 checksum bits are
written to each of 2 checksum DRAM chips.
In a single DIMM implementation, every DRAM chip contains more than one bit of a 64-bit
segment.
If only one bit is lost, the memory controller can reconstruct the lost bit by operating a reverse
checksum algorithm on the checksum data, which are physically stored on working DRAM chips.
A Hamming code is a numerical calculation that uses an algorithm to reduce a very large number
to a very small one. It creates a unique result for a given number of bits, and that result always
contains the same number of bits. Thus, a 64-bit data segment can be protected with ECC, and the
results of the calculation can be stored in just 8 check bits. For a standard 72/64 Hamming code
that provides SEC/DED, anything more than a single-bit error causes the reverse checksum
algorithm, and therefore the entire DIMM, to fail, resulting in a catastrophic failure of the system.
The actual order in which the data are written is determined by the memory controller and board
wiring. There are many possible permutations for where each bit could be stored. The example
above is just one possibility. It is inconsequential to the fundamental requirement for ECC
protection where the data bits are actually stored. The requirement for ECC is that, for a 64-bit data
segment, 8 additional bits of check space are required to store the checksum.
On a store, when the values for the 64 bits are known, the memory controller calculates the
checksum for the 64 bits and writes the checksum to memory alongside the data. On a fetch, the
memory controller reads back the 64 data bits along with the 8 check bits. It then recalculates the 8
check bits and compares them to the fetched check bits. The resulting comparison is known as a
syndrome and is used to determine if there are any errors. A standard 72/64 hamming code can
correct any single-bit error and detect any double-bit errors. This is known as Single Error
Correct/Double Error Detect (SEC/DED).
As with the data, the checksum is stored in blocks of four bits on two DRAM chips. Thus, no more
than 4 bits of any 64-bit data segment are ever located together, including the checksum.
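The store-time encode and fetch-time syndrome check can be illustrated at small scale with a Hamming(7,4) code plus an overall parity bit. This is the same SEC/DED principle as the 72/64 code, just with 4 data bits instead of 64; the bit layout here is a toy illustration, not the actual DIMM encoding.

```python
def encode(d):
    # d: four data bits -> 8-bit SEC/DED codeword
    # layout: [p0, p1, p2, d1, p4, d2, d3, d4]  (Hamming positions 1..7)
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                 # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4                 # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4                 # covers positions 5, 6, 7
    word = [0, p1, p2, d1, p4, d2, d3, d4]
    word[0] = sum(word) % 2           # overall parity enables double-error detect
    return word

def decode(w):
    # Returns (status, data); status is "ok", "corrected", or "double-bit error".
    syndrome = 0
    for pos in range(1, 8):           # recompute the Hamming syndrome
        if w[pos]:
            syndrome ^= pos
    overall = sum(w) % 2              # 0 if overall parity still holds
    if syndrome and overall:          # single-bit error: syndrome names the bit
        w = w[:]                      # do not mutate the caller's list
        w[syndrome] ^= 1
        return "corrected", [w[3], w[5], w[6], w[7]]
    if syndrome:                      # parity consistent but syndrome nonzero
        return "double-bit error", None
    if overall:                       # the overall parity bit itself flipped
        return "corrected", [w[3], w[5], w[6], w[7]]
    return "ok", [w[3], w[5], w[6], w[7]]

cw = encode([1, 0, 1, 1])
flipped = cw[:]
flipped[5] ^= 1                       # simulate a single-bit soft error
print(decode(flipped))                # ('corrected', [1, 0, 1, 1])
```

Flipping a second bit makes the overall parity check pass while the syndrome stays nonzero, which is exactly the detect-but-not-correct double-bit case described above.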


ECC: Multibit Error Correction (Chipkill)


In 1998, IBM introduced memory modules that could detect and correct multibit memory errors
(also known as Chipkill). Chipkill memory provides the capability of correcting real-time, multibit
DRAM errors, such as complete DRAM failures. The standard single-error correct (SEC) error
correction code (ECC) found on most of the PC servers of today may not be enough to overcome
fatal memory faults. The default SEC/ECC in most servers today fixes only a single-bit error.
It is important to understand that Chipkill is always algorithm-based (just like RAID-5). So
software/firmware calculates and determines the correct information via checksum calculations.
There is no physical movement of bits like in Memory ProteXion.
Advanced failure-protection modules become especially important as memory sizes in server
systems increase to 1 GB and beyond. Servers with large amounts of memory and standard error
correction are projected to have the same noncorrectable error rate as previous parity-only systems.
Newer x4/x8/x16 DRAMs increasingly suffer multibit failures. Besides widths, soaring device
densities make the DRAMs of today more susceptible to bit errors.
Multibit error detection and correction goes by several names, including RAID-M (Redundant
Array of Inexpensive DRAMs for Main Memory), ECC-on-DIMM, Chipkill protection, chip-failure
protection, packet correct, package correct, and symbol correct. Chipkill is a term that indicates
that an entire chip has gone bad. For example, a DIMM with an entire DRAM physically knocked
off is a Chipkill. Usually this term refers to a total electrical failure of a DRAM. However, when
speaking in general terms, Chipkill refers to any DRAM error up to the entire chip. Thus, the term
Chipkill correct or Chipkill protect means fixing any number of bits in a single DRAM package up
to the entire chip.
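The RAID-5 analogy can be made concrete: with a checksum computed across DRAM chips, the contents of any one failed chip can be recomputed from the survivors. This is a toy illustration of the checksum-reconstruction principle only (real Chipkill uses wider symbol-based codes, not a plain XOR):

```python
from functools import reduce

# Hypothetical 4-bit nibbles held by eight data DRAM chips.
chips = [0b1010, 0b0111, 0b0001, 0b1100, 0b0110, 0b1011, 0b0000, 0b1111]
checksum = reduce(lambda a, b: a ^ b, chips)    # stored on the checksum chip

failed = 3                                      # pretend chip 3 died entirely
survivors = [c for i, c in enumerate(chips) if i != failed]
rebuilt = reduce(lambda a, b: a ^ b, survivors) ^ checksum
assert rebuilt == chips[failed]                 # full chip contents recovered
print(bin(rebuilt))
```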


Memory Cache

• Memory cache is a fast buffer between processor and memory
  that speeds throughput by avoiding access to memory.
• L1 cache (internal to processor, full speed).
• L2 cache (internal to processor, full speed).
• Cache hit (typically 90%) versus cache miss.
• The quantity of L1, L2, and L3 cache has a big impact on memory
  performance (the ns speed of cache memory is not significant).

(Figure: data path from the processor through L1 cache and L2 cache to main
memory, with a 90% cache hit rate at each cache level)

© 2008 Lenovo

Memory Cache
Memory cache in the processor enables it to read instructions and data faster than if the processor
had to access system memory. When instructions are first used or data is first read or written, they
are transferred to the cache from main memory. This transfer enables future accessing of the
instructions or data to occur faster. All processors today have L1 and L2 caches, and some have L3
cache.
L2 cache affects system performance. The critical issue is to have some amount of L2 cache; for
example, 256 KB is a typical and reasonable amount for most processors. As L2 cache size
increases, performance will not increase linearly. As more L2 cache is added, performance
increases less. Eventually, the overhead of a large amount of L2 cache could actually slow system
performance.
When the processor performs a memory write, several different scenarios could happen, which are
covered under Student Notes: Cache Write Policy.
L1 and L2 caches run at internal clock speed (e.g., 1.0 ns for 1 GHz processor).
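The value of a high hit rate can be quantified with the standard average-memory-access-time formula. The latency numbers below are illustrative assumptions, not figures from the course:

```python
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    # Average memory access time = hit time + miss rate x miss penalty.
    return hit_time_ns + miss_rate * miss_penalty_ns

# Assumed latencies: 1 ns cache hit, 60 ns to reach main memory on a miss.
print(amat(1.0, 0.10, 60.0))   # 7.0 ns with the typical 90% hit rate
print(amat(1.0, 0.50, 60.0))   # 31.0 ns: a poor hit rate erases the benefit
```

This is why the hit rate, rather than the ns speed of the cache SRAM itself, dominates memory performance.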


Cache hit ratio does not increase proportionally with cache size.
Doubling the cache size does not double the cache hit rate.

(Figure: Performance/Capacity Relationship, showing cache hit ratio plotted
against cache sizes of 256 KB, 512 KB, and 1 MB)


How Memory Cache Works


The cache is loaded from system memory in increments that are each referred to as a cache line (32
to 64 bytes). A reference to any byte contained in a cache line results in the entire line being read in
the cache (if the data was not already in the cache). When the processor gives up control of the
system bus, the cache enters “snoop” mode and monitors all write operations. If memory data is
written to a location in the cache, the corresponding cache line is invalidated.
When the processor performs a memory read, the data address is used to find the data in the cache.
If the data is found (a hit), it is read from the cache, and no external bus cycle is used. If the data is
not found (a miss), an external bus cycle is used to read the data from system memory. If the
address of the missed data is in cacheable address space, the data is stored in the cache, and the
remainder of the cache line is read.

The processor issues data requests faster than the main system memory can
fill them; cache is a buffer so the data can be accessed faster than main
memory could be.
Cache contains three components, which together hasten delivery of data.
• Data cache RAM: fast memory called static RAM (SRAM) stores data or
  instructions frequently requested by the processor.
• Tag RAM: contains the main memory addresses of data stored in the cache
  RAM (an index table).
• Cache controller: the mastermind of the cache; it compares processor
  requests to the addresses stored in tag RAM. When a match is found, data
  is returned to the processor from cache RAM. If there is no match, the
  request is passed on to main system memory.

(Figure: the processor connects to the cache, which contains the tag RAM,
cache controller, and data cache RAM; the cache connects to main memory,
which is DRAM)


Cache Write Policy


Cache write policy is the method by which writes (not reads) are written from the processor out to
cache and memory.
• 30% of all memory operations are writes (versus reads).
• Two types of policy: write-through or write-back (applies to writes only).

1. Write-through
   – Processor writes data through the cache to memory
     (also writes to cache if the cache line is already there).
   – Caching effect is restricted to memory reading only.
2. Write-back
   – Processor writes to cache and proceeds to the next instruction. The
     cache holds the write-back data in the cache and later copies it to
     main memory (processor idle time).
   – Higher performance than write-through but more costly.
   – Write-back has 10% faster application performance than write-through.

(Figure: the processor writes through L1 cache and L2 cache to main memory)

The cache line is typically 32 to 64 bytes and involves four transfers to write (or read) the cache
line. The cache controller must be engineered so that busmasters do not read data in memory if the
data is still in the cache.
For the L1 and L2 cache:
• Write-through and write-back cache for reads
– If read miss: it reloads the new data after it gets it from the next level.
– If read hits: it reads the information from cache.
• Write-through cache for writes
– If write hit: it updates cache line and writes to memory controller.
– If write miss: does not capture; it writes to memory controller.
• Write-back cache for writes
– If write hit: it updates the cache line and returns ready to the processor and will eventually
write to memory if required.
– If write miss: it does not capture; it writes to the memory controller.
Thus, for write misses, neither type of cache captures the data; it is written to the memory
controller.
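The two policies can be contrasted with a toy one-line cache model that simply counts traffic reaching the memory controller. This is an illustrative sketch, not the actual controller logic:

```python
class Cache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.line = None          # address of the single cached line
        self.dirty = False        # write-back data not yet copied to memory
        self.mem_writes = 0       # writes that reach the memory controller

    def write(self, addr):
        if self.line == addr:                 # write hit
            if self.write_back:
                self.dirty = True             # defer the memory write
            else:
                self.mem_writes += 1          # write through to memory
        else:                                 # write miss: neither policy captures
            self.mem_writes += 1

wt, wb = Cache(write_back=False), Cache(write_back=True)
for c in (wt, wb):
    c.line = 0x100                            # the line is already cached
    for _ in range(10):
        c.write(0x100)                        # ten write hits to the same line

print(wt.mem_writes, wb.mem_writes)           # 10 versus 0 (flushed later)
```

The write-back cache turns ten memory writes into a single deferred flush, which is where its roughly 10-percent application advantage comes from.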


Direct Mapped Cache Organization


In direct mapped cache organization, parts of memory are directly mapped to a single cache line.
The following are some distinguishing characteristics of direct mapped cache organization.

(Figure: memory from 0 to xx MB divided into blocks that map into an xx KB
L2 cache; cache organization can be direct mapped or set associative)

• Memory is logically divided into blocks that map directly to a line in cache (a block can be
stored only in that particular line).
• If the application asks for information from two areas in the same block that compete for the
same line, the processor must thrash the cache and replace the line with a new one.
– For example, if a 256 KB L2 cache is used with 16 MB of main memory, about 64 blocks
compete for the same line in L2 cache, because 256 KB x 64 = 16 MB.
• Direct-mapped cache is easy to design at a low cost, and it provides satisfactory DOS
performance for notebook and desktop operating systems.
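The 256 KB / 16 MB example can be checked numerically. A sketch assuming 32-byte lines (the line size is an assumption; the text gives 32 to 64 bytes):

```python
CACHE_SIZE = 256 * 1024           # 256 KB direct-mapped L2 cache
LINE_SIZE = 32                    # assumed 32-byte cache line
MEMORY_SIZE = 16 * 1024 * 1024    # 16 MB of main memory
LINES = CACHE_SIZE // LINE_SIZE

def line_index(addr):
    # A direct-mapped cache picks the line purely from address bits.
    return (addr // LINE_SIZE) % LINES

# Any two addresses exactly one cache-size apart collide on the same line:
a = 0x12340
b = a + CACHE_SIZE
collide = line_index(a) == line_index(b)
print(collide)                            # True: these two accesses thrash
print(MEMORY_SIZE // CACHE_SIZE)          # 64 memory blocks per cache line
```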


Set Associative Cache Organization


In set associative cache organization, each line in memory can be stored in one of several cache
lines in cache. The following are some distinguishing characteristics of set associative cache
organization.

(Figure: memory from 0 to xx MB divided into blocks that map into an xx KB
L2 cache; with set associative organization, each block can map to more than
one cache line)
• Each block in memory can be stored in different locations in cache.
• There are several versions of set associative cache organization:
– Two-way set associative gives each block in memory two cache line locations.
– Four-way set associative gives each block in memory four cache line locations.
– Fully associative lets each block in memory be stored anywhere in cache.
• Advantage: the most recently accessed information is usually in cache.
• Disadvantage: search is slow since the processor must search all of cache.
• It is complex to design and is higher in cost, but normally offers better performance.
• The processor must now look in two, four, or all lines in L2 cache versus one line in
direct mapped.
• Multitasking operating systems alternate between multiple tasks with each task calling for a
different line in the same block.
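A toy two-way LRU model shows the benefit: two blocks that would evict each other in a direct-mapped cache can coexist in one set. This is an illustrative sketch, again assuming 32-byte lines:

```python
CACHE_SIZE = 256 * 1024          # same 256 KB cache, now organized 2-way
LINE_SIZE = 32
WAYS = 2
SETS = CACHE_SIZE // (LINE_SIZE * WAYS)

cache = {}                        # set index -> tags present, oldest first

def access(addr):
    index = (addr // LINE_SIZE) % SETS
    tag = addr // (LINE_SIZE * SETS)
    ways = cache.setdefault(index, [])
    hit = tag in ways
    if hit:
        ways.remove(tag)          # refresh: move to most-recently-used slot
    elif len(ways) == WAYS:
        ways.pop(0)               # evict the least recently used way
    ways.append(tag)
    return hit

# Two blocks one set-stride apart land in the same set but different ways:
a, b = 0x0, LINE_SIZE * SETS
pattern = [access(x) for x in (a, b, a, b)]
print(pattern)                    # [False, False, True, True]
```

In a direct-mapped cache the same alternating access pattern would miss every time; here, after the two compulsory misses, both blocks stay resident.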


Memory Cache:
Cache Evolution

• The wider the disparity between processor and memory speed,
  the more levels of cache are used.
• L3 cache is used with server-based processors.

(Figure: performance over time, showing CPU speed diverging from memory
speed; the gap is bridged first by L1 cache, then L1 and L2, then L1, L2,
and L3)


Cache Evolution
Processor speed is evolving faster than memory speed. To avoid slow access to memory, levels of
cache should be implemented.
L1 and L2 caches are now standard in processors. The wider the disparity between the processor
and memory speed, the more levels of cache that are needed.
L3 cache is used with server-based processors.


Memory Cache:
Static Random Access Memory (SRAM)

• Can be accessed at all times, because static RAM chips do not need
  refreshing
• Used for L1, L2, and L3 cache in modern processors
• SRAM speeds
  - 2 to 20 nanoseconds (ns)
  - On die requires 0 wait states

(Figure: die photo of the Intel Pentium M; the red box marks the L2 cache
SRAM)


SRAM
Static random access memory (SRAM) is a type of memory that can be accessed at all times,
because static RAM chips do not need refreshing (in contrast to dynamic random access memory
[DRAM]). SRAM memory provides faster access, and is more expensive than DRAM. It is used
primarily for processor caches.
There are manufacturing differences for L1 and L2 caches that are placed on the processor die, as
opposed to being placed on an external memory die as in some L3 designs. These differences relate
more to power and space concerns than any functional differences. The SRAM on the die of the
processor must provide data with zero wait states (i.e., it must function with a cycle time equal to
the clock period of the processor).
All processors today have caches that use SRAM. If the L1, L2, and L3 cache is on the die of the
processor (meaning the same physical chip), this cache must be fast enough to provide zero wait
states. For example, a processor at 1 GHz needs 1.0 ns to have zero wait states.


How SRAM Works


SRAM cells store bit values 0 and 1 by forcing a transistor switch to be either off (0) or on (1).
Once an SRAM bit is stored, the switch is held in the 0 or 1 state by the operation of the circuit. It
is extremely rare for the switch to be moved to the opposite state accidentally.
In comparison, a DRAM cell stores its 0 or 1 state by the amount of electrical charge stored on a
component called a capacitor. A common analogy depicts the capacitor as a bucket and the charge
as water in the bucket. When the bucket is full it represents a 1, when empty, a 0.
The problem with DRAM is that the bucket has a hole in it; the consequence is that a full bucket,
after enough time, becomes an empty bucket. That is, the 1 stored there gradually becomes a 0. To
overcome this drawback, DRAM has to be refreshed. Every now and then, the contents of each
location in memory have to be checked. If a location is set to 0, it is left alone; if set to 1, then the
capacitor is “topped up.” This refresh activity has to be performed regularly – often enough to
guarantee that sufficient charge remains to make sure that a 1 is always read as a 1.
Note: DRAM chips use only one transistor per bit; SRAM chips use four to six transistors per bit. It
is easier to pack transistors more densely onto memory components than onto logic parts.


Summary:
Memory Architecture

• Memory subsystem consists of L1 and L2 cache (SRAM) within the processor


and main memory (DRAM).
• Memory is available in different packaging, such as DIMMs.
• Notebook and desktop systems are migrating from DDR2 to DDR3.
• Increasing the amount of memory increases performance.
• Notebook and desktop systems use nonparity memory.
• The quantity of L1 and L2 cache has a big impact on memory performance.


Magnetic RAM
A new kind of computer memory chip is being developed by IBM’s Almaden Labs. The chip has
the potential to eventually replace all RAM technologies. Magnetic Random Access Memory
(MRAM) is a microscopic memory-cell technology that consumes little power and is non-volatile,
meaning it retains data when power is shut off. Resolving the issue of volatility has been the single
largest issue facing the DRAM industry over the past ten years. This positions MRAM as a
potential replacement for Ferro-Electric Random Access Memory (FeRAM), Static RAM (SRAM),
and potentially for high-performance, low-power, embedded applications such as those addressed
by Electrically Erasable Programmable Read-Only Memory (EEPROM) today. MRAM is much
faster than Flash memory. The current speed projection for MRAM is about six times faster than
today’s DRAM memories, with initial chip density projected to be one megabyte. MRAM is in
early development and is classified as an alternative technology. MRAM depends on magnetic
polarity to store data, rather than electricity like DRAM chips, and stores bits in magnetic layers
rather than charges, yielding non-volatile solid-state storage with speeds comparable to SRAM.
Initial production will use .25 micron technology, with limited production expected to begin
in 2005.

(Figure: Magnetic RAM chip)


Review Quiz

Objective 1

1. Which memory is located within the processor?


a. L2 cache
b. Main memory
c. L4 cache
d. CompactFlash

Objective 2

2. What kind of memory needs to be refreshed constantly?


a. Flash memory
b. DRAM
c. Virtual memory
d. SRAM

3. What memory technology is used for data storage as an alternative to disk-based technology?
a. L2 cache
b. L5 cache
c. Intel Flex Memory Technology
d. Flash memory

Objective 3

4. What does double data rate (DDR) refer to with DDR2 and DDR3 memory?
a. Data is transferred on both rising and falling edges of clock.
b. DDR DIMMs must be installed in pairs.
c. DDR memory has read speeds that differ from its write speeds.
d. Multibit error correction.

5. The naming convention of PC3-6400 indicates what type of memory?


a. SDRAM
b. DDR2
c. DDR3
d. Rambus RDRAM


Objective 6

6. The biggest impact on memory performance is what?


a. The ns speed of the memory
b. L2 cache and its implementation
c. The use of parity or ECC memory
d. The use of SIMMs or DIMMs

7. The typical memory cache hit rate is what percentage?


a. 60%
b. 75%
c. 90%
d. 99%

8. L2 cache uses what kind of memory?


a. Flash memory
b. SRAM
c. DRAM
d. Tag RAM


Answer Key
1. A
2. B
3. D
4. A
5. C
6. B
7. C
8. B



Topic 4 - Bus Architecture

PC Architecture (TXW102)
Topic 4:
Bus Architecture



Objectives:
Bus Architecture

Upon completion of this topic, you will be able to:

1. Recognize historical bus architecture names and categories


2. Describe features of the PCI bus architecture including Mini PCI
3. Define important features of PCI Express including transfer rates,
implementation differences, and Mini PCI Express
4. Describe ExpressCard features including the two types of modules
and its implementation
5. Differentiate USB, Wireless USB, and 1394 FireWire
6. Identify PC Card implementations including CardBus

7. List the common chipsets for Intel's main


desktop and notebook processors including
the various I/O Controller Hub implementations



Historical Bus Architectures

Basic expansion bus:
- ISA (or AT bus) 1984
Legacy advanced expansion buses:
- Micro Channel 1987
- EISA 1989
Advanced bus:
- PCI 1993
- PCI-X 1.0 (PCI-X 66 and PCI-X 133) 2001
- PCI-X 2.0 (PCI-X 266 and PCI-X 533) 2003
- PCI Express 1.0 2004
- PCI Express 2.0 2007
Point-to-point link:
- Accelerated Graphics Port (AGP) 1997
- Hub Interface 2000
- Direct Media Interface (DMI) 2004

(Figures: a 16-bit ISA adapter and a 32-bit PCI adapter)


Historical Bus Architectures


Bus architecture also goes by the terms expansion bus, expansion slots, slots, I/O bus, I/O system,
or channel bus.
Micro Channel and EISA had higher performance and advanced features over the original ISA or
AT bus. Micro Channel and EISA were replaced by PCI and PCI-X.
PCI enhances performance for I/O-intensive operations such as storage and communication devices
in servers. PCI offered multiple expansion slots in desktops.
AGP was a point-to-point link for the graphics controller introduced in 1997. AGP was not a bus,
in that it was not shared by other devices.
Hub Interface is a dedicated point-to-point link used in various Intel chipsets beginning with the
8xx family. Hub Interface connects the Memory Controller Hub with the I/O Controller Hub.
PCI Express is the next generation PCI bus that provides multiple serial links; PCI Express 1.0 first
appeared in desktop systems in 2004, then PCI Express 2.0 followed in 2007.
Direct Media Interface (DMI) is a dedicated point-to-point link used in various Intel chipsets such
as the 9xx family. DMI connects the Memory Controller Hub to the I/O Controller Hub.
The need for fast buses and links will accelerate over time, as I/O devices become more
sophisticated and application environments grow more demanding. Lenovo products incorporate
industry-standard buses into current and future products.


PCI Bus:
PCI Overview

• Industry-standard Peripheral Component Interconnect (PCI)
  bus used in most PCs since 1993
• PCI slots still used in desktops
• Usually implemented as 32-bit 33 MHz bus
• Being replaced by PCI Express

(Figure: the processor with L1/L2 cache connects through the host bridge to
the MCH or GMCH, which provides memory, a PCI Express x16 slot, and PCI
Express slots; the I/O Controller Hub (ICH) provides PCI 2.0 slots, PCIe
links for ExpressCard, Mini Card, and mobile docking, USB 2.0, integrated
systemboard devices, the Super I/O, and the Firmware Hub)


PCI Overview
Peripheral Component Interconnect (PCI) is an industry standard bus architecture defined by the
PCI Special Interest Group (www.pcisig.com) used since 1993 in virtually all PCs, from notebooks
to servers. As a bus architecture, PCI defines how subsystems communicate with other subsystems.
Subsystems may be in integrated on a motherboard or via slots inside desktops or in notebook
docking stations. More subsystems are moving to integration in chips (such as the I/O Controller
Hub) or to dedicated buses/links (such as PCI Express), so PCI’s importance is not as critical as
previously.

PCI Technical Information


The PCI bus is a synchronous bus (i.e., every event must occur at a particular clock tick or edge).
PCI uses a multiplexed address and data bus that reduces pin count and component size; thus, no
separate address bus is required. A PCI system could have peripherals (disk controller, LAN
controller) on the systemboard and/or have open slots or connectors. Whether the peripheral is on
the systemboard or an adapter, that peripheral uses the PCI command set to transfer data.
PCI supports busmasters, which are subsystems that can control the bus to transfer data
independently of the processor or DMA controller. The processor can operate independently when
a PCI peripheral is active because of its buffered design.
Whereas Micro Channel uses cycles for the arbitration process, PCI is more efficient in that it
allows the arbitration for the next access to occur while the current access is still in progress.


Whereas ISA and Micro Channel support DMA, there is no DMA function with PCI. However,
busmasters can be devices on a PCI bus. The use of a busmaster device releases the processor from
the data transfer; this fact is important in multitasking operating systems. PCI supports automatic
configuration; thus there are no jumpers or DIP switches. All PCI adapters must include 256 bytes
to store configuration information.

PCI Data Flow


The PCI bus does most transfers as burst reads and writes in order to optimize performance. PCI
defines burst mode in the same way Micro Channel defines streaming mode. In a PCI burst mode, a
transfer starts with one address and then performs multiple data transfers to or from consecutive
address locations.

Multiplexed Address and Data Bus

Address Data Data Data Data Data

The processor can write data to PCI peripherals, and the PCI bridge/controller can store the data
immediately in its buffer. This process lets the processor quickly go to the next operation rather
than waiting for it to complete the transfer. The buffer then feeds the data to the PCI peripheral in
efficient bursts. The bridge can buffer a non-burst write from the processor and present it to the PCI
bus as a burst write. A combination of buffering and bursting can essentially mask the delays
coming from bus isolation.
PCI burst transfers have an indefinite length, whereas most other architectures have a limited, fixed
burst length. PCI bursts continue until the master or target requests the transfer to end, or until a
higher priority peripheral needs to use the bus. PCI requires a fairness arbitration algorithm;
therefore, any device can preempt a long burst transfer.
Control signals are mixed with data transfers on the bus (AGP can do sideband signaling to
transmit control signals on separate lines from the data channel).

PCI Specifications
PCI, a local bus specification (an open standard), was released by Intel in May 1993 as PCI 2.0.
Although Intel developed PCI, it has formed an industry group called the PCI Special Interest
Group to control the specification, and several PCI patents are royalty free.
The PCI Bus Power Management Interface Specification establishes a standard set of PCI
peripheral power management hardware interfaces and behavioral policies and thereby enables an
operating system to intelligently manage the power of PCI functions and buses. PCs should be
labeled ACPI-compliant. Such states include PCI power management compliance. PCI power
management is intended to be an independent method of supporting power management on the PCI
bus (but is compatible with ACPI). Version 1.0 was ratified in 1997, and version 1.1 was ratified in
1998. Version 1.1 added a Vaux as a pin on the connector to be used by the I/O device. When
enabled by software, Vaux powers wake-up logic to wake up the system.


PCI Parity
Data and address parity is required for all PCI peripherals (in other architectures it is either
impossible or optional). The data source – master on a write and target on a read – always generates
parity. Parity checking (for the device receiving the data) is optional for systemboard devices and
required for adapters, except for devices that cannot cause a data integrity problem (such as sound cards).
PCI has a single parity bit that covers the address/data bus plus the command or byte enables.
Micro Channel supports parity per byte on both the address and data bus.
The bus isolation of PCI allows faster processors to use the bus. PCI adapter vendors can design
one card to operate with Intel, PowerPC, Transmeta and other brands.

PCI Data Transfer Rates


• The PCI bus does transfers as burst reads and writes
• Reads and writes have different rates as idle phases and/or bus turnaround phases are included

Burst Mode

Addr Data Data Data Data Data


4 Bytes 4 Bytes 4 Bytes 4 Bytes 4 Bytes
Indefinite length

Data path    Speed     Transfer Rate
32-bit       33 MHz    132 MB/s
32-bit       66 MHz    264 MB/s
64-bit       33 MHz    264 MB/s
64-bit       66 MHz    528 MB/s

• Devices never achieve the full transfer rate.

Examples of burst mode transfer rates (32-bit at 33 MHz):

32 byte read       96 MB/s
32 byte write      105.6 MB/s
256 byte read      126 MB/s
256 byte write     128 MB/s
4 KB of data       131.743 MB/s
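The burst-mode example rates can be reproduced by counting bus phases, assuming one 4-byte data phase per 30.3 ns clock plus the address, turnaround, and idle phases described in the pages that follow:

```python
CLOCK_HZ = 33_000_000            # 33 MHz PCI clock: one phase per 30.3 ns tick
BUS_BYTES = 4                    # 32-bit data path

def burst_rate(n_bytes, is_read):
    data_phases = n_bytes // BUS_BYTES
    # address phase + (turnaround phase on reads only) + data phases + idle phase
    cycles = 1 + (1 if is_read else 0) + data_phases + 1
    return n_bytes * CLOCK_HZ / cycles / 1e6   # MB/s (decimal megabytes)

print(round(burst_rate(32, True), 1))      # 96.0 MB/s  (32 byte read)
print(round(burst_rate(32, False), 1))     # 105.6 MB/s (32 byte write)
print(round(burst_rate(256, True), 1))     # 126.1 MB/s (256 byte read)
print(round(burst_rate(256, False), 1))    # 128.0 MB/s (256 byte write)
print(round(burst_rate(4096, False), 3))   # 131.743 MB/s (4 KB write)
```

Longer bursts amortize the fixed address and idle overhead over more data phases, which is why the 4 KB transfer approaches, but never reaches, the 132 MB/s ceiling.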


PCI Writes (Non-burst)


A basic data transfer operation on the PCI bus is called a PCI transaction, which usually involves
request, arbitration, grant, address, turnaround and data transfer phases. The PCI bus is a
multiplexed address and data bus, meaning that the address and data lines are physically the same
wires. Thus, fewer signal wires are required, resulting in a simpler, smaller connector. The
downside is that PCI transactions must contain a turnaround phase to allow the address lines to be
switched from address mode into data mode.
PCI agents that initiate a bus transfer are called initiators, whereas the responding agents are called
targets. All PCI operations are referenced from memory. For example, a PCI read operation is a
PCI agent reading from system memory. A PCI write is a PCI agent writing to system memory.
The language of PCI defines the initiator as the PCI busmaster adapter that initiates the data
transfer (for example, a LAN adapter or SCSI adapter) and the target as the PCI device that is being
accessed. The target is usually the PCI bridge device or memory controller.
A write for PCI at 33 MHz on a 32-bit bus (non-burst) would consist of the following:
• Writes consist of an address phase (30.3ns), a data phase (30.3ns or higher; it can be extended
by a slow master or target), and an idle phase (30.3ns).
• 44 MB/s for writes (best case: no wait states, fast address decode by target, different masters for
writes).

  Address    Data        Idle
  30.3ns     30.3ns +    30.3ns          (30.3ns = 1 second divided by 33 MHz)

The following example shows the same master writing to different targets (or same target with
nonsequential address); thus, an idle period is not needed.
• 66 MB/s for writes (best case, no wait states, fast address decode by target):

  Address   Data       Address   Data       Address   Data
  30.3ns    30.3ns +   30.3ns    30.3ns +   30.3ns    30.3ns +
  Target 1             Target 2             Target 3


PCI Reads (Non-burst)


A read for PCI at 33 MHz on a 32-bit bus (non-burst) would consist of the following:
• Reads consist of an address phase (30.3ns), a turnaround clock cycle (30.3ns), a data phase
(30.3ns or longer for a slow master or target), and an idle phase (30.3ns)
• Rates (non-burst)
– Best case: no wait states, fast address decode by target
– The bus turnaround cycle is to allow the master to get off the bus before the target starts
driving the bus

  Address    Bus Turnaround    Data        Idle
  30.3ns     30.3ns            30.3ns +    30.3ns

With a 33 MHz PCI bus, the maximum throughput is 132 MB/s. However, 132 MB/s is a burst data
transfer rate; in reality, transfers are always under 132 MB/s and depend on amount of data, write
or read, and wait states.
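These phase counts make the non-burst rates easy to verify. The sketch below is illustrative only (not part of the course material): a non-burst write takes three cycles (address, data, idle), a read adds a bus-turnaround cycle, and back-to-back writes to different targets drop the idle phase.

```python
# Sketch: best-case non-burst PCI rates from phase counts.
# 32-bit bus at 33 MHz, 4 bytes per data phase.

CYCLE_NS = 1e9 / 33e6   # ~30.3 ns per PCI clock

write_mb_s = 4 / (3 * CYCLE_NS) * 1000          # address+data+idle -> ~44 MB/s
read_mb_s = 4 / (4 * CYCLE_NS) * 1000           # adds a turnaround -> ~33 MB/s
back_to_back_write = 4 / (2 * CYCLE_NS) * 1000  # no idle needed    -> ~66 MB/s

print(round(write_mb_s), round(read_mb_s), round(back_to_back_write))
```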

PCI Bus Performance


It is rare that the PCI bus is capable of sustaining the theoretical throughput rate of 132 MB/s. In
most systems, the sustainable PCI throughput is only about 30 to 60 MB/s.
The PCI adapter, the adapter device driver, and the system PCI chipset all limit the maximum
sustainable throughput. The device driver and adapter firmware play a role in how the adapter is
programmed to transfer data over the PCI bus. In most cases, 25- to 50-percent bus efficiency is
typical.


PCI Bus:
PCI 2.1, 2.2, and 2.3

PCI 2.1 (August 1995)
• Bus speed 0 to 66 MHz (versus 20 to 33 MHz)
• Delayed transactions

PCI 2.2 (January 1999)
• PCI hot plug capability
• PCI power management

PCI 2.3 (February 2003)
• Removed support for 5 volt adapters
• Officially added low profile adapters (smaller size PCI adapters)

[Diagram: the processor connects to the PCI Bridge/Memory Controller on 32-bit PCI
bus A, which links through a PCI-to-PCI Bridge to 32-bit PCI bus B with I/O slots,
and to an ISA Bridge. Photo: Low Profile PCI adapter, with the Low Profile bracket
on the left and the full height bracket on the right.]
© 2008 Lenovo

PCI 2.1
PCI 2.1 was introduced in August 1995 to supersede PCI 2.0. Two key new features are support for
bus speeds from 0 to 66 MHz (versus 20 to 33 MHz), where 0 MHz is necessary on notebooks in
order to improve battery life, and support for delayed transactions.
PCI 2.1 supports a bus that is a compatible superset of PCI 2.0 and is defined to operate up to a
maximum clock speed of 66 MHz. The 66 MHz PCI bus is intended to be used by low-latency,
high-bandwidth bridges and peripherals. Systems may augment the 66 MHz PCI bus with a
separate 33 MHz PCI bus in order to handle lower speed peripherals.
Differences between 33 MHz PCI and 66 MHz PCI are minimal. Both share the same protocol,
signal definitions, and connector layout. To identify 66 MHz devices, one static signal is added by
redefining an existing ground pin, and one bit is added to the configuration status register. Bus
drivers for the 66 MHz PCI bus meet the same DC characteristics and AC drive point limits as 33
MHz PCI bus drivers; however, 66 MHz PCI requires faster timing parameters and redefined
measurement conditions. As a result, 66 MHz PCI buses may support smaller loading and trace
lengths.
A 66 MHz PCI device operates as a 33 MHz PCI device when it is connected to a 33 MHz PCI
bus. Similarly, if any 33 MHz PCI devices are connected to a 66 MHz PCI bus, the 66 MHz PCI
bus will operate as a 33 MHz PCI bus (forward and backward compatibility is maintained). The
programming models for 66 MHz PCI and 33 MHz PCI are the same, including configuration
headers and class types. Agents and bridges include a 66 MHz PCI status bit. A 66 MHz PCI
component/adapter must use the 3.3-volt, and not the 5.0 volt, signaling environment.
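The fallback behavior amounts to "the bus runs at the slowest attached device's speed." A one-line illustrative sketch (the function name and the speed list are inventions for this example, not anything defined by the specification):

```python
# Sketch: a 66 MHz PCI bus falls back to 33 MHz if any 33 MHz device is
# attached (forward and backward compatibility). Speeds are in MHz.

def effective_bus_speed(device_speeds):
    """The whole bus is clocked at the slowest attached device's speed."""
    return min(device_speeds, default=66)

print(effective_bus_speed([66, 66]))   # all 66 MHz devices -> 66
print(effective_bus_speed([66, 33]))   # one 33 MHz device  -> 33
```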


PCI 2.2
PCI 2.2 was ratified in January 1999 and included the following: support for hot pluggability,
updates to PCI power management, message signaled interrupts (MSI) (a requirement for 3.3-volt
signaling for PCI connectors), and a roll-up of engineering change notices (ECNs) and errata since
2.1.
Hot pluggability is the ability to insert and remove PCI adapter cards without powering off the
system. This capability allows for several implementations, including hot replace (replacing adapter
cards in "hot systems"), hot upgrade (upgrading existing adapter cards with new versions of cards
and drivers), and hot expansion (adding previously uninstalled cards and associated driver software
into the system).
PCI power management addresses the issues of standardized power management capabilities and
energy conservation on the PCI bus. The latest power management specification is aligned with the
ACPI specification, enabling PCI devices – both systemboard and add-in – to participate in
platform-wide operating system-directed power management.
ECNs adopted since version 2.1 of the PCI specification cover a variety of areas, including
modifications to accommodate PCI power management and mechanical issues such as bracket
mounting, tolerances, and riser connectors.
PCI 2.2 requires any PCI connector to support 3.3-volt power on the pins labeled +3.3v (as
opposed to 5v). As this logic becomes more widespread in the industry, it allows devices with this
logic to function. While a PCI adapter may still be a 5-volt adapter (and keyed for a 5-volt
connector), a 5-volt adapter may contain 3.3-volt logic on it. Providing 3.3-volt power on the
connector allows the use of 3.3-volt logic without the need for a +5v to +3.3v power converter. The
purpose of requiring 3.3-volt power on the connector is to enable the use of 3.3-volt logic and
encourage the development of Universal cards (keyed for both 5-volt and 3.3-volt signaling).
Message Signaled Interrupts (MSI) is a PCI interrupt signaling mechanism introduced as an
optional feature in PCI 2.2. Typically, a PCI device uses interrupt pins to request a service and is
limited to four interrupt pins. Thus, four IRQs are allowed for PCI devices in a system. While PCI
devices typically share a single IRQ, better performance can be obtained by not sharing IRQs. With
MSI, devices do not use interrupt pins to request service; rather, they write a system-specified
message to a system-specified address (a PCI memory write transaction). MSI will allow each
device to support up to 32 unique interrupts. In reality, having one interrupt per device is more
practical. Intel will probably support one unique interrupt per device and/or the OS will support
only one. The x86 architecture limits a system to 256 interrupts. (MSI theoretically supports
65,536.) The advantage is that systems can support a large number of interrupts without increasing
interrupt controller pin count, thereby reducing system cost and interrupt routing complexity. MSI
can increase performance in that each device can have its own interrupt.
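Conceptually, an MSI is nothing more than a memory write carrying a message, so no dedicated interrupt wires are needed. The sketch below is purely illustrative: the class, method names, and vector values are invented, and the 0xFEE00000 address merely resembles the x86 interrupt-delivery address range.

```python
# Sketch (illustrative only): MSI replaces shared interrupt pins with a
# memory write of a message to a system-assigned address, so each device
# can own its own interrupt vector instead of sharing an IRQ line.

class MSITarget:
    """Stands in for the chipset logic that receives MSI memory writes."""
    def __init__(self):
        self.pending = []   # interrupts arrive as (address, message) writes

    def memory_write(self, address, message):
        self.pending.append((address, message))

target = MSITarget()
# Two devices signal interrupts without sharing a physical IRQ line:
target.memory_write(0xFEE00000, message=0x41)   # device A, vector 0x41
target.memory_write(0xFEE00000, message=0x42)   # device B, vector 0x42
print(target.pending)
```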


PCI 2.3
PCI 2.3 was approved in February 2003 and is an evolutionary change to the PCI local bus
specification. It makes a significant step in migrating the PCI bus from the original 5.0 volt
signaling to a 3.3 volt signaling bus. Revision 2.3 supports the 5V and 3.3V keyed systemboard
connectors (as did revision 2.2), but supports only the 3.3V and Universal keyed add-in cards. The
5V keyed add-in card is not supported in revision 2.3. PCI 66, PCI-X, Mini PCI, and Low Profile
PCI support only 3.3 volt signaling on 3.3V keyed systemboard connectors and 3.3V and Universal
keyed add-in cards.
PCI 2.3 also incorporated the following changes:
• Support added for the System Management Bus (SMBus) by adding a two-wire management
interface to the PCI connector
• New reset timing parameter added
• New bit fields added to registers in configuration address space
• Officially added the low profile PCI adapter card form factor, although it had been used since
2000
• Revision 2.3 also incorporates other ECNs and approved errata. Compliance to Revision 2.3 will
be required no later than January 1, 2004.


PCI Cycle Times


PCI Clock Speed Address or Data Cycle Propagation Delay Loads
25 MHz 40ns 40ns Up to 20
33 MHz 30.3ns 30.3ns 10 loads
66 MHz 15ns 15ns 4 loads

Function 33 MHz 66 MHz

Internal delay of chips; rising clock edge on chip until valid signal 11ns 6ns

Propagation delay, time it takes master to control the bus 10ns 5ns
Input setup time 7ns 3ns
Clock skew 2ns 1ns
Total time: 30.3ns 15ns

PCI allows ten loads (at 33 MHz). One load goes to each systemboard peripheral, the PCI
bridge/controller, and an expansion bus bridge; two loads go to each adapter card slot. PCI is
therefore said to support a maximum of ten peripherals.
All subsystems of the PCI bus run at a specific clock rate; i.e., PCI cannot make a change in speed
based on the subsystem doing the transfer.
The PCI bus can be clocked at any frequency under 66 MHz. It does not have to be an exact
multiplier of the processor clock.
Doubling the bus to 64 bits is expensive in that it requires more pins on system chipsets, more
traces on systemboards, and more complex PCI adapters; a 66 MHz speed is more economical.

Delayed Transactions
Delayed transactions are a feature introduced in PCI 2.1. They are primarily used in PCI-to-PCI
bridges but can also be implemented in the PCI host bridge controller. The purpose is to provide
predictable time delays when transferring between slow devices, because in PCI 2.0 the length of
time was unpredictable. The result is improved performance with PCI-to-PCI bridges.
For example, when a read is initiated by the processor to a device on a secondary PCI bus and that
bus is busy, a delayed transaction allows the PCI-to-PCI bridge to accept the read command and
execute it on its own. The PCI-to-PCI bridge holds the data in its buffer when the read finally
completes. When the processor executes a retry, it can get the data from the buffer of the
PCI-to-PCI bridge (instead of going across the secondary bus, which may be busy again).
Delayed transactions also apply to writes. If the write is to prefetchable memory, the processor can
continue to other tasks while the PCI bridge eventually writes the data at a later point. Prefetchable
memory is defined as memory that is readable prior to the need for it. If it is not needed, it is
discarded.


Low Profile PCI


In February 2000, Low Profile PCI was introduced by the PCI SIG, although it was not an official
standard until PCI 2.3 was approved in February 2003. Low Profile PCI is a specification for
smaller-sized PCI adapter cards. These cards provide a lower cost solution and a reduction in
overall system height and length, with greater flexibility in system layouts.
Low Profile PCI has the same attributes as PCI 2.2, with the same electrical characteristics,
functionality, PCI signals, signal protocol, configuration definitions, and software drivers. These
cards eliminate the need for a riser card in many thin systems.
There are two defined card lengths. MD1 is the shortest 32-bit card length at 119.91 mm (4.721
inches). MD2 is 167.64 mm (6.600 inches). The maximum height is 64.4 mm (2.536 inches). MD1
cards provide only 4.8 square inches of board real estate.

[Diagram: dimensions of a Low Profile PCI adapter. MD1 length 119.91 mm (4.721");
MD2 length 167.64 mm (6.600"); maximum height 64.4 mm (2.536"); minimum height
24 mm (0.954").]


Tool-less PCI Slots


Many Lenovo ThinkCentre systems use tool-less PCI slots, which means that no tool is needed to
turn a screw when installing or removing a PCI adapter.

[Photo: tool-less PCI slots in a Lenovo ThinkCentre desktop. A blue handle allows
easy installation or removal of PCI adapters. The Lenovo ThinkCentre systemboard
shown has two PCI 2.3 32-bit slots.]


PCI Bus:
PCI Connectors

• Connectors can be either 32-bit or 64-bit.
• All PCI adapters are compatible.

[Diagram: PCI connector keying]
• 5 volt, 32-bit, 33 MHz connector (32-bit portion; key position marks a 5 V slot)
• 5 volt, 64-bit, 33 MHz connector (adds the 64-bit portion of the connector)
• 3.3 volt, 32-bit, 66 MHz connector (key position marks a 3.3 V slot)
• 3.3 or 1.5 volt, 64-bit, 66, 100, or 133 MHz connector (1.5 volt is PCI-X only;
  adds the 64-bit portion of the connector)


PCI Connectors
PCI adapters use connectors (also called slots) that are either 32-bit or 64-bit for the data transfer
path. The 64-bit connector is a superset of the 32-bit connector. 32-bit PCI adapters can be used in
64-bit slots and vice versa. If a 64-bit adapter is plugged into a 32-bit slot, the adapter only uses
32-bit transfers. The connector is keyed for 5-volt or 3.3-volt adapters; PCI-X 2.0 adapters can use
1.5-volt signaling in a 3.3-volt connector. Universal adapters can be designed to differentiate
between the two voltages and are keyed to fit either type of slot connector. Every active signal on
the PCI bus is either next to or opposite a power supply or ground signal in order to minimize stray
radiation.

                       100/133 MHz  66 MHz       66 MHz       33 MHz       33 MHz       Universal
                       3.3 V        3.3 V        3.3 V        5.0 V        5.0 V        5.0 V or 3.3 V
  Slot                 64-bit       64-bit       32-bit       64-bit       32-bit       Adapter
                       Adapter      Adapter      Adapter      Adapter      Adapter

  100/133 MHz, 3.3 V,
  64-bit Slot          Support      Support      Support      No           No           Support

  66 MHz, 3.3 V,
  64-bit Slot          Support      Support      Support      No           No           Support

  66 MHz, 3.3 V,
  32-bit Slot          Support      Support      Support      No           No           Support

  33 MHz, 5.0 V,
  64-bit Slot          No           No           No           Support      Support      Support

  33 MHz, 5.0 V,
  32-bit Slot          No           No           No           Support      Support      Support
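The support matrix reduces to a simple voltage-keying rule, sketched below for illustration (the function name and string labels are invented). Bus width never affects fit: a 64-bit adapter in a 32-bit slot simply runs 32-bit transfers.

```python
# Sketch: the voltage-keying rule behind the PCI compatibility table.
# An adapter fits a slot when its keying matches the slot's signaling
# voltage; "universal" adapters are keyed for both voltages.

def adapter_fits(slot_voltage, adapter_keying):
    if adapter_keying == "universal":
        return True                      # keyed for both slot types
    return adapter_keying == slot_voltage

print(adapter_fits("3.3V", "3.3V"))       # True
print(adapter_fits("3.3V", "5.0V"))       # False (keyed out of the slot)
print(adapter_fits("5.0V", "universal"))  # True
```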


In the late 1990s, it was typical for systems to have "shared" I/O slots, which could be used for
either of two types of expansion boards. The shared slots illustrated below accept either a PCI or
an ISA/EISA/Micro Channel card.

[Diagram: a shared slot pairs a PCI connector with an ISA/EISA/Micro Channel
connector at the computer's back panel; a single mounting bracket position accepts
either a PCI or an ISA/EISA/MC expansion board.]

For PCI adapters, there is a difference between voltages regarding signaling levels and power
supply levels. The signaling level for PCI is for data transfer signaling. PCI connectors that are
keyed for 3.3 volts, 5 volts, or universal refer to this signaling level. Power supply levels are 3.3
volts and 5 volts and refer to powering the logic chips on the adapter. ThinkCentre systems have
always provided 3.3 volt and 5 volt power supply levels to PCI adapter slots, but many competitors
have only supplied 5 volt.

[Diagram: PCI connector keying. 5-volt connectors: 32-bit 33 MHz, and 64-bit
33 MHz with a 64-bit extension; the key position marks a 5 V slot. 3.3-volt
connectors: 32-bit 66 MHz, and 64-bit 66 to 133 MHz; the key position marks a
3.3 V slot. Photo: 64-bit PCI adapter (IBM ServeRAID-4Mx).]


PCI Express

• Common interconnection solution for handhelds, notebooks, desktops, and servers
• Gradually replaced PCI and AGP buses and slots
• Offers serial, point-to-point, full duplex link
• PCI Express 1.0 released 2003 (chipsets and systems in 2004)
- Is scalable starting at 2.5 Gb/s with x1 to 80 Gb/s with x32
• PCI Express 2.0 released 2007
- Doubles speed from 2.5 Gb/s to 5.0 Gb/s

  Form Factor       Width    Half Duplex        Use
                             Bandwidth (1.0)
  (Older) PCI 2.3   32-bit   1 Gb/s             Common in desktops and notebooks
  PCI Express x1    1-bit    2.5 Gb/s           Slots, Gb Ethernet
  PCI Express x4    4-bit    10 Gb/s            10 Gb Ethernet, slots, links, SCSI, SAS
  PCI Express x8    8-bit    20 Gb/s            Slots, links, Infiniband adapters,
                                                Myrinet adapters
  PCI Express x16   16-bit   40 Gb/s            Graphics

Overview of PCI Express


PCI Express architecture is a state-of-the-art serial interconnect technology that keeps pace with
recent advances in processor and memory subsystems. The PCI Express technology roadmap
continues to evolve while maintaining backward compatibility. PCI Express has gradually replaced
PCI, PCI-X, and AGP parallel buses.
The PCI Express architecture retains the PCI usage model and software interfaces for investment
protection and smooth development migration. The technology is aimed at multiple market
segments in the computing and communication industries, and supports chip-to-chip, board-to-
board, and adapter solutions at an equivalent or lower cost structure than existing PCI designs. PCI
Express 1.0 runs at 2.5 Gb/s or GT/s (2.5 GHz, 0.8V), or 250 MB/s of unencoded data per lane in
each direction, providing a total bandwidth of 80 Gb/s in a 32-lane configuration (and up to 160
Gb/s in a full duplex x32 configuration). PCI Express 2.0 runs at 5.0 Gb/s, or 500 MB/s of
unencoded data per lane in each direction, providing a total bandwidth of 160 Gb/s in a 32-lane
configuration (and up to 320 Gb/s in a full duplex x32 configuration).
When PCI Express 2.0 was announced, Gb/s (gigabit per second) was replaced with GT/s
(gigatransfers per second).
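The lane-scaling arithmetic can be captured in a small helper; this is an illustrative sketch only (the function name and parameters are inventions for this example), using the raw encoded signaling rates.

```python
# Sketch: scaling PCI Express raw (encoded) bandwidth with lane count.
# Per-lane signaling is 2.5 Gb/s for PCIe 1.0 and 5.0 Gb/s for PCIe 2.0,
# in each direction; full duplex doubles the total.

def pcie_bandwidth_gbps(lanes, per_lane_gbps=2.5, full_duplex=False):
    total = lanes * per_lane_gbps
    return total * 2 if full_duplex else total

print(pcie_bandwidth_gbps(32))                     # 80.0  (PCIe 1.0, x32)
print(pcie_bandwidth_gbps(32, full_duplex=True))   # 160.0 (full duplex x32)
print(pcie_bandwidth_gbps(32, per_lane_gbps=5.0))  # 160.0 (PCIe 2.0, x32)
```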
Future frequency increases will scale up total bandwidth to the limits of copper (which is 12.5 Gb/s
per wire) and significantly beyond that via other media without impacting any layers above the
physical layer in the protocol stack.
See the PCI Special Interest Group at www.pcisig.com for the latest PCI Express specifications.


Technical Overview of PCI Express


At a basic technical level, PCI Express supports the following key features:
• Point-to-point serial interconnections between devices (using directly wired interfaces between
chips, cables between devices, or connector slots for PCI Express expansion cards). Compared to
the shared, parallel bus architecture of PCI, point-to-point connections permit each device to have
a dedicated link, without arbitrating for a shared bus.
• Scalable bus widths such as x2, x4, x8, and x16 (pronounced “by 2,” “by 4,” etc.)
• Low power consumption and power management functions; supports PCI Bus Power
Management Interface Specification Revision 1.0 and the Advanced Configuration and Power
Interface Specification, Revision 2.0 (ACPI). The PCI Express specification creates two low-
power link states and the active-state power management (ASPM) protocol. When the PCI
Express link goes idle, the link can transition to one of the two low-power states. These states
save power when the link is idle, but require a recovery time to resynchronize the transmitter and
receiver when data needs to be transmitted. The longer the recovery time (or latency), the lower
the power usage. The most frequent implementation will be the low-power state with the shortest
recovery time.
• Supports hot-plugging and hot-swapping of devices.
• Quality of service (QoS) link configuration and arbitration policies.
• Isochronous (or time-dependent) data transfer support.
• Host-based transfers through host bridge chips and peer-to-peer transfers through switches.
• End-to-end and link-level data integrity.
• Packetized and layered protocol architecture.
• Multiple virtual channels per physical link.
• PCI-level error handling and advanced error reporting.
• Use of small connectors for space savings (not compatible with PCI connectors).
• Compatibility with PCI at the software layers (so no operating system or driver changes needed).


Serial Link Features


A single, basic PCI Express serial link is a dual-simplex (similar to a full duplex) connection using
two low-voltage pairs of differentially driven signals. It has four wires consisting of a receive pair
and a send pair. A differential signal is derived by using the voltage difference between two
conductors. In this approach, the signal is sent from the source to the receiver over two lines. One
contains a “positive” image and the other, a “negative” or “inverted” image of the signal. The lines
are routed using strict routing rules so that any noise that affects one line also affects the other line.
The receiver collects both signals, inverts the negative version back to the positive and sums the
two collected signals, which effectively removes the noise.
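A toy numeric model shows why the scheme works: noise that couples equally onto both conductors disappears when the receiver differences the two lines. This is an illustrative Python sketch, not a signal-integrity simulation.

```python
# Sketch: common-mode noise cancellation in differential signaling.
# The same noise couples onto both conductors of the pair; the receiver
# takes the difference of the two lines, so the noise term drops out.

signal = [0.0, 1.0, 1.0, 0.0, 1.0]
noise  = [0.3, -0.2, 0.1, 0.4, -0.1]     # couples equally onto both lines

positive = [s + n for s, n in zip(signal, noise)]    # "positive" image + noise
negative = [-s + n for s, n in zip(signal, noise)]   # inverted image + noise

# (p - m) = (s + n) - (-s + n) = 2s, so halving recovers the signal:
recovered = [(p - m) / 2 for p, m in zip(positive, negative)]
print(recovered)   # noise cancels; result approximates the original signal
```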
The PCI Express 1.0 link signaling speed is 2.5 Gb/s per wire pair (in each direction) [GT/s or
gigatransfers per second], so a dual-simplex connection is 5.0 Gb/s (PCI Express 2.0 doubles these
numbers). PCI Express allows up to 20-inch connections between devices with standard 4-layer
printed circuit board technology and standard connectors. A dual simplex connection permits data
to be transferred in both directions simultaneously, similar to full duplex connections (as in
telephones), but with dual simplex, each wire pair has its own ground unlike full duplex, which
uses a common ground. This configuration is commonly referred to as two unidirectional lanes.
Higher speed and better signal quality are attainable with dual simplex connections.

[Diagram: devices A and B connected by a PCI Express link of selectable width;
each direction is a unidirectional pair of wires at 2.5 Gb/s carrying packet
transmit and packet receive traffic, with a reference clock at each device.]


Disadvantages of Parallel Bus (PCI)

[Diagram: signal-integrity problems on a parallel bus.]
• Crosstalk happens when the signal on one wire in a parallel bundle imprints itself on an
adjacent wire.
• Skew is the result of random imperfections in the wires and connections of the parallel bundle.

PCI Express Bandwidth


PCI Express uses an embedded clocking technique using 8b/10b encoding. The clock information
is encoded directly into the data stream, rather than having the clock as a separate signal. The
8b/10b encoding essentially requires 10 bits per character, or about 20% channel overhead. This
approach improves the physical signal so that bit synchronization is easier, design of receivers and
transmitters is simplified, error detection is improved, and control characters can be distinguished
from data characters. PCI Express has minimal sideband signals and the clocks and addressing are
embedded in the data. This encoding explains differences in the published PCI Express 1.0 spec
speeds of 5 Gb/s (with the embedded clock overhead [encoded]) and 4 Gb/s (data only without the
overhead [unencoded]). The common industry practice is to cite the higher encoded bandwidth
figures. The bus needs to send 10 bits of encoded data for every 8 bits of unencoded data, so 5 Gb/s
encoded x (8/10) = 4 Gb/s unencoded. PCIe2 replaced Gb/s with GT/s.

PCI Express 1.0 Bandwidth

  PCI Express       Encoded Duplex        Unencoded Duplex
  Implementation    Data Rate             Data Rate
  x1                5 Gb/s (625 MB/s)     4 Gb/s (500 MB/s)
  x4                20 Gb/s (2.5 GB/s)    16 Gb/s (2 GB/s)
  x8                40 Gb/s (5 GB/s)      32 Gb/s (4 GB/s)
  x16               80 Gb/s (10 GB/s)     64 Gb/s (8 GB/s)
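The encoded-to-unencoded conversion is just the 8/10 ratio; a one-line illustrative sketch (function name invented for this example):

```python
# Sketch: converting PCI Express encoded (raw signaling) rates to
# unencoded data rates. 8b/10b sends 10 bits on the wire for every
# 8 bits of data, so the usable rate is 8/10 of the signaling rate.

def unencoded_gbps(encoded_gbps):
    return encoded_gbps * 8 / 10

# Duplex x4 link, PCIe 1.0: 4 lanes * 2.5 Gb/s * 2 directions = 20 Gb/s encoded
print(unencoded_gbps(20))   # -> 16.0 Gb/s, i.e. 2 GB/s of data
```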


Serial Link Configurations


A PCI Express link may be composed of multiple lanes. Each lane consists of the two
differentially driven pairs of wires (transmit and receive) of a basic link. The lanes can scale from
the PCI Express 1.0 base 2.5 Gb/s in each direction with a x1 configuration all the way to 80 Gb/s
with an x32 configuration. Multiple lanes can be connected between devices, chips, etc. While it
seems as if it is assembling parallel interfaces again with multiple lanes, these are actually
independent serial connections grouped together, and thus not as prone to the signal quality
problems associated with parallel interfaces.
PCI Express links can be configured in x1, x2, x4, x8, x12, x16, and x32 lane widths. Given a x1
link has 4 wires (two differential signal pairs, one in each direction), a x16 link would have 16
differential signal pairs in each direction, or 64 wires for data transfer.
Links must be symmetric; a link cannot be configured with more lanes in one direction than in
the other.
Lane ordering can also be swapped per device, and the polarities of the positive and negative
conductors of a differential signal pair can be inverted at the receiver to provide design flexibility
and help avoid physical signal crossovers in layout. For example, a x2 device can communicate
with a x1 device at x1 link width.
A data stream from an application that is transported across a multi-lane PCI Express link would be
striped across the lanes. There would be hardware logic both to split the data stream across multiple
lanes at the transmit side, and to reassemble the stream at the receiving side.
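The striping and reassembly logic can be sketched as follows. This is illustrative only: real hardware stripes encoded symbols with link training and deskew, not Python lists, and the helper names are invented.

```python
# Sketch: striping a byte stream across lanes and reassembling it.
# Bytes go round-robin: byte 0 to lane 0, byte 1 to lane 1, and so on.

def stripe(data, num_lanes):
    lanes = [[] for _ in range(num_lanes)]
    for i, byte in enumerate(data):
        lanes[i % num_lanes].append(byte)   # transmit-side split
    return lanes

def reassemble(lanes):
    out = []
    for i in range(max(len(lane) for lane in lanes)):
        for lane in lanes:                  # receive-side merge, in lane order
            if i < len(lane):
                out.append(lane[i])
    return out

data = list(b"PCI Express!")
lanes = stripe(data, 4)                     # a x4 link
assert reassemble(lanes) == data            # receiver recovers the stream
```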

[Diagram: PCI Express transfers. A conceptual byte stream (byte 0, byte 1,
byte 2, ...) is carried either on a single lane or striped round-robin across
four lanes (byte 0 to lane 0, byte 1 to lane 1, and so on). Each lane applies
8b/10b encoding and parallel-to-serial conversion. Bandwidth is selectable
using multiple lanes.]


Port, Lane, Link Definition

[Diagram: an ICH6 provides four PCI Express ports (0 through 3), each consisting
of TX+/TX- and RX+/RX- differential pairs. Port 0 connects through AC coupling
capacitors to a Gigabit Ethernet device; one transmit pair plus one receive pair
forms a lane, and the connection between the two devices forms a x1 link.]

• Port
– A group of transmitters and receivers located on the same chip capable of forming an
independent link
• Lane
– A signal path that connects a set of differential signal pairs, one pair for transmission and one
pair for reception between two devices
• Link
– An associated port and lane or multiple ports and lanes forming a connection between two
devices

Packetized, Layered Protocol


PCI Express is compatible with conventional PCI at the software layer, so it supports existing OSes
and BIOS without any changes. It is compatible with the PCI device driver model and existing
software stacks. PCI Express devices look just like PCI devices to legacy software. However, the
PCI configuration space has been extended from 256 bytes to 4 KB. Also, advanced features such
as isochrony, hot plug, and active power state management must be enabled through new BIOS and
drivers.
PCI Express uses a packetized and layered protocol structure. It does not require any sideband
signaling riding alongside the main serial interconnection. Layered protocols permit isolation
between different functional areas in the protocol and allow updating/upgrading different layers
often without requiring changes in the other layers. For example, new transaction types might be
included in a newer revision of the protocol specification without affecting lower layers, or the
physical media might be changed with no major effects on higher layers.


The three protocol layers are transaction, data link, and physical. From the transmitting side of a
transaction, packets are formed at the higher layers, and each successively lower layer adds more
information to the packet, until it is sent across the physical link to the receiving device. The packet
then traverses up the protocol stack at the receiving device until data is extracted and passed to the
application.

[Diagram: data flow through PCI Express layers. On each device, packets pass
from software down through the transaction, data link, and physical layers to
the mechanical interconnect, and back up the stack on the receiving device.]

Data Flow Through PCI Express Layers

The following diagram illustrates the actual packet structure showing the envelope within an
envelope construct of layered protocols. The higher layers of packet information are encapsulated in
the lower layer envelopes. The application-level data is ultimately at the core of the packet. The
transaction layer uses a 32-bit CRC for end-to-end transfers by optionally including a trailer
section, known as a digest. The CRC for the data link layer is a 16-bit value.

  Frame | Sequence number | Header | Data | CRC | Frame

The header, data, and optional digest (CRC) form the transaction layer packet; the data link layer
adds the sequence number and its own CRC; the physical layer adds the framing symbols at each
end.

PCI Express Packet Structure
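The envelope-within-an-envelope construction can be sketched in Python. This is illustrative only: zlib's CRC-32 stands in for the transaction layer digest, and the real PCI Express framing, CRC polynomials, and field layouts differ from this toy model.

```python
import zlib

# Sketch: each layer wraps the packet from the layer above, adding its
# own integrity check (a toy model of the PCIe layered packet structure).

def transaction_layer(header, data):
    tlp = header + data
    digest = zlib.crc32(tlp).to_bytes(4, "big")        # 32-bit end-to-end CRC
    return tlp + digest

def data_link_layer(tlp, seq):
    body = seq.to_bytes(2, "big") + tlp                # prepend sequence number
    lcrc = (zlib.crc32(body) & 0xFFFF).to_bytes(2, "big")  # 16-bit link CRC (toy)
    return body + lcrc

packet = data_link_layer(transaction_layer(b"HDR!", b"payload"), seq=1)
print(len(packet), packet[2:6])   # seq(2) + header(4) + data(7) + crc(4) + lcrc(2)
```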

PCI Express supports multiple virtual channels per lane. Up to eight different independently
controlled communication sessions may exist in a single lane. Each session may have different
quality of service definitions per the packet's Traffic Class (TC) attribute. As a packet travels
though a PCI Express fabric, at each switch or link endpoint, the traffic class information can be
interpreted and appropriate transport policies applied. The traffic class descriptor in the packet
header comprises three bits representing eight different traffic classes.
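Extracting the traffic class is just a 3-bit field read. The bit position used below is an assumption for illustration only; the real TLP header layout is defined in the PCI Express Base Specification.

```python
# Sketch: reading a 3-bit Traffic Class field from a header byte.
# The field position (bits 4-6) is assumed here for illustration.

def traffic_class(header_byte):
    return (header_byte >> 4) & 0b111   # 3 bits -> classes 0 through 7

print(traffic_class(0b0101_0000))   # -> 5
```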


PCI Express has four different transaction types: memory, I/O, configuration, and message.

Transaction Types for Different Address Spaces

  Address Space    Transaction Types            Basic Usage
  Memory           Read, Write                  Transfer data to/from a memory-mapped location
  I/O              Read, Write                  Transfer data to/from an I/O-mapped location
  Configuration    Read, Write                  Device configuration/setup
  Message          Baseline, Vendor-defined,    From event signaling mechanism to
                   Advanced switching           general purpose messaging

Interrupts
PCI Express supports two types of interrupts: the older PCI INTx (where x = A, B, C, or D) legacy
interrupt using an emulation technique and the newer Message Signaled Interrupt (MSI) capability.
MSI is optional in PCI 2.2/2.3 devices but required as the native mode of PCI Express devices.
The INTx emulation can signal interrupts to the host chipset. This emulation is compatible with
existing PCI-compatible driver and operating system software. The emulation virtualizes the PCI
physical hardwired interrupt signals by using an in-band signaling mechanism. PCI Express devices
must support both the legacy INTx and MSI modes, and legacy devices will encapsulate the INTx
interrupt information inside a PCI Express Message transaction.
MSI interrupts are edge-triggered and sent via memory write transactions. Driver rewrites would be
necessary to take advantage of MSI edge-triggered interrupts. The MSI scheme is the desired native
method of interrupt propagation when using a packetized protocol over a serial link (because there
are no sideband or extra hardwired interrupt signals).

I/O Virtualization Specifications


In March 2007, the PCI Special Interest Group released the I/O Virtualization Address Translation
Services 1.0 specification. The specification is available for download on the PCI-SIG website at
www.pcisig.com/specifications/iov/ats.
PCI-SIG I/O Virtualization (IOV) Specifications, in conjunction with system virtualization
technologies, allow multiple operating systems running simultaneously within a single computer to
natively share PCI Express devices. The Address Translation Services (ATS) specification provides
a set of transactions for PCI Express components to exchange and use translated addresses in
support of native I/O Virtualization.


PCI Express:
PCI Express 1.0/1.1 vs. PCI Express 2.0 (PCIe2)

• PCI Express 2.0 doubles bandwidth from 2.5 GT/s to 5.0 GT/s
• PCI Express 2.0 is compatible with earlier version (1.0 and 1.1) so older
adapters will work
• PCI Express 2.0 has improvements in data transfer protocol and software
architecture
• Vendors may implement a mix of PCI Express 2.0 for graphics and
PCI Express 1.0 for remaining devices

                             PCI Express 1.0/1.1 (Gen 1)   PCI Express 2.0 (PCIe2)
Release date                 2003                          2007
Half duplex bandwidth (x1)   2.5 GT/s                      5.0 GT/s
Form factors                 x1, x4, x8, x16               x1, x4, x8, x16
Graphics form factor         x16                           x16
Chipsets                     2004                          2007


PCI Express 1.0/1.1 vs. PCI Express 2.0 (PCIe2)


The PCI Express 2.0 (or PCI Express Gen 2 or PCIe2) specification was released in January 2007
and doubled the interconnect bit rate from 2.5 GT/s to 5 GT/s in a seamless and compatible
manner. The performance boost to 5 GT/s was the most important feature of the specification. The
higher bandwidth allowed product designers to implement narrower interconnect links to achieve
higher performance while reducing cost. PCIe2 replaced Gb/s (gigabits per second) with GT/s
(gigatransfers per second).
A number of optimizations and improvements were made to the protocol and software layers of the
PCI Express 2.0 architecture. These include:
• Dynamic link speed management - to control the speed at which the link is operating
• Link bandwidth notification – to notify software (operating system, device drivers, etc.) of
changes in link speed and width
• Capability structure expansion – to expand the control registers to better manage devices, slots
and the interconnect
• Access control services – optional controls to manage peer-to-peer transactions
• Completion timeout control – to define a required disable mechanism plus related optional
enhancements
• Function-level reset – optional mechanism to reset functions within a device
• Power limit redefinition – to redefine slot power limit values to accommodate devices that
consume higher power


PCI Express:
Implementation

• Integrated in Memory Controller Hubs (Intel 915 and later chipset families)
  - x16 for graphics (replaces AGP)
• Integrated in I/O Controller Hubs (ICH6 and later)
  - x1 or x4 for slots
  - x1 for Gigabit Ethernet
• Older PCI adapters not supported in PCIe slots

[Block diagram: the processor connects to the MCH, which provides the x16 PCI
Express graphics link and the memory interface; the ICHx hangs off the MCH and
provides PCI Express x1 slots and connectors, PCI slots, Gb Ethernet, Serial
ATA, USB 2.0, and Super I/O]

Desktop Slots
A family of connectors is specified, covering the x1, x4, x8, and x16 bus widths. The x2 link
width is reserved for other types of PCI Express interconnects and is not used in slots.
The connectors for AGP replacement slots will be x16. Legacy PCI slots will exist on their own
and will sit adjacent to native PCI Express connectors. It will be possible to up-plug smaller PCI
Express cards into larger slots, but not vice-versa. Down-plugging is not allowed, so larger link-
width cards will not operate in smaller slots. PCI Express cards optionally can support hot-plug and
hot-swap features. Three voltage rails are available: +3.3V, +3.3Vaux, and +12V.
Both standard and low profile adapters for desktops, workstations, and servers are available. The
standard and low profile form factors support x1, x2, x4, and x16 implementations.

PCI Express Card Interoperability

             x1 slot    x4 slot    x8 slot    x16 slot
x1 adapter   Required   Required   Required   Required
x4 adapter   No         Required   Allowed    Allowed
x8 adapter   No         Allowed*   Required   Allowed
x16 adapter  No         No         No         Required

*Implementation has an x8 connector on a wired x4 slot; this means the slot will accept x8
adapters but run at x4 speeds.
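The up-plugging rules in the table can be expressed as a small helper. This is an illustrative sketch under the simplification that a slot is described by its connector width plus its wired lane count; the function names are ours, not from any specification.

```python
# Sketch of the card/slot interoperability rules from the table above:
# a card fits (up-plugs) only into a slot whose connector is at least
# as wide as the card's edge connector; down-plugging is not allowed.

def card_fits(card_width, slot_width):
    """True if a card with the given edge-connector width (in lanes)
    physically fits a slot with the given connector width."""
    assert card_width in (1, 4, 8, 16) and slot_width in (1, 4, 8, 16)
    return slot_width >= card_width

def negotiated_width(card_width, slot_wired_lanes):
    # The link trains to the smaller of the two widths. This covers the
    # footnote case: an x8 adapter in an x8 connector wired with only
    # 4 lanes runs at x4 speed.
    return min(card_width, slot_wired_lanes)

print(card_fits(1, 16))          # True: x1 adapters work in every slot
print(card_fits(16, 8))          # False: down-plugging is not allowed
print(negotiated_width(8, 4))    # 4: x8 card in a wired-x4 slot
```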


[Photos: a systemboard with both PCI and PCI Express connectors. The two top
slots are PCI 2.3, the third slot is PCI Express x1, and the bottom slot is
PCI Express x16; the basic x1 PCI Express connector is the shortest, and each
wider PCI Express connector (up to x16) is longer]


[Photos: a PCI Express x1 connector beside a PCI 2.3 slot (top) and a PCI
Express x1 or ADD2 slot (bottom), with an x1 PCI Express adapter outline and a
PCI adapter outline]

[Chart: PCI Express connector lengths in millimeters, on a 0 to 130 mm scale,
compared with PCI, AGP 8X, AGP Pro, and PCI-X connectors, for the x1, x4, x8,
and x16 connectors]


PCI Express Adapter Size


The standard defines two PCI Express adapter sizes: a long card (106.65 mm x 311.83 mm) and a
short card (106.65 mm x 174 mm).

[Diagram: outlines of the short card and long card; the long card is 311.83 mm long, the short
card is 174 mm long, and both are 106.65 mm tall]

PCI Express Adapter Labeling

PCIe x1 10W: Adapter with x1 edge connector capable of operating in x1 Link width, uses 2.5 GT/s
signaling, and requires 10W to be delivered by the PCIe slot.

PCIe x16 75W: Adapter with x16 edge connector capable of operating in x16, x12, x8, x4, x2, or x1
negotiable Link width, uses 2.5 GT/s signaling, and requires 75W to be delivered by the PCIe slot.

PCIe x8: Adapter with x8 edge connector capable of operating in x8, x4, x2, or x1 negotiable Link
width, uses 2.5 GT/s signaling, and requires 25W to be delivered by the PCIe slot.

PCIe x16 (16,8,4,1) 75W: Adapter with x16 edge connector capable of operating in x16, x8, x4, or
x1 negotiable Link width, uses 2.5 GT/s signaling, and requires 75W to be delivered by the PCIe
slot.

PCIe x16 (16,8,4,1) 75W +EXT 75W: Adapter with x16 edge connector capable of operating in x16,
x8, x4, or x1 negotiable Link width, uses 2.5 GT/s signaling, and requires 75W to be delivered by
the PCIe slot. Adapter also requires an additional 75W from the PCIe x16 Graphics 150W-ATX power
connector. Adapter may not operate at full performance or may only operate in diagnostics mode if
the additional power connector is not present.

PCIe2 x16 (16,8,4,1) 75W +EXT 150W: Adapter with x16 edge connector capable of operating in x16,
x8, x4, or x1 negotiable Link width, uses 5.0 GT/s signaling, and requires 75W to be delivered by
the PCIe slot. Adapter also requires an additional 150W from the PCIe High Power Specification
connector. Adapter may not operate at full performance or may only operate in diagnostics mode if
the additional power connector is not present.

PCIe2 x16 (16,8,4,1) 75W +EXT 225W: Adapter with x16 edge connector capable of operating in x16,
x8, x4, or x1 negotiable Link width, uses 5.0 GT/s signaling, and requires 75W to be delivered by
the PCIe slot. Adapter also requires an additional 225W from the PCIe High Power Specification
connector. Adapter may not operate at full performance or may only operate in diagnostics mode if
the additional power connector is not present.

PCIe2 x4: Adapter with x4 edge connector capable of operating in x4, x2, or x1 negotiable Link
width, supports 2.5 GT/s and 5.0 GT/s signaling, and requires 25W to be delivered by the PCIe
slot.


PCI Express Slot Labeling

PCIe x1 10W: Slot with a x1 maximum Link width using 2.5 GT/s signaling; provides 10W power to an
Adapter.

PCIe x16 75W: Slot using a x16 connector; supports x16, x12, x8, x4, x2, and x1 adapters; uses
2.5 GT/s signaling; provides 75W power to an Adapter.

PCIe x8 (4,1): Slot using a x8 connector but with a x4 maximum Link width using 2.5 GT/s
signaling; provides 25W power to an Adapter.

PCIe x16 (8,1) 75W: Slot using a x16 connector but with a x8 maximum or a x1 negotiable Link
width using 2.5 GT/s signaling; provides 75W power to an Adapter.

PCIe2 x16 (8,1) 300W: Slot using a x16 connector but with a x8 maximum or a x1 negotiable Link
width using 5.0 GT/s signaling; provides 300W power to an Adapter.

PCIe2 x16 (16,1): Slot with x16 connector; supports a x16 maximum Link width or x1 negotiable
Link width using 5.0 GT/s signaling; provides 25W power to an Adapter.

PCIe 5 x8: Slot number 5. Slot with a x8 connector; supports x8, x4, x2, and x1 adapters; uses
2.5 GT/s signaling; provides 25W power to an Adapter.

PCIe Power: Slot provides a PCIe connector but does not support the PCIe protocol. Default power
of 25W is provided.

PCIe Power 75W: Slot provides a PCIe connector but does not support the PCIe protocol. 75W power
is provided.

Typically, the size of a slot matches the number of lanes it has. For example, a x4 slot is typically a
x4 link (that is, it has 4 lanes). However, this is not always the case. The PCI Express specification
allows for the situation where the physical connector is larger than the number of lanes of data
connectivity. The only requirement on manufacturers is that the connector must still provide the full
complement of power and ground connections as required for the connector size.
For example, a system could have these two slots:
• PCIe x8
• PCIe2 x8 (4,1)
The first slot is PCI Express with x8 physical connector (in other words, it will physically accept x8
cards, as well as x4, x2, and x1 cards), and it has the bandwidth of a x8 link (8 x 2.5 GT/s or 20
GT/s). The second slot is a PCI Express 2.0 slot with x8 physical connector, but it only has the
bandwidth of a x4 link (4 x 2.5 GT/s or 10 GT/s).
If there is a need for x8 bandwidth (such as for an Infiniband or Myrinet adapter), then ensure the
correct slot is selected (one with x8 lanes).
The physical size of a PCI Express slot is not the sole indicator of its possible bandwidth. The
bandwidth capacity of each slot must be determined from slot descriptions on the system board or the
service label of the server.
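The bandwidth arithmetic used in this example can be captured in a short helper. This is a sketch; the dictionary and function names are ours, and the per-lane rates are the 2.5 GT/s and 5.0 GT/s figures quoted in this topic.

```python
# Per-lane signaling rates in GT/s for the two PCI Express generations
# covered in this topic.
LINE_RATE_GTS = {1: 2.5, 2: 5.0}

def link_bandwidth_gts(generation, lanes):
    """Raw (encoded) one-direction link bandwidth in GT/s."""
    return LINE_RATE_GTS[generation] * lanes

# The text's two example slots, assuming a Gen 1 adapter in each:
print(link_bandwidth_gts(1, 8))   # 20.0 ("PCIe x8": 8 x 2.5 GT/s)
print(link_bandwidth_gts(1, 4))   # 10.0 ("PCIe2 x8 (4,1)" wired as x4)
```

As the text stresses, the lane count that matters is the wired one, not the physical connector size, so the second slot is limited to the x4 figure despite its x8 connector.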


PCI Express:
Mini PCI Express

• Extension of PCI Express to notebooks


• Replaces Mini PCI
- Cards half the size of original Mini PCI
• Two sizes:
- Half-Mini PCI Express Adapter
- Full-Mini PCI Express Adapter
• Mini PCI Express adapters require both PCI
Express x1 and USB 2.0 interconnects

[Photos: a Mini PCI Express adapter installed in a notebook; the ThinkPad
11a/b/g Wireless LAN Mini PCI Express Adapter]

Mini PCI Express


The primary differences between a PCI Express add-in card and a Mini PCI Express add-in card
are a unique card form factor optimized for mobile computing platforms and a card-system
interconnection optimized for communication applications. Specifically, Mini PCI Express add-in
cards (adapters) are smaller and have smaller connectors than standard PCI Express add-in cards
(adapters).
Mini PCI Express adapters are targeted toward addressing system manufacturers’ needs for build-
to-order and configure-to-order rather than providing a general end-user-replaceable module.
Mini PCI Express adapters use a single 52-pin card-edge type connector for the system interface;
almost a third of the pins are reserved for future use. The connector is similar to the SO-DIMM
connector and is modeled after the Mini PCI Type III connector without side retaining clips.
There are two sizes for Mini PCI Express:
• Half-Mini PCI Express Adapter (half the size of a Full-Mini PCI Express Adapter)
• Full-Mini PCI Express Adapter
The latest specification for Mini PCI Express adapters is PCI Express Mini Card Electromechanical
Specification Revision 1.2 dated October 2007.


Mini PCI Express adapters support two primary system bus interfaces: PCI Express and USB 2.0.
An adapter can use either PCI Express or USB 2.0 (or both).

[Diagram: logical representation of the Mini PCI Express specification. The
Mini PCI Express connector carries the PCI Express x1 and USB 2.0 system buses
to a function-specific, communication-centric function (modem, Ethernet, or
wireless) with its own function I/O interface and LEDs]

[Block diagram: PCI Express on notebook systems. The processor connects to the
MCH, which provides x16 graphics, x16 docking, and memory interfaces; the ICH
provides PCI Express x1 links to Mini PCI Express adapters (such as 802.11
wireless and Wireless WAN), plus Gb Ethernet, Serial ATA, USB 2.0, Super I/O,
and an x1 docking link]


[Diagram: a Mini PCI Type III card (61 mm x 51 mm) next to a Mini PCI Express
adapter (30 mm x 51 mm); two Full-Mini PCI Express adapters fit in the same
space as one older Mini PCI card, allowing a second card]

[Photo: Verizon WWAN EV-DO Rev. A Mini PCI Express Adapter (part number
43R1818). Lenovo ThinkPad notebooks use Mini PCI Express]

Full-Mini PCI Express Adapter Form Factor (dimensions in mm)

Full-Mini PCI Express Adapter Top and Bottom (dimensions in mm)


PCI Express:
Mini PCI Express Form Factors

• Full-Mini PCI Express card (FMC)
  - Type F1: Traditional card
    - Can only fit in F1 slots
  - Type F2: Requires keep out zones on bottom side
    - Can fit into F1 or F2 slots
    - Can fit into dual head-to-head configuration
    - F2 slot can fit Half-Mini cards with posts
• Half-Mini PCI Express card (HMC)
  - Type H1
    - Can fit into H1 or F2 slots
    - Cannot fit into H2 slots
    - Requires extender for F1 slot
  - Type H2: Keep out zones on bottom side; used in dual head-to-head configuration
    - Can fit into H1, H2, or F2 slots
    - Requires extender for F1 slot

[Diagrams: F1 slot with extender posts; F2 slot with keep outs; H1 and H2
slots; and a dual head-to-head configuration]

Mini PCI Express Form Factors


Mini PCI Express cards exist in different form factors. Only Type F1 existed prior to 2007; after
2007, other types such as F2, H1, and H2 were introduced.
A keep out zone is an area on the Mini PCI Express card that cannot have physical circuitry
because the area is needed for mounting or space clearance. A post is a screw hole or solder hole
required to secure the card to a systemboard. Circuitry can exist on both sides of the card.
The following is implemented as a Half-Mini PCI Express card:
• Certified Wireless USB (such as in select ThinkPad T61 notebooks)

[Photos: a Full-Mini PCI Express Card and a Half-Mini PCI Express Card
(Certified Wireless USB)]


Following are some examples of Mini PCI Express adapters:


• Certified Wireless USB (such as in select ThinkPad T61 notebooks)
• Intel Turbo Memory

[Photos: Half-Mini PCI Express Adapter (Certified Wireless USB); front and
back of the Half-Mini PCI Express Adapter (Intel Turbo Memory 2GB and 4GB);
Full-Mini PCI Express Adapter (Intel Turbo Memory 2GB and 4GB)]


Platform Example

[Diagrams: one Type F1 slot; two Type F1 slots (common); two Type F1 slots
plus one Type H2 slot]

[Photos: a ThinkPad notebook with two Type F1 Full-Mini PCI Express slots, and
a ThinkPad notebook with three Mini PCI Express slots (two Type F1 Full-Mini
and one Type H2 Half-Mini)]

[Diagram: Full-Mini PCI Express Adapter Type F2, bottom side, showing the
component and exposed routing (bottom layer) keep out for hold down solutions]

[Diagram: Half-Mini PCI Express Adapter Type H2, bottom side, showing the
component and exposed routing (bottom layer) keep out for hold down solutions]


[Diagram: Half-Mini PCI Express Card Type H2]


PCI Express:
PCI Express 1.0 vs PCI

[Bar chart of bandwidth, shown in Gb/s for comparison purposes: PCI Express
1.0 x16: 80; x8: 40; x4: 20; x2: 10; x1: 5. PCI-X 2.0 QDR: 32; DDR: 16.
PCI-X 1.0: 8. PCI 2.3 (64-bit): 4. PCI 1.0 (32-bit): 1]

PCI Express 1.0 vs PCI


The graph on the slide shows the bandwidth in Gb/s for several generations of PCI buses and
provides an excellent illustration of the differences between the generations.

PCI Express 1.0, encoded:

        Duplex            Half duplex      Duplex            Half duplex
        both directions   one direction    both directions   one direction
        (in bits)         (in bits)        (in Bytes)        (in Bytes)
x16     80 Gb/s           40 Gb/s          10 GB/s           5 GB/s
x8      40 Gb/s           20 Gb/s          5 GB/s            2.5 GB/s
x4      20 Gb/s           10 Gb/s          2.5 GB/s          1.25 GB/s
x2      10 Gb/s           5 Gb/s           1250 MB/s         625 MB/s
x1      5 Gb/s            2.5 Gb/s         625 MB/s          312.5 MB/s

PCI Express 1.0, unencoded:

        Duplex            Half duplex      Duplex            Half duplex
        both directions   one direction    both directions   one direction
        (in bits)         (in bits)        (in bytes)        (in bytes)
x16     64 Gb/s           32 Gb/s          8 GB/s            4 GB/s
x8      32 Gb/s           16 Gb/s          4 GB/s            2 GB/s
x4      16 Gb/s           8 Gb/s           2 GB/s            1 GB/s
x2      8 Gb/s            4 Gb/s           1 GB/s            500 MB/s
x1      4 Gb/s            2 Gb/s           500 MB/s          250 MB/s
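The unencoded figures follow from the encoded ones because PCI Express 1.0 uses 8b/10b encoding: every 10 bits on the wire carry 8 bits of data. A short sketch (illustrative only; the names are ours) reproduces the table entries:

```python
# PCI Express 1.0 bandwidth: raw line rate is 2.5 Gb/s per lane per
# direction; 8b/10b encoding means usable data is 8/10 of the raw rate.

LINE_RATE_GBPS = 2.5   # PCI Express 1.0, per lane, per direction

def half_duplex_gbps(lanes, encoded=True):
    raw = LINE_RATE_GBPS * lanes
    return raw if encoded else raw * 8 / 10

print(half_duplex_gbps(16))                  # 40.0 Gb/s encoded
print(half_duplex_gbps(16, encoded=False))   # 32.0 Gb/s unencoded
# Converting the x1 unencoded figure to bytes: 2 Gb/s = 250 MB/s.
print(half_duplex_gbps(1, encoded=False) * 1000 / 8)   # 250.0
```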


PCI-X 2.0
  Quad data rate:    32 Gb/s or 4.2 GB/s or 4200 MB/s at 64-bit 533 MHz
  Double data rate:  16 Gb/s or 2.1 GB/s or 2100 MB/s at 64-bit 266 MHz
PCI-X 1.0
  8 Gb/s or 1064 MB/s at 64-bit 133 MHz
  4 Gb/s or 528 MB/s at 32-bit 133 MHz
  6 Gb/s or 800 MB/s at 64-bit 100 MHz
  3 Gb/s or 400 MB/s at 32-bit 100 MHz
  4 Gb/s or 528 MB/s at 64-bit 66 MHz
  2 Gb/s or 264 MB/s at 32-bit 66 MHz
PCI 2.3
  4 Gb/s or 528 MB/s at 64-bit 66 MHz
  2 Gb/s or 264 MB/s at 64-bit 33 MHz
  2 Gb/s or 264 MB/s at 32-bit 66 MHz
PCI 1.0
  1 Gb/s or 132 MB/s at 32-bit 33 MHz

Bandwidth of PCI Versions for Comparison


ExpressCard

• ExpressCard standard developed by PCMCIA


• Gradually replace PC Cards (PCMCIA) in notebooks
• Two types of modules:
- ExpressCard/34 is 34 mm wide
- ExpressCard/54 is 54 mm wide
• Introduced in notebooks in 2005 with Mobile Intel 9xx chipset
[Photos: an old PC Card next to two new ExpressCard modules]

ExpressCard
ExpressCard technology is a standard released in 2003 by the Personal Computer Memory Card
International Association (PCMCIA). ExpressCard is a small, modular add-in card designed to
replace the larger PC Card over the next couple of years. ExpressCard slots were first introduced in
notebooks in 2005 with the Mobile Intel 9xx Express Chipset.
The ExpressCard add-in cards are called modules. Modules can include wired or wireless
communication devices, solid state or rotating optical and magnetic storage devices, identity
sensors, flash memory cards, networking devices, smart card readers, TV tuner devices, etc.

[Photos: ExpressCard/34 and ExpressCard/54 modules; ExpressCard Gemplus Smart
Card Reader (part 41N3043)]


To support a broad range of device types, there are two sizes available; both are smaller than the
common PC Card. The smaller card, the ExpressCard/34 module, has a width of 34 mm. The larger card,
the ExpressCard/54 module, has a width of 54 mm. All modules are 5 mm thick and 75 mm long, but the
standard allows for developers to build longer, extended modules that have thicker portions projecting
beyond the envelope of the host system. The larger ExpressCard/54 module provides approximately
140% of the internal volume capacity of the ExpressCard/34 module.

[Diagram: the ExpressCard is smaller than the PC Card. PC Card: 54 mm x
85.6 mm; ExpressCard/34: 34 mm x 75 mm; ExpressCard/54: 54 mm x 75 mm]

                  PC Card*         ExpressCard/54           ExpressCard/34
Size (HWD)        5 x 54 x 86 mm   5 x 54 x 75 mm           5 x 34 x 75 mm
Connector pins    68 (parallel)    26 (serial)              26 (serial)
(and port type)
Connector type    Pin in socket    Beam on blade            Beam on blade
Bandwidth         132 MB/s         Via ExpressCard          Via ExpressCard
                                   connector, 250 MB/s      connector, 250 MB/s
                                   each direction; via      each direction; via
                                   USB 2.0, 60 MB/s         USB 2.0, 60 MB/s

*32-bit CardBus, Type II


ExpressCard:
Implementation

• ExpressCard slots require both USB 2.0 and PCI Express (x1 link) interconnects
• Hot-pluggable
• Targeted for both notebooks and desktops
• ExpressCard/54 slot used in select Lenovo notebooks

[Diagram: the host chipset routes PCI Express and USB to a 54 mm slot, which
accepts either ExpressCard/34 or ExpressCard/54 modules]

ExpressCard Implementation
ExpressCard slots require support of both USB 2.0 and PCI Express (x1 link). An ExpressCard
module can be implemented using either PCI Express or USB, depending on the bandwidth
required. A USB-based ExpressCard module is practical for lower-speed devices such as Bluetooth
wireless cards or flash memory. A PCI Express-based ExpressCard is useful for higher-bandwidth
devices such as Gigabit Ethernet or 1394b cards. All ExpressCard slots will support cards using
either interface. The host platform no longer needs to incorporate a bridge chip between the chipset
and the socket.

The ExpressCard link has a potential transfer rate of up to 5.0 Gb/s or 500 MB/s (2.5 Gb/s or 250
MB/s in each direction) using a single-lane (x1) PCI Express link; this link is up to four times
faster than a CardBus PC Card transfer. USB 2.0 uses a transfer rate of 480 Mb/s or 60 MB/s.
The connector is a 26-pin, beam-on-blade style with a serial interface. The connector includes
signals for control, power, and I/O; it uses a beam or blade connector type. The older PC Card used
68 pins with a parallel interface and a pin-in-socket connector type.
A conventional push-button or other module ejection mechanism can be implemented at each
system OEM's discretion.
ExpressCard is fully hot-pluggable and hot-swappable.
See www.expresscard.org for more information.
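The choice between the slot's two interfaces comes down to required bandwidth, as described above. The following is a hypothetical sketch of that decision (the function name is ours; the rates are the 60 MB/s USB 2.0 and 250 MB/s per-direction PCI Express x1 figures from the comparison table earlier in this topic):

```python
# An ExpressCard slot exposes both a USB 2.0 and a PCI Express x1
# interface; a module designer picks whichever meets the bandwidth need.

USB2_MBS = 60        # Hi-Speed USB 2.0
PCIE_X1_MBS = 250    # PCI Express x1, per direction

def pick_interface(required_mbs):
    if required_mbs <= USB2_MBS:
        return "USB 2.0"       # e.g. Bluetooth, flash memory
    if required_mbs <= PCIE_X1_MBS:
        return "PCI Express"   # e.g. Gigabit Ethernet, 1394b
    return "exceeds ExpressCard bandwidth"

print(pick_interface(3))     # USB 2.0 suffices for a Bluetooth module
print(pick_interface(125))   # Gigabit Ethernet (~125 MB/s) needs PCIe
```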


[Diagrams of ExpressCard slot configurations, each fed by PCI Express x1 and
USB 2.0 from the host chipset:
- A 34 mm ExpressCard slot only supports ExpressCard/34 modules.
- A 54 mm ExpressCard slot supports either ExpressCard/34 or ExpressCard/54
  modules.
- A 68 mm ExpressCard slot supports two ExpressCard/34 modules, or one
  ExpressCard/54 module]

While ExpressCard slots are expected in notebooks, ExpressCard slots are useful for desktops
because they allow expansion to a desktop without requiring the opening of the cover (such as to
add PCI or PCI Express adapters). ExpressCard slots give desktops the same ‘sealed box’
computing benefits enjoyed by notebooks.
ExpressCard technology draws upon many of the features of existing PC Card technology. It
balances size and utility with reliability and durability, and features hot plug-and-play and
auto-configuration. There are also other significant differences between the PC Card standard and
the ExpressCard standard.
1. Size: ExpressCard modules are roughly half the size of PC Card; the modules are lighter weight.
2. Speed: ExpressCard modules use serial data interfaces rather than the PCI parallel bus interface
of PC Card, improving bus speed in data transfer while reducing the number of signals needed in
the interface.
3. Cost: Because of its system and mechanical design, ExpressCard designs are anticipated to have
a lower implementation cost.
4. Less power: ExpressCard modules are expected to require less power than has traditionally been
required; both module sizes dissipate less than 1.3 watts.
5. Ease of use: ExpressCard modules offer a much easier method for installing new capabilities in a
desktop computer, because it eliminates the need to open the CPU chassis in order to add
functionality. In addition, ExpressCard is hot-swappable between mobile and desktop systems,
which is another plus for end users.

[Photos: implementation of ExpressCard on ThinkPad T43, showing the shape of
the shutter door and eject button and a printed sheet on the molded PCMCIA
symbol. The top slot supports one ExpressCard/34 or ExpressCard/54 module; the
bottom slot supports CardBus PC Cards]


Universal Serial Bus (USB)

• Standard to connect multiple I/O devices at high speed


with small cables (127 devices)
• Devices: keyboards, mice, printers, modems,
microphones, speakers, digital audio, telephone
• Flat, 4-pin connector, hot-pluggable devices
• No IRQ settings, DMA channels, and I/O settings;
host/slave architecture
• Current Lenovo products have USB port(s) USB connectors
• U3 is a standard that allows applications to run entirely
from a USB memory key

[Diagram: tiered-star USB topology. The PC is the host and hub; the monitor is
a powered hub connecting speaker, mic, and phone/modem; the keyboard is an
unpowered hub connecting mouse and digitizer. Basic-speed and Hi-speed USB
logos shown]

Universal Serial Bus (USB)


Universal Serial Bus (USB) is a standard for communication between a computer and an external
device over an inexpensive cable. USB allows the addition of a new device to a PC by plugging it
into the system or daisy-chaining it from another device on the bus. The device is immediately
available for operation, without manual device driver installation and without rebooting the PC.
USB ports coexist with other PC peripheral ports (keyboard, mouse, serial, parallel, audio in/out,
joysticks), and often two to eight USB ports replace the other PC peripheral ports.
All current Windows operating systems support USB natively, so unique drivers for each USB
device are not needed.

Lenovo USB 2.0 Super Multi-Burner Drive Lenovo USB 2.0 Security Memory Key 4 GB
(part number 41N5565) (part number 41U5120)


With USB-enabled PCs, users are able to hot-plug USB peripherals without needing to reboot their
systems or change IRQ settings, DMA channels, and I/O addresses.
If many USB devices are in use, it is best to use powered USB hubs. Also, it is useful to look for a
hub with per-port switching, which prevents a failed device in the hub from disabling a whole
chain.
USB devices do not require IRQ settings, DMA channels, or I/O settings. Thus, COM and LPT
ports currently occupying an IRQ can be made available.
USB is a tiered-star topology which relies on hubs that permit connection of multiple devices; it
operates as a host/slave architecture. For example, the PC could act as a hub for modems or printers
while a monitor acts as a hub for speakers and microphones, and a keyboard could act as a hub for
a mouse and a joystick. Each USB link (segment) is a point-to-point connection, so a hub is
required at each point where multiple connections are needed. Hubs act as repeaters, which redrive
the signals in each direction and provide termination for each line. Little intelligence is required
because hubs do not process data as it is passed through. Hubs include control and status registers
that enable the host to enable and disable each port and to determine whether a device is connected
to each port. A USB bus supports up to 127 devices in total, with hubs cascaded up to five levels
deep. For each USB connector on a PC, if more than one USB device is to be attached, the device
attached must be a hub.
USB can handle isochronous and asynchronous data streams. Isochronous support allows multiple
devices to operate concurrently with guaranteed throughput and data latency (for example, keeping
audio synchronized with video). Asynchronous operation is possible for devices that do not require
guaranteed bandwidth.

[Diagrams: USB connector dimensions. The Type A plug (12.0 mm x 4.5 mm)
attaches to the computer; the Type B plug (8.5 mm x 7.3 mm) attaches to the
peripheral; the 5-pin Mini-Type A and 5-pin Mini-Type B plugs (6.8 mm x
3.0 mm) attach to peripherals]

[Photos: USB ports, Type A (left) and Type B (right); Mini-Type A (left) and
Mini-Type B (right)]
There are three types of transactions between the host and a peripheral. First, the host sends a
token packet that contains the type and direction of a transaction at regular intervals. Then,
depending on the type of transaction specified by the host controller, either the host or the
peripheral sends a data packet. Finally, the recipient of the data packet sends a handshake
packet to acknowledge a successful transfer. USB uses two-wire differential signaling with NRZI
encoding and a preamble in every packet for receiver synchronization.
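The three-phase transaction described above can be sketched as a toy model. This is illustrative only: real USB packets carry PIDs, addresses, endpoint numbers, and CRCs that are omitted here, and the function name is ours.

```python
# Toy model of one host-scheduled USB transaction: token packet from
# the host, a data packet in the direction the token named, then a
# handshake (ACK) from the recipient.

def usb_transaction(direction, payload):
    """Return the ordered packet trace for one successful transaction.
    direction is "IN" (peripheral to host) or "OUT" (host to peripheral);
    either way, only the host initiates, by sending the token."""
    assert direction in ("IN", "OUT")
    return [
        ("token", direction),     # host schedules the transfer
        ("data", payload),        # moves in the direction the token named
        ("handshake", "ACK"),     # recipient acknowledges success
    ]

print(usb_transaction("OUT", b"key press"))
```

Note that the peripheral never speaks unsolicited: even an IN transfer begins with the host's token, which is what makes USB a host/slave architecture.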
The USB standard is controlled by the USB Implementers Forum. See www.usb.org for more
information.

Cable                             Application
Std-A plug to Std-B plug          Primary cable for connecting Std-B peripherals to PCs and
                                  other hosts
Std-A plug to Mini-B plug         Primary cable for connecting Mini-B peripherals to PCs and
                                  other hosts
Std-A plug to Micro-B plug        New – Primary cable for connecting Micro-B or Micro-AB
                                  peripherals to PCs and other hosts
Micro-A plug to Std-A             New – Provides OTG products with the same receptacle (Std-A)
receptacle (Adapter)              that is available on PCs. Any peripheral that can connect to a
                                  PC can connect to an OTG product by using this adapter
Micro-A plug to Micro-B plug      New – Allows direct interconnection between OTG products
Hard-wired captive cable          New – Supported on products, such as roll-up keyboards, that
with Micro-A plug                 are targeted exclusively for use with OTG hosts


USB On-The-Go (OTG)


USB On-The-Go (OTG) 1.0 was released in December 2001 as a standard for intelligent mobile
devices (digital cameras, smart phones, personal digital assistants) to communicate without a PC
(host). OTG is an addendum to the USB 2.0 specification.
USB OTG allows a product to be both host and peripheral (dual-role) and to switch roles on
demand. This standard supports device-to-device connectivity (called peering) without the need for
PC intervention. Only one side of the connection needs to be OTG-enabled, allowing
interconnectivity with the installed base of USB products.
USB OTG defines a smaller connector with a different key to ensure correct connection. These
connectors are called Mini-connectors. The Mini-A connector is for a host or master while a Mini-
B connector is for a device or slave.

U3
U3 is a software platform that lets vendors load software applications that run entirely from a flash-
based USB key. These applications are not just simple file synchronization; there are dozens of U3
applications available, including e-mail clients, antivirus scanners, word processors, Web browsers,
and even a U3 version of Skype. You plug in the key and you can run those applications on any
PC. No changes are made to the system registry, leaving the PC without any evidence of use. See
www.u3.com for more information.


Internal Systemboard USB Connectors

Internal USB Connectors on Lenovo desktop systemboard


[red circles; one available and one used]

Often systemboards have internal USB connectors. For example, on select Lenovo ThinkCentre
M57 desktops, the USB ports are implemented in the following manner:
• The 6 rear USB ports are soldered onto the systemboard.
• The 2 front USB ports are connected to USB Connector #2 on the systemboard.
There are a total of two USB connectors on the systemboard (both yellow). The one on the left is
unconnected (USB Connector #1); the one on the right is connected (USB Connector #2).
Each systemboard USB connector is capable of supporting two USB ports.
The engineers left USB Connector #1 unconnected so that customers can take advantage of it if
they wish to. If a customer wants two more USB ports in the rear, they can connect two more USB
ports to USB Connector #1 on the systemboard, but this will occupy one PCI slot via an adapter. If
a customer wants to install a memory card reader (in the diskette drive bay), then they can connect
the memory card reader internally to the USB Connector #1 on the systemboard.


Universal Serial Bus (USB):
Speeds

• Most USB ports and products shipping since 2004 are USB 2.0-compliant.

Spec      Max speed             Products   Name
USB 1.1   1.5 MB/s or 12 Mb/s   1998       USB
USB 2.0   60 MB/s or 480 Mb/s   2001       Hi-Speed USB

[Photos: USB ports on a ThinkPad notebook and a ThinkCentre desktop;
basic-speed and Hi-Speed USB logos]

Universal Serial Bus (USB) Speeds


The main USB version was 1.1 from 1998 through 2002; USB 2.0 became available in 2001. USB
2.0 uses the same cables, connectors, and software interfaces and is backward compatible with
older USB 1.1 devices.
USB support is built into most south bridge chipsets such as the Intel I/O Controller Hub. For older
systems that have USB 1.1 but require USB 2.0, various vendors sell PCI adapters or PC Cards
with USB 2.0 controllers.
While the basic data rate of USB 1.1 is 12 Mb/s (1.5 MB/s) over shielded twisted-pair cable, USB
also supports a low-speed subchannel of 1.5 Mb/s. Devices on this subchannel can use unshielded,
untwisted cable. All cables use four wires (two for power, two for data). For full-speed devices, the
cable between two devices can be up to five meters long; low-speed cables are limited to three meters.
USB 2.0 has a data transfer rate of 480 Mb/s (60 MB/s) which is 40 times the speed of USB 1.1.
You can mix USB 1.1 and 2.0 devices under USB 2.0, and all devices will perform at their
maximum speeds without interference; USB 1.1 devices will run at their slower speeds. Noisy,
poorly shielded cables can cause problems so it is best to use cables labeled “high-speed USB 2.0
compatible” for USB 2.0 devices. The maximum cable length is 5 meters.
The USB 2.0 specification encompasses all USB data transfer speeds of low (1.5 Mb/s), full (12
Mb/s), and high (480 Mb/s). The correct nomenclature for high-speed USB products is Hi-Speed
USB. The correct nomenclature for low- or full-speed USB is simply USB.
Remember that these speeds are theoretical, and, in actual use, the transfer speeds will be much
less.
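The Mb/s-versus-MB/s figures above differ only by the factor of eight bits per byte. The sketch below (illustrative only; the 50% efficiency figure is an arbitrary assumption standing in for the real-world overhead the text mentions) makes the conversion and its practical effect concrete:

```python
# Convert USB line rates between megabits and megabytes per second, and
# estimate best-case transfer times. Rates are theoretical maximums; the
# efficiency factor below is an assumed stand-in for protocol overhead.

USB_RATES_MBPS = {
    "USB 1.1 low-speed": 1.5,    # unshielded-cable subchannel
    "USB 1.1 full-speed": 12,
    "Hi-Speed USB 2.0": 480,
}

def to_megabytes_per_second(megabits_per_second):
    return megabits_per_second / 8  # 8 bits per byte

def transfer_seconds(file_mb, rate_mbps, efficiency=0.5):
    """Seconds to move file_mb megabytes at a given line rate."""
    return file_mb / (to_megabytes_per_second(rate_mbps) * efficiency)

for name, rate in USB_RATES_MBPS.items():
    print(f"{name}: {to_megabytes_per_second(rate):g} MB/s theoretical, "
          f"700 MB in ~{transfer_seconds(700, rate):.0f} s at 50% efficiency")
```

Note how 480 Mb/s works out to the 60 MB/s quoted in the table above: 40 times the 12 Mb/s full-speed rate.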


Universal Serial Bus:


Wireless USB (WUSB)

• Same function and architecture as USB 2.0 except without cables
• Uses ultra-wideband (UWB) radio technology by WiMedia Alliance
• Primarily for connection of home consumer products to PC or TV
• Certified Wireless USB
• Utilized in select ThinkPad T61 notebooks

Main devices for WUSB:
• Digital camcorder
• HDTV
• Personal video recorder
• MP3 player
• Set top box
• Game console
• Printer
• Portable projector
• Digital camera
• External disk

USB 2.0            Wireless USB (WUSB)
480 Mb/s           480 Mb/s at 3 meters
Requires cables    No cables
Products in 2001   Products in 2007

[Certified Wireless USB logo]


Wireless USB (WUSB)


Wireless USB (WUSB) is similar to USB 2.0 except without wires. WUSB is designed for devices
such as a printer, external hard disk, digital camera, etc., to connect to a PC or a TV without
cabling.
Wireless USB is based on the Ultra-Wideband (UWB) radio efforts of the WiMedia Alliance
(www.wimedia.org), an open industry association that promotes personal-area-range wireless
connectivity and interoperability among multimedia devices in a networked environment.
Wireless USB is often abbreviated as "WUSB", although the USB Implementers Forum
discourages this practice and instead prefers the name "Certified Wireless USB" to
differentiate the technology from competitors. The first products appeared in 2007.
The fundamental relationship in WUSB is a hub and spoke topology. In this topology, the host
initiates all the data traffic among the devices connected to it, allotting time slots and data
bandwidth to each device connected. These relationships are referred to as clusters. The
connections are point-to-point and directed between the WUSB host and WUSB device.
Wireless USB helps to eliminate the numerous cables that attach to USB connectors on PCs. Also,
users do not need to deal with the non-standard mini-USB connectors used on cameras and other
devices.


Where Wireless USB Fits

[Chart: wireless technologies by data rate (0.01 to 1,000 Mb/s) — wireless PAN: Bluetooth and
Wireless USB; wireless LAN: Wi-Fi 802.11; wireless metro-area network: 802.16; cellular:
2G/2.5G/3G. Source: UWB Forum]

[Diagram: a wireless USB host communicating with a wireless USB hub and with devices that have
integrated wireless USB — external hard drive, video camera, printer, digital camera]

Certified Wireless USB uses a hub-and-spoke model where a wireless USB hub and devices with
integrated wireless USB can communicate with a single host.

Wireless USB technology will support the following attributes:


• Simple, low-cost implementation. The implementation will follow the wired USB connectivity
models as closely as possible to reduce development time and to preserve the low-cost,
ease-of-use model which has become pervasive in the PC industry.
• A point-to-point connection topology supporting up to 127 devices that follows a similar host-to-
device architecture as used for wired USB.
• The WUSB host can logically connect to a maximum of 127 WUSB devices, considered an
informal WUSB cluster. WUSB clusters coexist within an overlapping spatial environment with
minimum interference, thus allowing a number of other WUSB clusters to be present within the
same radio cell.
• High spatial capacity in small areas to enable multiple devices access to high bandwidth
concurrently. Multiple channel activities will be able to occur within a given area. The topology
will also support multiple clusters in the same area. The number of clusters to be supported is yet
to be determined.


• A dual-role model where a device can also provide limited host capabilities. This model would
allow mobile devices to access services with a central host supporting the services (e.g., printers
and viewers). It would also allow devices to access data outside the cluster they are connected to by
creating a second cluster as a limited host.

Performance
Wireless USB performance at launch will provide adequate bandwidth to meet the requirements of
a typical user experience with wired connections. The 480 Mb/s initial target bandwidth is
comparable to the current wired Hi-Speed USB standard. With 480 Mb/s as the initial target, the
Wireless USB specification allows for generational increases in data throughput. As the Ultra-
Wideband (UWB) radio evolves and future process technologies take shape, bandwidth could
exceed 1 Gb/s. The specification intends for Wireless USB to operate as a wire replacement with
targeted usage models for cluster connectivity to the host and device-to-device connectivity at less
than 10 meters. The 480 Mb/s target is for a range of 3 meters; a range of 10 meters will likely
achieve about 110 Mb/s. It operates in the 3.1 to 10.6 GHz frequency range.
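The published rate/range points above (480 Mb/s at 3 meters, about 110 Mb/s at 10 meters) can be folded into a rough estimator. The linear interpolation between the two points is purely this sketch's assumption; the specification does not define rate falloff this way:

```python
# Rough WUSB throughput estimate by distance, anchored to the two published
# figures: 480 Mb/s at 3 m and about 110 Mb/s at 10 m. Interpolating between
# them is an assumption for illustration, not specified behavior.

def estimated_wusb_rate_mbps(distance_m):
    if distance_m <= 3:
        return 480.0
    if distance_m >= 10:
        return 110.0
    # Linearly interpolate between (3 m, 480 Mb/s) and (10 m, 110 Mb/s).
    return 480.0 + (110.0 - 480.0) * (distance_m - 3) / (10 - 3)

print(estimated_wusb_rate_mbps(3))    # 480.0
print(estimated_wusb_rate_mbps(10))   # 110.0
```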

Security
Wireless USB security will be designed to deliver the same level of security as wired USB.
Connection-level security between devices, for instance, will be designed to ensure a device is
associated and authenticated before operation of the device is permitted. Higher levels of security
involving encryption will be implemented at the application level. An important goal will be to
ensure that the processing overhead of security support does not noticeably impact performance
or add to device cost.


FireWire (IEEE 1394)

• Bus to connect multiple I/O devices with small cables


• Common in digital video (DV) editing hardware
- Consumer electronics
- Transfer from DV camcorder to disk without
loss of quality
• Supports up to 63 devices; 4- or 6-wire cable,
hot-pluggable devices; peer-to-peer architecture
[FireWire connectors]
• Will not be widespread until Intel includes support in
chipsets; common for Apple products
• Lenovo sells FireWire PCI and CardBus Cards
• Called iLink by Sony
• External disks attached to FireWire ports are common

Marketing name   Standard     Max speed
FireWire 400     1394a-2000   400 Mb/s
FireWire 800     1394b        800 Mb/s

[FireWire on a digital camcorder]


FireWire (IEEE 1394)


The IEEE 1394 specification, called FireWire by Apple, iLink by Sony, and also known as High
Performance Serial Bus (HPSB), provides a standardized high-speed serial bus to link computer and
consumer peripherals to PCs through a common, high-speed serial port. It can connect 1394-compatible
peripherals such as disks, optical drives, printers, and scanners, as well as digital consumer
electronic products such as digital cameras, camcorders, stereo systems, set top boxes, HDTVs, DVD
players, and speakers. Its primary use is to move video files between a Digital Video (DV) camcorder
and a PC disk without any loss of visual quality.
The 1394 standard was first developed by Apple and Texas Instruments and is a formal IEEE standard.
Following are the various terms it is known by:
• 1394-1995 – Original IEEE-approved specification which allowed for 100 Mb/s, 200 Mb/s, or 400
Mb/s transfers.
• 1394a-2000 – This specification standardized and clarified parts of 1394-1995. It added features such
as short bus reset, accelerated arbitration, and power management. It supports speeds up to 400 Mb/s
and cable segments of 4.5 meters.
• 1394b or FireWire 800 – Approved in 2001, this specification supports 800 Mb/s (with a plan for 1.6
and 3.2 Gb/s). It is backward compatible with previous 1394 versions (so FireWire 400 devices work
with this version). It supports greater distances by supporting additional media. Over plastic fiber-
optic cable or Category 5 copper wire, 1394b networks can span 100 meters, up from 4.5 meters in
previous 1394a specification. Over glass fiber, 1394b can transmit over several kilometers. Apple
released products based on this spec in 2003.
Legacy IEEE 1394 cables normally come in a 6-pin configuration with two twisted signal pairs and a
negative/positive power pair, capable of providing 50 watts of DC for powering


peripherals (USB 2.0 provides only 1.5 watts). The smaller 4-pin connectors found in cameras and
other self-powered peripherals omit the power. The new 9-pin IEEE 1394b cables use two pins to
attach a grounded shield that surrounds the other wires and prevents interference from outside
electromagnetic noise, which helps speed up data transmission rates by reducing crosstalk. The
third new pin is unused. (The 1394b committee decided to use a pre-existing cable technology and
didn't want to force cable makers to tool up for eight conductor cables and connectors.) IEEE
1394b is backward compatible, so there are also 9-pin-to-6-pin and 9-pin-to-4-pin adapter cables.
You will not get the higher speed of FireWire 800 if you attach legacy peripherals to what is called
a bilingual port (it speaks both IEEE 1394a and b). The newer b-only, or beta, ports don't support
older IEEE 1394 devices. Due to slight physical differences between b-only and bilingual
connectors, a bilingual port can take both FireWire 800 and older FireWire connectors, while a b-
only port will take only a FireWire 800 connector.

FireWire Architecture
The 1394 bus is a scalable architecture with speeds up to 800 Mb/s (100 MB/s), and work is
underway for a faster speed. The speed depends on vendor implementation. A 1394 controller will
be able to be attached to the PCI bus. One or more 1394 ports may be on the PC. PCI adapter cards
with 1394 controllers and ports are available also.

[Diagram: FireWire devices — scanner, printer, digital camera, VCR, video camera, hard drive,
DVD±RW drive — plug into a FireWire connector on the systemboard, adapter card, or PC Card]

1394 uses a tree topology; it supports up to 63 nodes on up to 1,023 buses. A PC can address a
single node, broadcast to all nodes, and bridge across buses to as many as 64,000 nodes. 1394
automatically configures peripherals, which are hot pluggable. Addresses are automatically
assigned, so no switches are set. Time required to reconfigure is not more than 400 microseconds.
Each 1394-compatible device will have a unique 64-bit ID number in order for FireWire to identify
it. More than one PC can be connected to the 1394 bus. Devices do not need terminators, special
sequencing, or user awareness of device identifiers. 1394 is a peer-to-peer bus, not a master/slave
bus as SCSI and Universal Serial Bus (USB) are.
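The unique 64-bit ID each 1394 device carries follows the IEEE EUI-64 layout: a 24-bit vendor (company) ID in the top bits and a 40-bit device-specific value in the rest. A minimal sketch (the sample GUID value is hypothetical, chosen only to exercise the bit layout):

```python
# Split an IEEE 1394 node's 64-bit unique ID (EUI-64) into its 24-bit
# company ID and 40-bit device-specific serial. The sample value is
# hypothetical; it only demonstrates the bit positions.

def split_1394_guid(guid):
    vendor_id = (guid >> 40) & 0xFFFFFF    # top 24 bits
    serial = guid & 0xFF_FFFF_FFFF         # low 40 bits
    return vendor_id, serial

vendor, serial = split_1394_guid(0x0050C500000ABCDE)
print(hex(vendor), hex(serial))  # 0x50c5 0xabcde
```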
An advantage to FireWire is that it does not need a computer host, nor does it need to signal the
other component that it’s "alive", as current USB implementations must. This kind of data
interruption makes USB impractical for most professional video work. FireWire uses fewer
resources than USB. FireWire can easily coexist with the USB on the same system.
1394 supports asynchronous and isochronous (real-time) communications. Isochronous support
guarantees on-time delivery, which is necessary for ensuring proper synchronization of audio and
video. Traditional methods of handling digital video require a video capture adapter to convert
analog NTSC/PAL-compatible signals to digital format. 1394 supports


direct connection of digital devices. USB does not handle isochronous data well.

[Diagram: a PC connects to a FireWire hub over a 200 Mb/s link; the hub connects to a hard disk
over a 400 Mb/s link. A PC-to-disk transfer runs at 200 Mb/s.]

The FireWire bus transfers at different speeds between devices. Each transfer runs at the speed of
the slower of the two devices, and no faster than the speed of any upstream device.
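This rule can be sketched as taking the minimum link speed along the path between two devices (a simplification that ignores bus arbitration and protocol overhead):

```python
# Effective FireWire transfer rate between two nodes: the minimum of the
# link speeds along the path. Simplified; arbitration overhead is ignored.

def path_speed_mbps(link_speeds_mbps):
    return min(link_speeds_mbps)

# PC --200 Mb/s-- hub --400 Mb/s-- hard disk, as in the diagram above:
print(path_speed_mbps([200, 400]))  # 200
```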
Windows XP has limited support for 1394 to the 100 Mb/s speed. Microsoft's Windows Vista
operating system will initially support 1394a only; 1394b support will be provided in a service
pack.
The FireWire Trade Association can be reached at www.1394ta.com.

[4-pin FireWire connector ("Mini-B" port); 6-pin FireWire connector ("A" port — the additional
two pins carry power); 9-pin FireWire 800 connector]


PC Card:
Introduction

• Personal Computer Memory Card International


Association (PCMCIA)
• Current version is PC Card Standard 8.0
- 16-bit PC Cards
- CardBus
• PC Card devices are credit card-size adapters
• PC Card advantages
- Small physical size
- Portability between systems
- Hot pluggability
- Less vulnerable to environmental variables

[802.11a/b/g wireless PC Card]


PC Card Introduction
PCMCIA is the name of a committee of hardware and software vendors (about 500 including
Lenovo) that defines standards for expansion cards that are primarily for portable computers. PC
Cards (as they have come to be called) have roughly the dimensions of credit cards, with varying
thickness, making them suitable for battery-powered notebook computers.
As the name implies, the original intent of the committee was to define a standard specifically for
memory cards such as SRAM memory. Its role has expanded considerably in that I/O cards such as
modems, LAN adapters, and disks have been developed to conform to this standard. In addition to
I/O cards, audio cards with DSPs, global positioning cards, wireless LAN cards, pagers, and other
types have been developed. The ultimate goal of PCMCIA is to provide a set of hardware and
software standards that will allow for the hot pluggability and interoperability of any PC Card in
any computer conforming to the published standard.
There is a hierarchical software structure that must be adhered to in order for PCMCIA to achieve
its promise of hardware independence and interoperability. This structure was finalized in the
November 1992 PCMCIA Standards Release 2.0. Release 2.1 became effective July 1993. The PC Card
Standard does not carry a version number, though it was originally referred to as 5.0. The PC Card
Standard was published in early 1995, but controllers adhering to this spec were not available until 1996.
See www.pcmcia.org for more information.


PC Card:
Physical Design

Type I            Type II           Type III
3.3mm thickness   5.0mm thickness   10.5mm thickness
Memory            I/O               Storage

Type II example: 85.6 mm long × 54 mm wide × 5 mm thick


PC Card Physical Design


• The PC Card slots are designed to withstand 10,000 PC Card insertions.
• Type I are primarily flash memory cards.
• Type II are primarily I/O devices interface cards.
• Type III are primarily rotating hard disks.
• All PC Cards have 68-pin connectors.
• A Type I Extended is 3.3mm by 54mm with up to 40mm of extended length.
• A Type II Extended is 5.0mm by 54mm with up to 40mm of extended length. It is appropriate for
pagers and wireless communication adapters with antennas.
• One Type I (3.3mm) slot can utilize:
– One Type I (3.3mm)
• One Type II (5mm) slot can utilize:
– One Type I (3.3mm) or one Type II (5mm)

Open Type II slot for Type I device or Type II device


• One Type III (10.5mm) slot can utilize:
– One Type I, one Type II, or one Type III

[Open Type III slot: one Type I card plus one Type II card, or one Type III card]

• Two Type II slots can utilize:
– Two Type I, two Type II, or one Type III

[Two open Type II slots: two Type I or Type II cards, or one Type III card]
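The slot rules above boil down to a thickness check: a card fits when it is no thicker than the slot opening. A sketch of that logic (single-slot case only; the stacked pair of Type II slots that accepts one Type III card is not modeled):

```python
# PC Card slot compatibility as a thickness comparison, using the standard
# Type I/II/III thicknesses. Covers a single slot only.

CARD_THICKNESS_MM = {"Type I": 3.3, "Type II": 5.0, "Type III": 10.5}

def card_fits_slot(card_type, slot_type):
    return CARD_THICKNESS_MM[card_type] <= CARD_THICKNESS_MM[slot_type]

print(card_fits_slot("Type I", "Type II"))    # True
print(card_fits_slot("Type III", "Type II"))  # False
```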

PC Card / PCMCIA
• PC Card slots are standard in all ThinkPad notebooks.
• Operating systems include utilities to make installing, configuring, and monitoring PC Cards
extremely easy.
• PC Card Standard and CardBus will increase the performance and use of PC Cards.
• Lenovo is a major PC Card vendor.
• Lenovo is on the PCMCIA board so the company is a major influencer of the standard.
• At its January 1996 meeting, the Personal Computer Memory Card International Association
(PCMCIA) approved a system of icons designed to help users easily identify the basic
capabilities of a PC Card or PC Card slot. Several manufacturers have indicated that they
will implement the icons on their products. The icons are the same shape as the PCMCIA's
"PC Card" logo but contain information about a product such as operating voltage, bandwidth,
and optional capabilities such as CardBus and DMA. Use of the icons is restricted to the
members of PCMCIA and is completely voluntary.

[Icons: PCMCIA Member, DMA Support, 3 Volt, 16 Bit, Zoomed Video, 5 Volt, CardBus, Digital
Video Broadcasting]


PC Card:
PC Card Standard

• Specification following PCMCIA 2.1 (no version number, but sometimes referred to as
PC Card Standard 8.0)
• 16-bit data path
• 3.3 and/or 5.0 volt operation
• Improved flexibility for power management
• Optional: DMA channel, multifunction cards
• Existing (older) PCMCIA cards will work in
PC Card Standard sockets
• A superset of PC Card Standard is CardBus

PC Card Slots on a Lenovo ThinkPad


PC Card Standard
The PC Card standard is designed to address compatibility problems prevalent in PCMCIA and
facilitate higher performance, lower power, and broader platform applicability. The new standard
includes a number of additional features, forcing network administrators to carefully consider the
features they need and the appropriate products. PC Cards and PC Card-compatible platforms that
worked under the previous standard will also work with the new standard, allowing users a great
deal of flexibility. However, individuals seeking to take advantage of the new features may require
a new or updated PC Card. The new standard includes several features meant to broaden
applications for PC Card technology.
By adding DMA to the standard, PC Cards can have on-board DMA channels which are ideally
suited to devices that frequently generate large amounts of data such as LAN PC Cards.
The previous PCMCIA 2.1 had limitations such as 16-bit transfers, no busmastering support, no
burst or streaming modes, and no data or address parity.
In late 1999, PC Card Standard 8.0 was released, which added a variety of new features including a
new Small PC Card form factor and PCI-style Power Management specification for CardBus. See
www.pcmcia.org for more information. As of late 2003, the latest release is version 8.1.


PC Card:
CardBus and CardBus Plus

CardBus
• Superset of PC Card Standard
• 32-bit transfers based on PCI specification
• 33 MHz with 132 MB/s transfers
• Backward compatible with existing 16-bit
PC Cards
• Standard in ThinkPad systems
• CardBus PC Cards identified by brass-looking plate with bumps near connector

[10/100 Ethernet CardBus PC Card]

CardBus Plus
• Includes USB signal to slot
• Enables support for the USB-based ExpressCard when placed in a special adapter

[CardBus Plus is used in select ThinkPad notebooks]

CardBus
The CardBus system has been driven primarily to be compatible with the Peripheral Component
Interconnect (PCI) local bus. CardBus provides for busmastering and allows for a 32-bit data path
by multiplexing address and data. The CardBus requires shielding to the system and card
connectors to achieve the 33 MHz performance. (The original PC card standard had a theoretical
maximum of 10 MHz but cards were typically run at 5 MHz.) This performance, combined with
the 32-bit data capability, provides a transfer rate of 132 MB/s.
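The 132 MB/s figure follows directly from the bus parameters just given — a 32-bit (4-byte) data path clocked at 33 MHz:

```python
# CardBus/PCI peak bandwidth: bus width in bytes times clock rate.
bus_width_bytes = 32 // 8        # 32-bit data path
clock_hz = 33_000_000            # 33 MHz
bandwidth_mb_s = bus_width_bytes * clock_hz / 1_000_000
print(bandwidth_mb_s)  # 132.0
```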
• CardBus maintains backward compatibility with existing 8- and 16-bit cards (for PCMCIA
Release 2.0/2.1).
• CardBus uses 68-pin format with additional shielding.
• Vendors can design their products with CardBus-only sockets or CardBus-only PC Cards.
CardBus PC Cards are usually identified visually by a brass-looking-gold-colored plate that often
has eight bumps on it near the connector. This functions as a grounding shield for the higher speeds
across the interface.


CardBus PC Cards have different address space; therefore, they do not have attribute memory
space. Instead, CardBus PC Cards have a special 256-byte configuration space that 16-bit PC Cards
do not have. A pointer to the start of the card information structure (CIS) is stored inside the
configuration space. The CIS itself may reside in any address space on the CardBus card except in
its I/O space.
CardBus PC Cards also do not have configuration registers. In their place is a setup of up to eight
base address registers (BARs). Each BAR is capable of responding to either I/O or memory access
in the host system address space.
CardBus PC Cards may support up to eight separate functions on a single card. Each of these
functions is treated almost as if it were a separate card. Each function has its own configuration
space, CIS, and BARs.
CardBus PC Cards no longer map memory or I/O. Instead, the CardBus PC Card directly decodes
host system access. The down side is that it requires additional logic to handle the decoding.
However, the card now controls the number of address spaces available for both memory and I/O.
On systems and cards that rely on the system adapter to perform decoding, the number of memory
windows is constrained by the adapter. The limit today is typically five memory and two I/O
windows per socket.
While CardBus still relies on client drivers working in conjunction with Card and Socket Services,
modifications have been made to the metaformat and Card and Socket Services while maintaining
compatibility with the 16-bit client drivers. Most notably, in order to support the multi-function
capability of these cards, both Card and Socket Services have been modified. These changes do not
have an impact existing 16-bit client drivers. Rather, a CardBus client driver exploits these
enhanced functions.
All major operating systems support CardBus.
In late 1997, the PCI Bus Power Management Interface Specification from PCMCIA was
approved. This integrates CardBus PC Card power management with ACPI.
CardBus host controllers are a type of PCI bridge and are supported (enumerated and configured)
by the PCI software in operating systems just like other PCI bridges.
• 100 Mb/s Ethernet CardBus PC Cards are about 50% faster than an equivalent 16-bit PC Card and
about 70% faster than a 16-bit 10 Mb/s Ethernet PC Card
• 10 Mb/s Ethernet CardBus PC Cards are about 10% faster than an equivalent 16-bit PC Card

CardBus Plus
CardBus Plus was introduced in 2005 in some notebook systems (such as the ThinkPad X41
notebook). CardBus Plus is a normal CardBus slot with a USB 2.0 signal passed to it so that USB-
based ExpressCard modules can be used in a PC Card slot with an adapter. This permits a single
slot to support the older CardBus PC Cards and some new ExpressCard modules. Note that the
ExpressCard module must be placed in a special adapter and must be USB-based, not PCI Express-
based.
CardBus Plus is used in select ThinkPad notebooks.


Chipsets

• A group of hardware chips (circuitry) controlling


information flow between subsystems
• Determine features and performance of systems
- Type and speed of memory
- Type and speed of I/O
• New versions continually released by Intel, NVIDIA,
ATI, and others
• Generally associated to specific processor

P965 Express Chipset


Chipsets
Chipsets are physical hardware chips (circuitry) that control the information flow between
subsystems. Also known as core logic, the chipset of a system is generally as important as the
processor type in that the chipset determines the features and performance of a system.
Intel is the main vendor for chipsets, but other vendors market competitive chipsets. Chipsets are
associated with specific processors, and new chipsets are continually released that provide new
features and performance enhancements.
Intel was calling its chipsets PCIsets because they contained controllers for the PCI bus, but when
accelerated graphics port (AGP) was introduced, the name for chipsets supporting AGP graphics
became AGPset. As PCI is becoming less of a major focus, the generic term chipset is now
common.

Q965 (right) and ICH8 (left) chipset on Lenovo ThinkCentre systemboard


Chipsets:
Old and New PC Architectures

Older desktop architecture (2000-2004):
• PCI bus and Hub Interface
• AGP for graphics
• EIDE disks

Newer desktop architecture (2004 and after):
• PCI Express and Direct Media Interface
• PCI Express x16 graphics (no AGP)
• Serial ATA disks

[Diagrams: in both architectures, the processor (with L1/L2 cache) connects to an MCH or GMCH
host bridge (memory controller and optional graphics controller) attached to memory. Older: an
AGP slot or display cache and PCI slots hang off the host bridge; the Hub Interface links it to
the I/O controller hub (ICH), which contains the PCI, IDE (4 IDE disks), SATA, DMA, and USB
controllers; the firmware hub and Super I/O sit on the Low Pin Count interface, with USB and
AC '97 codecs off the ICH (the boxed chips form the Accelerated Hub Architecture). Newer: a PCI
Express x16 slot and PCI Express slots hang off the host bridge; the Direct Media Interface links
it to the ICH, which contains the PCIe, PCI, SATA (4 SATA disks), IDE, and USB controllers; the
firmware hub and Super I/O sit on the Low Pin Count interface, with USB 2.0 and AC '97 codecs or
High Definition Audio off the ICH.]

Old and New PC Architectures


PC architectures are defined by the chipsets. Chipsets have changed significantly over the years.
Around 2000, Intel introduced the Accelerated Hub Architecture (originally in the 8xx chipsets)
which had three main chips which were the Memory Controller Hub (MCH), I/O Controller Hub
(ICH), and Firmware Hub. A high-speed Hub Interface connected the MCH and ICH together, and
the PCI bus became less important from the 1990s as it was moved to connect to the ICH.
In 2004, Intel still utilized the Memory Controller Hub (MCH) and I/O Controller Hub (ICH), but
the interface between them became the Direct Media Interface (DMI). More importantly, PCI
Express (the successor to PCI and Accelerated Graphics Port [AGP]) was introduced. For graphics,
a PCI Express x16 slot could use a high-end graphics adapter. PCI Express slots and integrated
devices could also be utilized off the MCH. While EIDE disks could be used, Serial ATA became
the primary disk interface.


Accelerated Hub Architecture (from older desktop architecture)


When evaluating the task of creating a chipset architecture capable of scaling with tomorrow's
microprocessors and multimedia/Internet applications, the Intel designers saw the PCI bus as a key
limiting factor. The Intel 810 chipset was the first in a series of Intel chipset products to be based
on the Accelerated Hub Architecture, which removes the PCI bus as the main device interconnect.
This architecture provides each critical multimedia subsystem with a direct link to the chipset. For
example, data can now move directly from an IDE storage device to memory through a 266 MB/s
I/O channel without PCI bus contention or bandwidth limitation. The dedicated links to IDE, audio,
modem, and USB subsystems ensure deterministic access to/from memory providing improved
performance, optimal concurrence, and previously unattainable audio/video isochrony.
The Accelerated Hub Architecture refers to the three chips that are in the box in the associated
slide. The chips include the graphics memory controller hub (GMCH), I/O Controller Hub (ICH),
and firmware hub.

Hub Interface (from older desktop architecture)


The bus or link between the GMCH/MCH and ICH is called the hub interface or hub link. There
are different versions or generations of this link.
One version is the Hub Interface 1.5, which implements data transfers at 266 MB/s with an
effective transfer rate of 266 MHz, and it is a point-to-point bus type. The strobe signal is clocked
at up to 133 MHz and transfers data through the use of both clock edges. It is an Intel-proprietary
bus; therefore, Intel will not release specifications or protocols regarding it. It is a 16-bit bus with
only eight bits carrying data. The other bits include two for differential strobe, three for protocol
control, one for trace matching, and one for signal voltage reference. There is an optional parity
signal for applications that require it. The interface can be scaled in both width and clock
frequency. Its bandwidth and architecture allows the chipset to deal effectively with isochronous
data without the bottlenecks associated with the PCI protocols. This bandwidth is achieved through
the use of high clock rates, differential signaling on the strobe line, a reference voltage signal that
provides reliable switching, and a compensation signal that allows the chip to match its buffer
characteristics to those of the printed circuit board. The bus is source-synchronous – the clock and
data are driven at the source, ensuring synchronous arrival. The combination of source-
synchronous clocking and the voltage and matching signals guarantees electrical integrity, but a
parity option is available for those applications that require signal checking throughout the system,
such as high-integrity servers. It transfers four samples per clock (so it is like a quad-pumped bus).
It uses 64-bit inbound addressing and 32-bit outbound addressing.
Another version is Hub Interface 2.0 which has the same features as 1.5 except it has 16-bit wide
data transfer (versus 8-bit) with an 8x clock (versus 4x clock) which provides 1.066 GB/s
bandwidth. It has ECC protection on the link (versus parity).
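The 266 MB/s and 1.066 GB/s figures above follow from link width × base clock × transfers per clock. A quick check, assuming a 66.6 MHz base clock with the 4x and 8x pumping the text describes:

```python
# Hub Interface peak bandwidth: link width (in bytes) x base clock x
# transfers per clock. The 66.6 MHz base clock is inferred from the
# 4x/8x pumping figures in the text.

def hub_interface_bandwidth_mb_s(width_bits, base_clock_mhz, transfers_per_clock):
    return width_bits / 8 * base_clock_mhz * transfers_per_clock

print(round(hub_interface_bandwidth_mb_s(8, 66.6, 4)))   # 266  (Hub Interface 1.5)
print(round(hub_interface_bandwidth_mb_s(16, 66.6, 8)))  # 1066 (Hub Interface 2.0)
```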


Firmware Hub
The firmware hub stores system BIOS and video BIOS in flash memory, eliminating a redundant
nonvolatile memory component. Also, it contains a hardware random number generator (RNG),
which provides random numbers to enable fundamental security building blocks supporting
stronger encryption, digital signing, and security protocols. The hardware RNG is more effective
than a software-based RNG. The device uses thermal noise in a semiconductor junction to produce
random circuit transitions. These transitions are aggregated and checked for true randomness, then
assembled into a random key of any desired length. Software uses this to provide security in
applications.

AC ’97 Audio
With the I/O Controller Hub (ICH) and its follow-on versions (ICHx), audio is implemented in
software as soft audio and is effectively moved off the PCI bus. Soft audio is a software solution
that combines a low-cost audio codec integrated circuit (IC) with a small portion of the core
chipset's processing power to form a complete PC audio subsystem. Soft audio processing
consumes minimal CPU overhead and eliminates the typical PCI audio controller from the system.
The result is reduced systemboard space and overall system cost. The diagram that follows
illustrates the transition from legacy audio to soft audio. A sound card such as the Creative Labs
Sound Blaster Audigy 2 PCI adapter does provide the highest quality for musicians, gamers, and
DVD movie buffs. However, Analog Devices SoundMAX 2.0 and 3.0 implement a software-based
sound solution at a $2 USD cost compared to almost $100 USD for a PCI sound card.


Transition to Soft Audio Solutions

[Diagram: the progression from legacy ISA audio, to a PCI audio accelerator chip, to soft audio
solutions in which the ICHx or South Bridge drives a single AC '97 codec (such as a SigmaTel
codec) over the AC-link. Increasing CPU performance enables the move to "soft" audio, with lower
cost and less board area.]

The SoundMAX includes soft audio (processed in software) from Staccato Systems, which is
responsible for high-performance features such as wavetable music synthesis, EAX environmental
effects, and a customizable 3-D positioning environment.
The ICHx performs many I/O control functions, and its audio controller manages audio stream
transfers between system memory and audio codecs, leaving the CPU and PCI bus free to service
other system requests. Included in the ICHx is a controller that has a digital serial link (AC link).
The digital serial link can communicate with codecs like the Analog Devices AD188x audio IC
series found in the SoundMax 2.0 technology.
In 2001, Analog Devices announced SoundMAX with SPX (or SoundMAX 3.0) as a follow on to
SoundMAX 2.0. SoundMAX 3.0 with SPX (Sound Production eXtensions) is a low-cost,
integrated audio solution that surpasses the sound of many expensive PCI-based sound adapters.
Residing on the PC’s systemboard or on a Communications Network Riser (CNR) soundcard,
SoundMAX consists of high-performance hardware codecs, featuring hardware sample rate
conversion (SRC) and professional-quality 94-dB playback. SoundMAX software includes
Windows device drivers and applications that support 3D audio, DirectX, EAX, A3D, unlimited-voice
DLS-2 wavetable synthesis, 5.1 virtual theater surround, CNR multi-channel output options, and SPX
“audio animation” technologies.
Audio Codec '97 (AC '97) defines a high-quality, 16-bit audio architecture for a PC. The latest
specification is Audio Codec '97 Component Specification v2.3, which can be located at
www.developer.intel.com/ial/scalableplatforms/audio/ on the Internet. The specification is updated
to reflect support for an audio riser, support for numerous codec/audio upgrades, and pertinent
material concerning S/PDIF (Sony/Philips Digital Interface). S/PDIF is a digital audio interface that
is widely used in consumer electronics, specifically all Dolby digital equipment, DVD players,
many CD players, middle- and high-end sound cards, and electronic musical instruments.


AC ’97 System Diagram

[Diagram: digital bus sources (audio applications, games, digital CD/DVD audio, soft MPEG and
AC-3 decode, digital music/MP3) and legacy analog sources (CD/DVD Redbook audio, TV tuner
video, internal AUX input) converge on an AC ’97 Digital Controller attached to the CPU via the
PCI bus, with optional hardware acceleration for SRC, mixing, 3D positional audio, and wavetable
synthesis. The controller connects over the digital AC-link to AC ’97 audio and modem codecs,
which provide the analog I/O: LINE IN, MIC IN, LINE OUT, AUX OUT, and optional LNLVL_OUT,
HP_OUT, 4CH_OUT, SPDIF OUT, PHONE, and MONO OUT (speakerphone) connections. An
optional OEM riser slot and card may carry additional codecs.]

The “AC ’97 System Diagram” above shows the essential features of a typical AC ’97 system design.
The AC ’97 Codec performs DAC and ADC conversions, mixing, and analog I/O for audio (or
modem), and always functions as slave to an AC ’97 Digital Controller, which is typically either a
discrete PCI accelerator or a controller that comes integrated within core logic chipsets.
The digital link that connects the AC ’97 Digital Controller to the AC ’97 Codec, referred to as
AC-link, is a bi-directional, 5-wire, serial TDM-format interface running at 48 kHz. AC-link supports
connections between a single Controller and up to 4 Codecs on a circuit board or riser card.
The system diagram illustrates many of the common PC audio connections, both digital and analog.
PC audio today is rapidly moving towards a Digital Ready architecture that requires all audio
sources to be available in digital form, but a number of legacy analog sources still require the
support of an analog mixing stage.
The AC ’97 architecture supports a variety of audio output options, including:
• Analog stereo output (LINE_OUT) transmitted to amplified stereo PC speaker array via stereo
mini-jack.
• Amplified analog stereo headphone output (HP_OUT) transmitted to headphones or headset via
stereo minijack.
• Discrete analog 4-channel output (LINE_OUT plus 4CH_OUT) transmitted to Front and
Surround amplified stereo PC speaker arrays via dual stereo mini-jacks.
• Analog matrix-encoded Surround output (such as Dolby ProLogic) transmitted via stereo line
level output jack (LNLVL_OUT) to consumer A/V equipment that drives a home theater multi-
speaker array.
• Digital 5.1 channel output (such as Dolby Digital AC-3) transmitted via S/PDIF (SPDIF_OUT)
to digital ready consumer A/V equipment which drives a home theater multi-speaker array.


Chipsets:
Block Diagram with PCI Express

• Chipset components
- Memory Controller Hub (MCH) or Graphics Memory Controller Hub (GMCH)
- I/O Controller Hub (ICH)
• Used in latest Intel chipsets

[Block diagram: the processor with L1/L2 cache connects to the MCH or GMCH host bridge, which
attaches memory and a PCI Express x16 graphics slot. The Direct Media Interface links the
MCH/GMCH to the I/O Controller Hub (ICH), which provides PCI Express x1 slots and links
(ExpressCard, Mini Card, mobile docking, integrated systemboard devices), PCI 2.0 slots, four
SATA disks, USB 2.0 devices, AC '97 codecs or High Definition Audio, and the Low Pin Count
interface to the Super I/O and Firmware Hub.]
© 2008 Lenovo

Block Diagram with PCI Express


With the introduction of PCI Express in 2004, PCI Express links are in several places within a
system. PCI Express x16 can be used for the graphics via a PCI Express x16 slot. PCI Express x1
slots can be off the ICH. PCI Express x1 links are for the ExpressCard and Mini Card
implementations in notebook systems. PCI Express can be used to interface with external docking
stations for notebooks. Finally, PCI Express can be used for integrated devices on the systemboard.

Memory Controller Hub (MCH) or Graphics Memory Controller Hub (GMCH)


The host bridge is either a Memory Controller Hub (MCH), which is the memory controller, or the
GMCH, which includes the graphics controller and memory controller in a single chip. With a
GMCH, the graphics controller (along with the DAC) is integrated into the same physical chip as
the memory controller, and the graphics memory is taken from main memory, so no separate
graphics memory is necessary.


I/O Controller Hub (ICH)


The I/O Controller Hub (ICH) is a chip that consolidates many input/output functions needed by the
system. Many controllers for different subsystems are contained within the ICH such as the PCI
Express, PCI, Serial ATA, USB, AC ’97 or Intel High Definition Audio, and part of an Ethernet
controller.
See the next slide for more details of the ICH.
There are several generations of the ICH with each generation followed by a number such as ICH2,
ICH3, etc.

Direct Media Interface (DMI)


Direct Media Interface (DMI) is an Intel proprietary chip-to-chip connection between the Intel 9xx
MCH/GMCH and the ICH6/ICH7/ICH8 family. This high speed interface integrates advanced
priority-based servicing allowing for concurrent traffic and true isochronous transfer capabilities.
Base functionality is completely software-transparent permitting current and legacy software to
operate normally.
DMI closely adheres to the PCI Express specification, but is slightly different. It is based on a PCI
Express x4 link. DMI supports 2 GB/s (1 GB/s unencoded or 10 Gb/s encoded in each direction) of
bandwidth, using a 100 MHz differential clock. The 100 MHz reference clock is shared with PCI
Express x16 graphics adapter slot. DMI uses dual unidirectional lanes and has 32-bit downstream
addressing.
DMI provides improved Quality of Service (QoS) over the Hub Interface used with earlier ICH
implementations. In order to provide for true isochronous transfers and configurable QoS
transactions, the latest Intel ICHs support two virtual channels on DMI: VC0 and VC1. These two
channels provide a fixed arbitration scheme where VC1 is always the highest priority. VC0 is the
default conduit of traffic for DMI and is always enabled. VC1 must be specifically enabled and
configured at both ends of the DMI link.
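The DMI bandwidth quoted above can be reproduced from the link parameters: four lanes at 2.5 GT/s each, with 8b/10b encoding leaving 8 usable data bits per 10 bits transferred. A small sketch of that arithmetic:

```python
def link_bandwidth_gb_s(lanes, gt_per_s, data_bits=8, encoded_bits=10):
    """Usable per-direction bandwidth (GB/s) of an 8b/10b-encoded serial link."""
    return lanes * gt_per_s * (data_bits / encoded_bits) / 8  # bits -> bytes

# DMI: a PCI Express x4-style link at 2.5 GT/s per lane
per_direction = link_bandwidth_gb_s(4, 2.5)   # 1.0 GB/s unencoded
total = 2 * per_direction                     # 2.0 GB/s across both directions
```

The raw encoded rate per direction is 4 x 2.5 GT/s = 10 Gb/s, which matches the "10 Gb/s encoded" figure in the text.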

[Diagram: Direct Media Interface (DMI) – a point-to-point x4 port using differential low-voltage
signaling, with separate unidirectional TX and RX lane pairs and full software transparency.]

Low Pin Count Bus Interface (LPC)


To enable a system without an ISA bus, the Low Pin Count (LPC) bus interface is used for super
I/O functions (diskette, serial, parallel, infrared, keyboard, mouse), audio, BIOS, and system
management.
The LPC bus interface specification 1.0 describes the LPC bus interface, which allows the legacy
I/O systemboard components, typically integrated in a Super I/O chip (a controller for diskette,
keyboard, mouse, serial, parallel, infrared) to migrate from the ISA or X-bus to the LPC bus
interface while retaining full software compatibility. The bus is a 33 MHz, PCI-based, 4-bit-wide
bus over which data is transferred serially, and it has a bandwidth of 4 MB/s with a
multidrop bus type. Typically the following class of devices are connected to the LPC bus
interface:
• Super I/O (diskette, serial, parallel, infrared, keyboard, mouse)
• Audio
• BIOS
• Systems management controller
• Security chip

[Photos: Systemboard of a ThinkCentre A51p with Pentium 4 and 915G Express Chipset, shown
with heatsinks on and with heatsinks off.]

Serial Peripheral Interface (SPI)


The ICH7 family and later I/O Controller Hubs use the serial peripheral interface (SPI). SPI is an
alternative for the BIOS flash device. An SPI flash device can be used as a replacement for the
Firmware Hub.

[Diagram: the Intel ICHx connects over SPI to an SPI BIOS flash device, and over the LPC
interface to the Super I/O, other optional ASICs, an optional TPM, and the Firmware Hub.]

Chipsets:
I/O Controller Hub

• Chip with Intel notebook and desktop chipsets


• Consolidates many input/output (I/O) functions
- PCI and PCI Express
- Serial ATA
- Ethernet controller (MAC layer; requires PHY layer)
• Specific versions for each platform (desktop or mobile)

[Block diagram: the I/O Controller Hub (ICH) provides PCI Express x1 slots and links
(ExpressCard, Mini Card, integrated systemboard devices), PCI 2.0 slots, four SATA disks, USB
2.0 devices, AC '97 codecs or High Definition Audio, and the Low Pin Count interface to the
Super I/O.]

I/O Controller Hub


The I/O Controller Hub is a chip that consolidates many I/O functions needed by a system. The
ICH is included with the chipset of the system. It communicates to the (Graphics) Memory
Controller Hub and all its attached peripherals.


Chipsets:
ICH LAN Connect Interface (LCI)

• Intel I/O Controller Hubs have an integrated Ethernet controller
- Media Access Control (MAC) layer
• Requires an additional physical layer (PHY) component
• Physical (PHY) layer is a chip on the systemboard or a PCI (or proprietary) adapter

[Photo: ICH7 chip. ICH2 through ICH7 integrate a 10/100 Ethernet MAC; ICH8 through ICH10
integrate a Gigabit Ethernet MAC.]

ICH LAN Connect Interface (LCI)


Intel I/O Controller Hubs have a portion of an integrated Ethernet controller in the chip, known as
the LAN Connect Interface (LCI). This "half" of the controller, the Media Access Control (MAC)
layer, allows different solutions to be implemented in the other "half," the physical layer component.
The physical layer component could be one of the following:
• Managed 10/100 or Gigabit Ethernet with Alert on LAN
• Basic 10/100 or Gigabit Ethernet
The physical (PHY) layer can be placed directly on the systemboard as a chip or provided on a
PCI (or proprietary) adapter.
The ICH8, ICH9, and ICH10 implement a Gigabit Ethernet MAC layer, so this is called a Gigabit
LAN Connect Interface.


Chipsets:
Intel 3 Series Express Chipset Family (Desktop)

• G31, G33, G35, Q33, Q35, P31, P35, X38


• Core 2 Duo and Core 2 Quad support for desktops
• (Graphics) Memory Controller Hub with select features
- 1333 MHz system bus
- DDR3 memory
- GMA 3100 or Enhanced GMA X3500
- PCI Express x16 adapter
• I/O Controller Hub 9 (ICH9)
• June 2007

[Block diagram: G33 Express Chipset – the processor connects to the (G)MCH (memory
controller), which attaches memory and a PCI Express x16 slot; the Direct Media Interface links
the (G)MCH to the I/O Controller Hub (ICH), which contains PCI Express, PCI, SATA (six disks),
and USB 2.0 controllers plus High Definition Audio, with the Serial Peripheral Interface and the
Low Pin Count interface to the Super I/O.]

Intel 3 Series Express Chipset


In June and August 2007, Intel announced the G31, G33, G35, Q33, Q35, P31, P35, and X38
Express Chipsets (code-named Bearlake) for desktop systems. These chipsets were designed for
processors using the LGA775 socket, such as the Core 2 Duo and Core 2 Quad processors. These
chipsets work with the various I/O Controller Hub family chips (mostly the ICH9 family).
Intel is using the following naming convention:
Q = stability, manageability, graphics, optimized for business
G = integrated graphics, video/3D, optimized for consumer
P = discrete graphics for performance support
X = extreme gaming

Intel P35 Express Chipset with I/O Controller Hub 9


All the chipsets support Intel Fast Memory Access which is a backbone architecture that improves
system performance by optimizing the use of available memory bandwidth and reducing the
latency of the memory accesses. Intel Fast Memory Access consists of these features:
– Just in Time Command Scheduling – Maximizes bandwidth by monitoring all pending
accesses to memory, allowing for the safe and efficient overlapping of commands on the
system memory bus.
– Out of Order Scheduling – Reorders pending memory commands so they can be issued in the
most efficient sequence, reducing overall memory latency.
– Opportunistic Writes – Monitors system memory requests and issues pending write requests to
memory during idle times, allowing for a more efficient flow of data.
– Clock Crossing Optimizations – Ensures that data is transferred in a highly optimized manner,
enabling data transfer on the first usable clock phase encountered between the two frequency
domains.

[Photos: Intel Q35 Express Chipset on a Lenovo ThinkCentre systemboard (in red box), shown
with and without its heat sink, plus a close-up view without the heat sink.]


Chipsets:
Intel G31, G33, G35 Express Chipset Versions (Desktop)
G31 (Mainstream PC): Core 2 Duo, Core 2 Quad; DDR2-667, DDR2-800; 800, 1067 MHz bus;
1 DIMM/2 channels; 4 GB max memory; GMA 3100; no Clear Video Technology; PCI Express
1.1 x16 slot; Dual Independent Display with ADD2 or MEC; ICH7, ICH7R, ICH7DH; no iAMT
support; June 2007.

G33 (Mainstream/performance PC): Core 2 Duo, Core 2 Quad; DDR2-667, DDR2-800,
DDR3-800, DDR3-1066; 800, 1067, 1333 MHz bus; 2 DIMM/2 channels; 8 GB max memory;
GMA 3100; Clear Video Technology; PCI Express 1.1 x16 slot; Dual Independent Display with
ADD2 or MEC; ICH9, ICH9R, ICH9DH; no iAMT support; June 2007.

G35 (Performance PC): Core 2 Duo, Core 2 Quad; DDR2-667, DDR2-800; 800, 1067, 1333 MHz
bus; 2 DIMM/2 channels; 8 GB max memory; GMA X3500 (DirectX 10-compatible); Clear Video
Technology; PCI Express 1.1 x16 slot; Dual Independent Display with ADD2 or MEC; ICH8,
ICH8R, ICH8DH; no iAMT support; August 2007.

Intel G31, G33, G35 Express Chipset Versions


This slide lists the key differences between the chipsets.

Intel G33 Express Chipset


[Block diagram: Intel G33 Express Chipset – the processor connects over an 800/1067/1333 MHz
system bus to the G33 GMCH, which attaches dual-channel DDR2/DDR3 system memory
(Channels A and B), an analog VGA display, and a PCI Express x16 graphics card or SDVO
display via MEC. The DMI interface links the GMCH to the Intel ICH9, which provides USB 2.0
(12 ports, dual EHCI controllers), six SATA ports, Intel High Definition Audio codecs, PCI
Express x1 slots, SPI flash, the Gigabit LAN Connect Interface (GLCI) and LAN Connect
Interface (LCI) to an Intel Gigabit Ethernet PHY, the PCI bus, SMBus 2.0/I2C, GPIO, power
management, clock generation, system management (TCO), and the LPC interface to the Super
I/O, other optional ASICs, an optional Trusted Platform Module, and the Firmware Hub.]

Intel G33 Express Chipset Block Diagram


Chipsets:
Intel Q33, Q35 Express Chipset Versions (Desktop)
Q33 (Corporate Stable – Fundamental): Core 2 Duo, Core 2 Quad; DDR2-667, DDR2-800; 800,
1067, 1333 MHz bus; 2 DIMM/2 channels; 8 GB max memory; GMA 3100; no Clear Video
Technology; PCI Express 1.1 x16 slot; Dual Independent Display with ADD2 or MEC; ICH9,
ICH9R; no iAMT support; August 2007.

Q35 (Corporate Stable – Professional): Core 2 Duo, Core 2 Quad; DDR2-667, DDR2-800; 800,
1067, 1333 MHz bus; 2 DIMM/2 channels; 8 GB max memory; GMA 3100; no Clear Video
Technology; PCI Express 1.1 x16 slot; Dual Independent Display with ADD2 or MEC; ICH9,
ICH9R, ICH9DO; iAMT support; August 2007.

Intel Q33 and Q35 Express Chipset Versions


This slide lists the key differences between the chipsets.
Note: The Q35 can support Dual Independent Display without an ADD2 or MEC on the Lenovo
ThinkCentre M57 and M57p (BlueLeaf Ultra Small Form Factor type 6395).

Intel Q35 Express Chipset is used in select Lenovo ThinkCentre desktops


Chipsets:
Intel P31, P35, X38 Express Chipset Versions (Desktop)

P31 (Mainstream PC): Core 2 Duo, Core 2 Quad; DDR2-667, DDR2-800; 800, 1067 MHz bus;
1 DIMM/2 channels; 4 GB max memory; no GMA; no Clear Video Technology; PCI Express 1.1
x16 slot; N/A (ADD2 or MEC); ICH7, ICH7R, ICH7DO; no iAMT support; August 2007.

P35 (Performance PC): Core 2 Duo, Core 2 Quad, Core 2 Extreme; DDR2-667, DDR2-800,
DDR3-800, DDR3-1066; 800, 1067, 1333 MHz bus; 2 DIMM/2 channels; 8 GB max memory; no
GMA; no Clear Video Technology; PCI Express 1.1 x16 slot; N/A (ADD2 or MEC); ICH9,
ICH9R, ICH9DH; no iAMT support; June 2007.

X38 (Gamer PC): Core 2 Duo, Core 2 Quad, Core 2 Extreme; DDR2-667, DDR2-800, DDR3-800,
DDR3-1066; 800, 1067, 1333 MHz bus; 2 DIMM/2 channels; 8 GB max memory; no GMA; no
Clear Video Technology; 2 x PCI Express 2.0 x16 slots; N/A (ADD2 or MEC); ICH9, ICH9R,
ICH9DH; no iAMT support; August 2007.

Intel P31, P35, X38 Express Chipset Versions


This slide lists the key differences between the chipsets.


[Block diagram: Intel P35 Express Chipset – the processor connects over an 800/1067/1333 MHz
system bus to the P35 MCH, which attaches dual-channel DDR2/DDR3 system memory and a PCI
Express x16 graphics card. The DMI interface links the MCH to the Intel ICH9, which provides
USB 2.0 (12 ports, dual EHCI controllers), six SATA ports, Intel High Definition Audio codecs,
PCI Express x1 slots, SPI flash, the GLCI/LCI to an Intel Gigabit Ethernet PHY, the PCI bus,
SMBus 2.0/I2C, GPIO, power management, clock generation, system management (TCO), and the
LPC interface to the Super I/O, other optional ASICs, an optional Trusted Platform Module, and
the Firmware Hub.]

Intel P35 Express Chipset Block Diagram


Chipsets:
I/O Controller Hub 9 (ICH9) for 3 Series chipsets (Desktops)

• I/O controller chip with Intel 3 Series chipsets for desktops
• Changes from ICH8
- USB port disable
- Intel Turbo Memory support (ICH9R, ICH9DO)
- Intel Active Management Technology 3.0 support (ICH9DO)

[Photo: I/O Controller Hub 9 (ICH9)]

ICH9 (ICH9 Base; mainstream/performance): supported with G33, Q33, Q35, P35, X38; no iAMT;
no integrated RAID, 4 Serial ATA ports; no Intel Turbo Memory; no Viiv support; June 2007.

ICH9R (ICH9 RAID; mainstream/performance + Viiv): supported with G33, Q33, Q35, P35, X38;
no iAMT; RAID 0, 1, 5, 10 controller for up to 6 Serial ATA disks; Intel Turbo Memory; Viiv
support; June 2007.

ICH9DH (ICH9 Digital Home; Digital Home + Viiv): supported with G33, P35, X38; no iAMT;
no integrated RAID, 6 Serial ATA disks; no Intel Turbo Memory; Viiv support; June 2007.

ICH9DO (ICH9 Digital Office; corporate stable + vPro): supported with Q35, X38; iAMT 3.0
support; RAID 0, 1, 5, 10 controller for up to 6 Serial ATA disks; Intel Turbo Memory; no Viiv
support; August 2007.

I/O Controller Hub 9 (ICH9)


The I/O Controller Hub 9 (ICH9) is the ninth-generation chip that consolidates many I/O
functions needed by a system. The ICH9 family is supported on select versions of the Intel 3
Series family (G33, Q33, Q35, P35, X38) and other compatible chipsets.

New ICH9 Features (versus ICH8)

• 12 USB 2.0 ports (two additional)
• USB port disable
• Intel Turbo Memory (ICH9R, ICH9DO)
• Intel Active Management Technology 3.0 (ICH9DO)
• Alert Standard Format 2.0 (from 1.0)
• Intel Rapid Recovery Technology (ICH9R, ICH9DO)
• Command Based Port Multiplier (ICH9R, ICH9DO)

[Block diagram: the ICH9 connected to a 3 Series (x3x) chipset over the Direct Media Interface
(DMI).]

ICH9 Block Diagram


The I/O Controller Hub 9 (ICH9) was announced June 2007 with four distinct chips:
• Intel 82801IB ICH9
• Intel 82801IR ICH9R
• Intel 82801IH ICH9DH
• Intel 82801IO ICH9DO

All four chips include the following base features:


• Up to six PCI Express x1 ports (OEM can configure ports 1 to 6 with one to six x1 slots [or
one x4 slot]).
• Four or six Serial ATA controllers to support six 300 MB/s Serial ATA ports. External Serial
ATA (eSATA) support.
• Six USB 2.0 controllers, each supporting two ports and with a maximum of twelve total ports.
The ICH9 contains two Enhanced Host Controller Interface (EHCI) host controllers that
support USB high-speed signaling. The ICH9 also contains six Universal Host Controller
Interface (UHCI) controllers that support USB full-speed and low-speed signaling.
• Integrated Gigabit Ethernet controller (LAN Connect Interface [LCI] and new Gigabit LAN
Connect Interface [GLCI]) MAC layer; it requires an Intel family PHY layer or compatible
chip for complete Ethernet functionality.
• PCI 2.3-compliant controller with support for 32-bit, 33 MHz PCI operations (up to six PCI
devices or slots).
• Interface to the memory controller (Intel chipset) via the Direct Media Interface (DMI) link.
• An Alert Standard Format (ASF) 2.0 systems management controller for network
manageability.
• Intel High Definition Audio (Intel HD Audio) interface with support for three external codecs.
• Intel Quiet System Technology (four TACH signals and three PWM signals)
• Simple Serial Transport (SST) Bus and Platform Environmental Control Interface (PECI).
SST provides a single-wire 1 MB/s serial bus for easier board routing and the flexibility to
place sensors where needed.
• Supports up to two Serial Peripheral Interface (SPI) devices as an alternative for the BIOS
flash device; an SPI flash device can be used as a replacement for the Firmware Hub.
• Low Pin Count (LPC) interface for support of a Super I/O chip (diskette, serial, parallel,
keyboard, mouse functions), optional ASICs, and flash BIOS.
• Systems Management Bus (SMBus) 2.0 with additional support for I2C devices.


[Block diagram: the processor connects to the (G)MCH with graphics and memory; the DMI
interface links to the Intel ICH9, which provides USB 2.0 (12 ports, dual EHCI controllers), six
SATA ports, Intel High Definition Audio codecs, PCI Express x1 slots, SPI flash, the GLCI/LCI to
an Intel Gigabit Ethernet PHY, the PCI bus, SMBus 2.0/I2C, GPIO, power management, clock
generation, system management (TCO), and the LPC interface to the Super I/O, other optional
ASICs, an optional Trusted Platform Module, and the Firmware Hub.]

[Photos: ICH9 on a Lenovo ThinkCentre desktop systemboard (in red box), with a close-up view.]


ICH9R
The Intel 82801IR I/O Controller Hub 9 (ICH9R) [“R” stands for RAID] supports all the features
of the ICH9 plus the following features:
• Supported with the G33, Q33, Q35, P35, and X38 chipsets
• Six Serial ATA controllers to support six 300 MB/s Serial ATA ports. External Serial ATA
(eSATA) support.
• The ICH9R provides hardware support for Advanced Host Controller Interface (AHCI) with
the Serial ATA host controller; AHCI is not supported with the non-RAID ICH9.
• The ICH9R provides support for Intel Matrix Storage Technology, providing both AHCI and
integrated RAID functionality. The industry-leading RAID capability provides high-
performance RAID 0, 1, 5, and 10 functionality on up to six SATA ports. Matrix RAID
support is provided to allow multiple RAID levels to be combined on a single set of hard
drives, such as RAID 0 and RAID 1 on two disks. Other RAID features include hot spare
support, SMART alerting, and RAID 0 auto replace. Software components include an Option
ROM for pre-boot configuration and boot functionality, a Microsoft Windows compatible
driver, and a user interface for configuration and management of the RAID capability of
ICH9R.
• Support for Intel Rapid Recovery Technology and Command Based Port Multiplier
• Support for Intel Viiv Technology

Two-Disk SATA RAID Array Supported with ICH9R

[Diagram: a RAID 0 volume stripes sectors S0–SB alternately across Disk 0 and Disk 1, while a
RAID 1 volume mirrors sectors S0–S5 identically on both disks.]


[Diagram: RAID-5 – sectors S1 through S20 of one file striped across five disks, with parity
blocks P1-4 through P17-20 distributed so that each stripe's parity rotates to a different disk. S1
through S20 represent sectors of data from one file; P1 through P20 represent the parity of sectors
S1 through S20.]

[Diagram: RAID-10 physical view (striped RAID-1) – data striped across multiple mirrored
(RAID-1) disk pairs.]

RAID level performance and capacity (n = number of equally sized disks in the array):

RAID Level    Data capacity  Large read  Large write  Small read  Small write  Data availability
single disk   n              good        good         good        good         fair
RAID 0        n              very good   very good    very good   very good    poor
RAID 1        n/2            very good   good         very good   very good    very good
RAID 5        n-1            good        fair         good        fair         very good

RAID  Also known as                           Fault tolerance  Redundancy type  Hot spare option  Disks required
0     Striping                                No               None             No                1 or more
1     Mirroring                               Yes              Duplicate        Yes               2
5     Striping with distributed parity        Yes              Parity           Yes               3 or more
10    Striping across multiple RAID-1 arrays  Yes              Duplicate        Yes               4 or more


ICH9DH
The Intel 82801IH I/O Controller Hub 9 (ICH9DH) [“DH” stands for Digital Home] supports all
the features of the ICH9 and ICH9R plus the following features:
• Supported with the consumer-oriented G33, P35, and X38 chipsets
• Intel Quick Resume Technology
• Supports Intel Matrix Storage Technology (including Advanced Host Controller Interface)
• Support for Intel Viiv Technology

ICH9DO
The Intel 82801IO I/O Controller Hub 9 (ICH9DO) [“DO” stands for Digital Office] supports all
the features of the ICH9 and ICH9R plus the following features:
• Supported with the corporate-oriented Q35 and X38 chipsets
• Support for the Intel Active Management Technology (IAMT) 3.0 for client manageability
when used with the supported Intel Ethernet controller.
• Supports Intel Matrix Storage Technology (including Advanced Host Controller Interface)
with RAID 0/1/5/10
• Support for Intel Rapid Recovery Technology and Command Based Port Multiplier
• Support for Intel vPro Technology

Product Name         Short Name  Number of SATA Ports  Intel Matrix Storage Technology
ICH9 Base            ICH9        4                     No
ICH9 RAID            ICH9R       6                     Yes
ICH9 Digital Home    ICH9DH      6                     Yes
ICH9 Digital Office  ICH9DO      6                     Yes


Chipsets:
Intel 4 Series Express Chipset Family (Desktop)

• G41, G43, G45, Q43, Q45, P43, P45, X48


• Core 2 Duo and Core 2 Quad support for desktops
• (Graphics) Memory Controller Hub with select features
- Up to 1333 MHz system bus
- DDR2 or DDR3 memory
- GMA 4500, GMA X4500, or GMA X4500HD
- PCI Express 2.0 x16 adapter support
• I/O Controller Hub 10 (ICH10)
• June 2008

[Block diagram: G45 Express Chipset – the processor connects to the (G)MCH (memory
controller), which attaches memory and PCI Express 2.0 x16 slot(s); the Direct Media Interface
links the (G)MCH to the I/O Controller Hub (ICH), which contains PCI Express, PCI, SATA (six
disks), and USB 2.0 controllers plus High Definition Audio, with the Serial Peripheral Interface
and the Low Pin Count interface to the Super I/O.]

Intel 4 Series Express Chipset


In June and August 2008, Intel announced the G41, G43, G45, Q43, Q45, P43, P45, and X48
Express Chipsets (code-named Eaglelake) for desktop systems. These chipsets were designed for
processors using the LGA775 socket, such as the Core 2 Duo, Core 2 Quad, and Core 2 Extreme
processors. These chipsets work with the various I/O Controller Hub family chips (mostly the
ICH10 family).
Intel is using the following naming convention:
Q = stability, manageability, graphics, optimized for business
G = integrated graphics, video/3D, optimized for consumer
P = discrete graphics for performance support
X = extreme gaming

Intel P43 Express Chipset with I/O Controller Hub 10


Like the 3 Series chipsets, all the 4 Series chipsets support Intel Fast Memory Access, the
backbone architecture described earlier that improves system performance by optimizing the use
of available memory bandwidth and reducing the latency of memory accesses.
All the chipsets support Intel Flex Memory Technology which facilitates easier upgrades by
allowing different memory sizes to be populated and remain in dual-channel mode.


Chipsets:
Intel G41, G43, G45 Express Chipset Versions (Desktop)
G41 G43 G45
Essential PC Mainstream PC Mainstream/performance PC
Core 2 Duo/Quad Core 2 Duo/Quad Core 2 Duo/Quad
DDR2 or DDR3 DDR2 or DDR3 DDR2 or DDR3
800, 1066, 1333 MHz bus 800, 1066, 1333 MHz bus 800, 1066, 1333 MHz bus
1 DIMM/2 channels 1 DIMM/2 channels 2 DIMM/2 channels
8 GB max memory (DDR2) 8 GB max memory (DDR2) 16 GB max memory (DDR2)
4 GB max memory (DDR3) 4 GB max memory (DDR3) 8 GB max memory (DDR3)
GMA X4500 GMA X4500 GMA X4500HD
DirectX 10-compatible DirectX 10-compatible DirectX 10-compatible
Clear Video Technology Clear Video Technology Clear Video Technology
PCI Express 1.1 x16 slot PCI Express 2.0 x16 slot PCI Express 2.0 x16 slot
Dual Independent Display Dual Independent Display Dual Independent Display
with ADD2 or MEC with ADD2 or MEC with ADD2 or MEC
VGA, sDVO, DVI, VGA, sDVO, DVI, VGA, sDVO, DVI,
DisplayPort DisplayPort, HDMI DisplayPort, HDMI
No full hardware decode No full hardware decode Full hardware decode of MPEG2, VC1, AVC
ICH7 ICH10, ICH10R ICH10, ICH10R
No iAMT support No iAMT support No iAMT support
August 2008 June 2008 June 2008

© 2008 Lenovo

Intel G41, G43, G45 Express Chipset Versions


This slide lists the key differences between the chipsets.
• G43 – Smooth Blu-ray playback for high-definition viewing
• G45 – Ultimate Blu-ray playback experience with full bit-rate support

Intel G43 Express Chipset with ICH10


[Block diagram: the processor connects over the 800/1066/1333 MHz system bus to the GMCH, which
drives two DDR2/DDR3 memory channels, a PCI Express x16 graphics slot, ADD2 or MEC, and
DP/HDMI/DVI display outputs; a DMI link connects the GMCH to the ICH10, which provides USB 2.0
(12 ports), six Serial ATA ports, Intel High Definition Audio, PCI Express x1 and PCI slots, SPI
flash BIOS, Gigabit LAN Connect (LCI/GLCI), SMBus 2.0/I2C, power management, fan speed control
(SST and PECI), and the LPC interface to the Super I/O and Trusted Platform Module.]

Intel G43, G45 Express Chipset Block Diagram


Chipsets:
Intel Q43, Q45 Express Chipset Versions (Desktop)
Q43 Q45
Corporate Stable Corporate Stable
(Fundamental) (Professional)
Core 2 Duo/Quad Core 2 Duo/Quad
DDR2 or DDR3 DDR2 or DDR3
800, 1066, 1333 MHz bus 800, 1066, 1333 MHz bus
2 DIMM/2 channels 2 DIMM/2 channels
16 GB max memory (DDR2) 16 GB max memory (DDR2)
8 GB max memory (DDR3) 8 GB max memory (DDR3)
GMA 4500 GMA 4500
DirectX 10-compatible DirectX 10-compatible
No Clear Video Technology No Clear Video Technology
PCI Express 2.0 x16 slot PCI Express 2.0 x16 slot
Dual Independent Display Dual Independent Display
with ADD2 or MEC with ADD2 or MEC
VGA, sDVO, DVI, VGA, sDVO, DVI,
DisplayPort DisplayPort
Full hardware decode of Full hardware decode of
MPEG2, VC1, AVC MPEG2, VC1, AVC
ICH10D ICH10DO
iAMT 3.5 support iAMT 5 support
August 2008 August 2008


Intel Q43 and Q45 Express Chipset Versions


This slide lists the key differences between the chipsets.

Intel Q45 Express Chipset is used in select Lenovo ThinkCentre desktops.


Chipsets:
Intel P43, P45, X48 Express Chipset Versions (Desktop)

P43 P45 X48


Mainstream PC Performance PC Gamer PC
Core 2 Duo/Quad Core 2 Duo/Quad Core 2 Duo/Quad/Extreme
DDR2 or DDR3 DDR2 or DDR3 DDR2 or DDR3
800, 1066, 1333 MHz bus 800, 1066, 1333 MHz bus 800, 1066, 1333, 1600 MHz
2 DIMM/2 channels           2 DIMM/2 channels               2 DIMM/2 channels
16 GB max memory (DDR2)     16 GB max memory (DDR2)         8 GB max memory
8 GB max memory (DDR3)      8 GB max memory (DDR3)
No GMA                      No GMA                          No GMA
PCI Express 2.0 x16 slot    1 PCI Express 2.0 x16 slot or   2 PCI Express 2.0 x16 slots
                            2 PCI Express 2.0 x8 slots
Dual Independent Display Dual Independent Display Dual Independent Display
with ADD2 or MEC with ADD2 or MEC with ADD2 or MEC
No monitor ports No monitor ports No monitor ports
ICH10, ICH10R ICH10, ICH10R ICH9, ICH9R, ICH9DH, ICH9DO
No iAMT support No iAMT support No iAMT support
June 2008 June 2008 June 2008


Intel P43, P45, X48 Express Chipset Versions


This slide lists the key differences between the chipsets.
• P43 – Mainstream flexibility with one PCI Express 2.0 x16 slot
• P45 – Tunable performance with one PCI Express 2.0 x16 slot or two PCI Express 2.0 x8
slots
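The x16-versus-dual-x8 trade-off can be checked with per-lane arithmetic: a PCI Express 2.0 lane signals at 5.0 GT/s with 8b/10b encoding, giving 500 MB/s per lane in each direction (250 MB/s for generation 1). The helper below is a hypothetical illustration, not an Intel API.

```python
def pcie_bandwidth_mb_s(lanes, generation):
    """Per-direction PCI Express bandwidth: 2.5 GT/s (gen 1) or 5.0 GT/s
    (gen 2) per lane; 8b/10b encoding carries 8 data bits per 10 line bits."""
    gt_per_s = {1: 2.5e9, 2: 5.0e9}[generation]
    bytes_per_s = gt_per_s * 8 / 10 / 8      # strip encoding, bits -> bytes
    return lanes * bytes_per_s / 1e6

# One x16 slot and two x8 slots offer the same aggregate bandwidth:
assert pcie_bandwidth_mb_s(16, 2) == 8000.0
assert pcie_bandwidth_mb_s(16, 2) == 2 * pcie_bandwidth_mb_s(8, 2)
```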

Intel P45 Express Chipset with ICH10


[Block diagram: the processor connects over the 800/1066/1333 MHz system bus to the P43/P45 MCH,
which drives two DDR2/DDR3 memory channels and a PCI Express x16 graphics slot; a DMI link
connects the MCH to the ICH10, which provides USB 2.0 (12 ports), six Serial ATA ports, Intel
High Definition Audio, PCI Express x1 and PCI slots, SPI flash BIOS, Gigabit LAN Connect
(LCI/GLCI), SMBus 2.0/I2C, power management, fan speed control (SST and PECI), and the LPC
interface to the Super I/O and Trusted Platform Module.]

Intel P43, P45 Express Chipset Block Diagram


Chipsets:
I/O Controller Hub 10 (ICH10) for 4 Series chipset (desktop)

• I/O controller chip with Intel 4 Series desktop chipsets


• Changes from ICH9
- (Some) Integrated security chip (TPM)
- (Some) AHCI (for hot-swap SATA disks)
- (Some) Intel Active Management Technology 3.5 and 5.0

I/O Controller Hub 10 (ICH10)

ICH10 ICH10R ICH10D ICH10DO


Mainstream/performance Mainstream/performance Corporate stable Corporate stable + vPro
ICH10 Base ICH10 RAID ICH10 Digital ICH10 Digital Office
G4x, P4x, Q4x chipset G4x, P4x, Q4x chipset G4x, P4x, Q4x chipset Q4x chipset
None None IAMT 3.5 support IAMT 5.0 support
No RAID; AHCI RAID 0, 1, 5, 10; AHCI No RAID; AHCI RAID 0, 1, 5, 10; AHCI
None None Integrated TPM Integrated TPM
None Intel Turbo Memory None None
Viiv support Viiv support None vPro support
June 2008 June 2008 August 2008 August 2008

© 2008 Lenovo

I/O Controller Hub 10 (ICH10)


The I/O Controller Hub 10 (ICH10) is the tenth-generation chip that consolidates many I/O
functions needed by a system. The ICH10 family is supported on select versions of the Intel
4 Series family (G41, G43, G45, Q43, Q45, P43, P45) and other compatible chipsets.

[Block diagram: the x4x chipset connects to the ICH10 over the Direct Media Interface (DMI).]

ICH10 Block Diagram


The I/O Controller Hub 10 (ICH10) was announced June and August 2008 with four distinct chips:
• Intel 82801JIB ICH10 Base
• Intel 82801JIR ICH10R Raid
• Intel 82801JID ICH10D Digital
• Intel 82801JIO ICH10DO Digital Office
The Intel 82801JIB I/O Controller Hub 10 (ICH10) supports the following features:
• Supported with G41, G43, G45, P43, and P45 chipset.
• Up to six PCI Express x1 ports (OEM can configure ports 1 to 6 with one to six x1 slots [or
one x4 slot]).
• Six Serial ATA controllers to support six Serial ATA 3.0 Gb/s ports. External Serial ATA
(eSATA) support. It supports an integrated Advanced Host Controller Interface (AHCI)
controller.
• Six USB 2.0 controllers, each supporting two ports and with a maximum of twelve total ports.
The ICH10 contains two Enhanced Host Controller Interface (EHCI) host controllers that
support USB high-speed signaling. The ICH10 also contains six Universal Host Controller
Interface (UHCI) controllers that support USB full-speed and low-speed signaling.
• Integrated Gigabit Ethernet controller (LAN Connect Interface [LCI] and Gigabit LAN
Connect Interface [GLCI]) MAC layer; it requires an Intel family PHY layer or compatible
chip for complete Ethernet functionality.
• PCI 2.3-compliant controller with support for 32-bit, 33 MHz PCI operations (up to six PCI
devices or slots).
• Interface to the memory controller (Intel chipset) via the Direct Media Interface (DMI) link.
• An Alert Standard Format (ASF) 2.0 systems management controller for network
manageability.
• Intel High Definition Audio (Intel HD Audio) interface with support for three external codecs.
• Intel Quiet System Technology (four TACH signals and three PWM signals)
• Simple Serial Transport (SST) Bus and Platform Environmental Control Interface (PECI).
SST provides a single-wire 1 MB/s serial bus for easier board routing and the flexibility to
place sensors where needed.
• Supports up to two Serial Peripheral Interface (SPI) devices as an alternative for the BIOS
flash device; an SPI flash device can be used as a replacement for the Firmware Hub.
• Low Pin Count (LPC) interface for support of a Super I/O chip (diskette, serial, parallel,
keyboard, mouse functions), optional ASICs, and flash BIOS.
• Systems Management Bus (SMBus) 2.0 with additional support for I2C devices.
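The Intel Quiet System Technology bullet above mentions TACH (fan speed sense) inputs and PWM (fan speed control) outputs. The tach-to-RPM conversion can be sketched as below; it assumes the common two-pulses-per-revolution fan tach convention (check the fan datasheet), and the helper name is hypothetical.

```python
def fan_rpm(tach_pulses, window_seconds, pulses_per_rev=2):
    """Convert a TACH pulse count measured over a time window to RPM.
    Assumes two tach pulses per fan revolution, the common convention."""
    revolutions = tach_pulses / pulses_per_rev
    return revolutions * 60 / window_seconds

# 70 tach pulses counted in a one-second window -> 35 rev/s -> 2100 RPM
assert fan_rpm(70, 1.0) == 2100.0
```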


[Block diagram: the ICH10 connects to the (G)MCH over the DMI interface and provides USB 2.0
(12 ports, dual EHCI controllers), six SATA ports, Intel High Definition Audio, PCI Express x1
ports, the PCI bus, SMBus 2.0/I2C, SPI flash, Gigabit Ethernet (GLCI/LCI), system management
(TCO), power management, clock generation, and the LPC interface to the Super I/O, optional
ASICs, optional Trusted Platform Module, and Firmware Hub.]

ICH10 on Lenovo ThinkCentre Desktop (close-up view)

ICH10 on Lenovo ThinkCentre Desktop Systemboard (in red box)


ICH10R
The Intel 82801JIR I/O Controller Hub 10 (ICH10R) [“R” stands for RAID] supports all the features
of the ICH10 Base plus the following features:
• Supported with G41, G43, G45, P43, and P45 chipset.
• The ICH10R provides support for Intel Matrix Storage Technology, providing both AHCI and
integrated RAID functionality. The industry-leading RAID capability provides high-
performance RAID 0, 1, 5, and 10 functionality on up to six SATA ports. Matrix RAID
support is provided to allow multiple RAID levels to be combined on a single set of hard
drives, such as RAID 0 and RAID 1 on two disks. Other RAID features include hot spare
support, SMART alerting, and RAID 0 auto replace. Software components include an Option
ROM for pre-boot configuration and boot functionality, a Microsoft Windows compatible
driver, and a user interface for configuration and management of the RAID capability of
ICH10R.

Two-Disk SATA RAID Array Supported with ICH10R and ICH10DO

RAID 0 Volume (striping)             RAID 1 Volume (mirroring)
Disk 0:  S0 S2 S4 S6 S8 SA           Disk 0:  S0 S1 S2 S3 S4 S5
Disk 1:  S1 S3 S5 S7 S9 SB           Disk 1:  S0 S1 S2 S3 S4 S5
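The striped layout above maps each logical sector to a disk and a physical offset with simple modular arithmetic. The function below is an illustrative model of that mapping with a one-sector stripe, not the controller's actual firmware.

```python
def raid0_location(lba, disks=2, stripe_sectors=1):
    """Map a logical sector number to (disk, physical sector) under
    RAID 0 striping, matching the two-disk layout shown above."""
    stripe = lba // stripe_sectors
    disk = stripe % disks                      # stripes rotate across disks
    offset = (stripe // disks) * stripe_sectors + lba % stripe_sectors
    return disk, offset

# S1 is the first sector on disk 1; S2 is the second sector on disk 0
assert raid0_location(1) == (1, 0)
assert raid0_location(2) == (0, 1)
```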


Disk 1   Disk 2   Disk 3   Disk 4   Disk 5
P1-4     S1       S2       S3       S4
S5       P5-8     S6       S7       S8
S9       S10      P9-12    S11      S12
S13      S14      S15      P13-16   S16
S17      S18      S19      S20      P17-20

Physical disks for data and parity (S1 through S20 represent sectors of data from one
file. P1 through P20 represent the parity of sectors S1 through S20).

RAID-5
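The parity relationship described above (P1 through P20 hold the parity of sectors S1 through S20) is a bytewise XOR, which is also how a lost block is rebuilt. A minimal sketch, with hypothetical sample data standing in for the sectors:

```python
def parity(blocks):
    """RAID-5 parity is the bytewise XOR of the data blocks in a stripe."""
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

# With one block lost, XOR of the surviving data blocks plus the parity
# block reconstructs it:
s1, s2, s3, s4 = b"data", b"more", b"blks", b"here"
p = parity([s1, s2, s3, s4])
assert parity([p, s1, s3, s4]) == s2   # s2 rebuilt from the others
```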

[Diagram: data blocks 1 through 9 are striped across three mirrored disk pairs; each pair stores
a block and its mirror copy (1 and 1', 2 and 2', 3 and 3', and so on).]

RAID-10 Physical View (Striped RAID-1)


Striping across multiple mirrored (RAID-1) arrays

RAID Level    Data capacity   Large read   Large write   Small read   Small write   Data availability
single disk   n               good         good          good         good          fair
RAID 0        n               very good    very good     very good    very good     poor
RAID 1        n/2             very good    good          very good    very good     very good
RAID 5        n-1             good         fair          good         fair          very good

Note: In data capacity, n refers to the number of equally sized disks in the array.

RAID   Also known as                            Fault tolerance   Redundancy type   Hot spare option   Disks required
0      Striping                                 No                None              No                 1 or more
1      Mirroring                                Yes               Duplicate         Yes                2
5      Striping with distributed parity         Yes               Parity            Yes                3 or more
10     Striping across multiple RAID-1 arrays   Yes               Duplicate         Yes                4 or more
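The data capacity rule from the tables above (n for RAID 0, n/2 for mirroring, n-1 for RAID 5) can be computed directly. The helper below is hypothetical and assumes equally sized disks, as the note above states.

```python
def usable_capacity(level, n, disk_size_gb):
    """Usable capacity per the tables above: RAID 0 uses all n disks,
    RAID 1 and 10 keep half (mirroring), RAID 5 gives up one disk's
    worth of space to parity."""
    if level == 0:
        return n * disk_size_gb
    if level in (1, 10):
        return n // 2 * disk_size_gb
    if level == 5:
        return (n - 1) * disk_size_gb
    raise ValueError("unsupported RAID level")

# Six 500 GB drives on an ICH10R:
assert usable_capacity(0, 6, 500) == 3000    # striping, no redundancy
assert usable_capacity(5, 6, 500) == 2500    # one disk lost to parity
assert usable_capacity(10, 6, 500) == 1500   # mirrored stripes
```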


ICH10D
The Intel 82801JID I/O Controller Hub 10 (ICH10D) [“D” stands for Digital] supports all the
features of the ICH10 plus the following features:
• Supported with corporate-oriented Q43 chipset.
• Integrated Trusted Platform Module (TPM) security chip.
• Support for DASH 1.0 and Intel Active Management Technology (IAMT) 3.5 for client
manageability when used with the supported Intel Ethernet controller.

ICH10DO
The Intel 82801JIO I/O Controller Hub 10 (ICH10DO) [“DO” stands for Digital Office] supports
all the features of the ICH10 plus the following features:
• Supported with corporate-oriented Q45 chipset.
• Integrated Trusted Platform Module (TPM) security chip.
• Integrated RAID 0, 1, 5, and 10 support (as on the ICH10R).
• Support for DASH 1.0 and Intel Active Management Technology (IAMT) 5.0 for client
manageability when used with the supported Intel Ethernet controller.
• Support for Intel vPro Technology.


Chipsets:
Mobile Intel 965 Express Chipset Family (Notebook)
• GM965 (integrated graphics), PM965 (discrete graphics)
• Intel Core 2 Duo processor support for notebooks
• (Graphics) Memory Controller Hub
- Dual-channel DDR2 533 or 667 MHz memory
- Integrated graphics (Graphics Media Accelerator X3100)
or discrete graphics (via PCI Express x16)
• I/O Controller Hub 8M (ICH8M)
- PCI Express x1 for ethernet, ExpressCard, Intel Turbo Memory
- Serial ATA support
• Announced May 2007

[Block diagram: the processor connects to the memory and graphics controller (MCH or GMCH host
bridge) with system memory attached; the I/O Controller Hub (ICH8M) provides PCIe, PCI, SATA
(3 ports), IDE, and USB controllers, USB 2.0, High Definition Audio, and a Low Pin Count
interface to the Super I/O; Intel Turbo Memory and the Ethernet controller attach over PCI
Express x1.]

The Mobile Intel GM965 Express Chipset is used in select Lenovo notebooks such as the ThinkPad R61.
© 2008 Lenovo

Mobile Intel 965 Express Chipset Family (Notebook)


In May 2007, Intel announced the Mobile Intel 965 Express Chipset family (code-named Crestline)
for notebook systems. The specific chipsets included the GM965 and PM965.
For the GM965, the G stands for graphics and the M stands for mobile. The GM965 has integrated
graphics and is positioned for mainstream notebooks.
For the PM965, the P stands for discrete graphics with high performance and the M stands for
mobile. The PM965 has discrete graphics and is positioned for high performance notebooks.
These chipsets work with the I/O Controller Hub 8M (ICH8M or ICH8M-Enhanced).


Mobile Intel GM965 Express Chipset


The Mobile Intel GM965 Express Chipset consists of the GM965 Graphics Memory Controller
Hub (GMCH) and a I/O Controller Hub 8M (ICH8M).
Introduced in May 2007, the GM965 supports the following features:
• Supports the Intel Core 2 Duo processor.
• Supports a system bus of 533 or 800 MHz.
• Supports DDR2 memory at 533 or 667 MHz.
• Supports either a single-channel (64-bit wide) or dual-channel (128-bit wide) memory channel
with one SO-DIMM per channel. Supports up to two SO-DIMMs. Dual-channel support requires
matched pairs of memory DIMMs with identical memory chips. Dual-channel operation should
result in slight performance improvements.
• Supports only non-ECC, non-parity, unbuffered memory with up to 4 GB maximum
addressability (4 GB addressability requires a 64-bit operating system).
• Supports the Direct Media Interface (DMI) private interconnect between GMCH and I/O
Controller Hub 8M (ICH8M) with a 100 MHz reference clock and 2 GB/s bandwidth.
• The GMCH has an integrated graphics controller called Graphics Media Accelerator X3100
(which greatly reduces system cost because a graphics adapter is not needed) with a 266 MHz,
320 MHz, 400 MHz, or 500 MHz core render clock at 1.05 volts. The GMA X3100 has support
for an analog display port, TV out, LVDS interface (for a notebook screen), analog monitor, and
two Serial Digital Video Out ports.
• The Graphics Media Accelerator X3100 (GMA X3100) uses the main system memory as
graphics memory (called Dynamic Video Memory Technology (DVMT) 4.0). It uses up to 64
MB to 384 MB of main system memory (see table in the Graphics topic on the X3100 slide).
This memory usage is set by the driver dynamically, and memory usage will vary.
• The GMA X3100 supports Intel Clear Video Technology.
The I/O Controller Hub 8M (ICH8M) features are covered in an earlier page under I/O Controller
Hub in this topic.
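The figures above allow a quick bandwidth comparison: dual-channel DDR2-667 moves 667 MT/s over two 64-bit (8-byte) channels, far more than the 2 GB/s DMI link to the ICH8M. A sketch of the arithmetic, with a hypothetical helper name:

```python
def ddr_peak_gb_s(mt_per_s, channels, bus_bits=64):
    """Peak memory bandwidth: (mega-transfers per second) x (bytes per
    transfer) x channels. Each DDR2 channel is 64 bits (8 bytes) wide."""
    return mt_per_s * (bus_bits // 8) * channels / 1000

# Dual-channel DDR2-667 vs. the 2 GB/s DMI link to the ICH8M:
assert round(ddr_peak_gb_s(667, 2), 1) == 10.7
assert ddr_peak_gb_s(667, 2) > 2.0   # the memory bus far outpaces DMI
```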

Mobile Intel GM965 Express chipset


[Block diagram: the Intel Core 2 Duo processor connects over a 533/800 MHz bus to the GM965
GMCH, which drives two DDR2 533/667 MHz memory channels and CRT, LVDS, TV out, and two SDVO
display outputs; a DMI link connects to the ICH8M, which provides USB 2.0 (10 ports at
480 Mb/s), three Serial ATA ports (300 MB/s), two ATA-100 ports, High Definition Audio, six
PCI Express x1 ports (ExpressCard controller, Ethernet, Intel Turbo Memory), the PCI bus
(CardBus controller, docking station), SPI BIOS, SMBus 2.0, power management, clock generation,
and the LPC interface to the Super I/O, optional ASICs, optional TPM, and flash BIOS.]

Mobile Intel GM965 Express Chipset Block Diagram


Mobile Intel PM965 Express Chipset


The PM965 Express Chipset consists of the PM965 Memory Controller Hub (MCH) and an I/O
Controller Hub 8M (ICH8M).
Introduced in May 2007, the PM965 Memory Controller Hub (MCH) is the same as the GM965
chipset (see the GM965 chipset details on a previous page) except for the following:
• Does not have an integrated graphics controller (the GMA X3100); instead it requires a high
performance discrete (external) graphics using a PCI Express x16 link.

[Block diagram: identical to the GM965 diagram except that the PM965 MCH has no integrated
display outputs; instead it provides a PCI Express x16 link to discrete graphics.]

Mobile Intel PM965 Express Chipset Block Diagram


Mobile Intel GL960 Express Chipset


The GL960 Express Chipset consists of the GL960 Graphics Memory Controller Hub (GMCH) and
the I/O Controller Hub 8M (ICH8M).
Introduced in June 2007, the GL960 Graphics Memory Controller Hub (GMCH) is the same as the
GM965 chipset except for the following:
• The GL960 supports the Merom-based Intel Celeron processors using Socket P.
• Supports a system bus of 533 MHz.
• Supports DDR2 memory at 533 MHz.
• Supports up to 2 GB maximum addressability.
• The GMCH has an integrated graphics controller called Graphics Media Accelerator X3100
(which greatly reduces system cost because a graphics adapter is not needed) with only a 400
MHz core render clock at 1.05 volts.
• Does not support the ICH8M-Enhanced which supports Intel Active Management Technology
(supports the ICH8M only).

Mobile Intel GS965 Express Chipset


The GS965 Express Chipset consists of the GS965 Graphics Memory Controller Hub (GMCH) and
the I/O Controller Hub 8-Enhanced-S (ICH8M-Enhanced-S).
Introduced in early 2008, the GS965 has the same features as the GM965 except it is a smaller
physical size (smaller package size, package height, and ball pitch) than the GM965.


Chipsets:
Mobile Intel 965 Express Chipset Versions (Notebook)

GL960 GM965 PM965


Value notebooks Mainstream notebooks High performance notebooks
Celeron Core 2 Duo Core 2 Duo
533 MHz system bus 533, 800 MHz system bus 533, 800 MHz system bus
Dual-channel support Dual-channel support Dual-channel support
DDR2-533 DDR2-533, DDR2-667 DDR2-533, DDR2-667
Up to 2GB memory Up to 4GB memory Up to 4GB memory
Integrated graphics Integrated graphics Discrete graphics
at 400 MHz (GMA X3100) up to 500 MHz (GMA X3100) via PCI Express x16
ICH8M ICH8M, ICH8M-Enhanced ICH8M, ICH8M-Enhanced
June 2007 May 2007 May 2007

• The GS965 is the same as the GM965 except smaller package

Mobile Intel GM965 Express Chipset

Mobile Intel 965 Express Chipset Versions


These chipsets support Socket P-based processors.
The key differences among the chipsets are the following:
• The GL960, GM965, and GS965 support an integrated graphics controller (GMA X3100).
• The PM965 uses discrete graphics via a PCI Express x16 graphics controller which
positions it for high performance notebooks.

The Mobile Intel PM965 Express Chipset is used in select ThinkPad T61 notebooks.


Chipsets:
I/O Controller Hub 8M (ICH8M) for 965 chipset (notebook)

• Chip with Mobile Intel 965 chipsets


• ICH8M-Enhanced supports Intel Active Management Technology
• ICH8M-S Enhanced is a smaller package than the ICH8M-Enhanced

ICH8M ICH8M-Enhanced
No Intel AMT support Intel AMT support
GL960, GM965, PM965 GM965, PM965
Six PCI Express x1 ports Six PCI Express x1 ports
Three Serial ATA ports (SATA300) Three Serial ATA ports (SATA300)
One ATA-100 controller One ATA-100 controller
(two ports) (two ports)
Ten USB 2.0 ports Ten USB 2.0 ports
Gigabit Ethernet controller Gigabit Ethernet controller
(MAC layer) (MAC layer)
No integrated RAID Integrated RAID 0 or 1
May 2007 May 2007
I/O Controller Hub 8M
(ICH8M)


I/O Controller Hub 8M (ICH8M)


The I/O Controller Hub 8M (ICH8M) is a chip that consolidates many I/O functions needed by
a system. This chip is positioned for notebook systems as the M stands for mobile. The ICH8M
and ICH8M-Enhanced are supported on the Mobile Intel 965 Express chipsets such as the
GM965 and PM965 chipsets.

New ICH8M Features
• Intel Active Management Technology 2.5
• Intel Management Engine
• Gigabit ethernet (MAC layer)
• One additional SATA300 port
• Two additional PCI Express x1 ports
• Two additional USB ports added

Legacy Removal
• AC’97 codec removed (only HD Audio)
• 10/100 ethernet (now Gigabit)

[Block diagram: the 96x chipset connects to the ICH8M over the Direct Media Interface (DMI);
the Intel Management Engine communicates over the Controller Link.]
ICH8M Block Diagram


The Intel I/O Controller Hub 8M (ICH8M) and ICH8M-Enhanced were announced May 2007.
Features of the chip include the following:
• ICH8M-Enhanced supports Intel Active Management Technology 2.5. The Intel Management
Engine via the controller link between the Memory Hub and ICH provides communication for
this function.
• ICH8M-Enhanced supports RAID 0 and 1.
• Six PCI Express x1 ports (OEM can configure between one and six x1 slots including one x4
slot). ExpressCard slots are supported through this interface.
• Three Serial ATA controllers to support three 300 MB/s Serial ATA (SATA300) ports.
• One IDE controller supporting ATA-100/66/33 and PIO modes; the one controller supports
two devices (ports).
• Five USB 2.0 controllers, each supporting two ports and with a maximum of ten total ports.
• Integrated Gigabit Ethernet controller (LAN Connect Interface (LCI)) MAC layer; it requires
an Intel family PHY layer or compatible chip for complete Ethernet functionality.
• PCI 2.3-compliant controller with support for 32-bit, 33 MHz PCI operations.
• Interface to the memory controller (Intel 96x chipset) via the Direct Media Interface (DMI)
link.
• Intel High Definition Audio (Intel HD Audio) interface with support for three external codecs.
• Low Pin Count (LPC) interface for support of a Super I/O chip (security chip, serial, parallel,
keyboard, mouse functions), optional ASICs, and flash BIOS.
• ACPI 2.0 power management logic support.
• Systems Management Bus (SMBus) 2.0 with additional support for I2C devices.
• Enhanced DMA controller, interrupt controller, and timer functions.
• Serial Peripheral Interface (SPI) as an alternative for the BIOS flash device; an SPI flash
device can be used as a replacement for the Firmware Hub.

The ICH8M-S Enhanced is the same as the ICH8M Enhanced except it is a smaller physical size
(smaller package size, package height, and ball pitch).
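The SATA300 figure used throughout this section follows from the line rate: Serial ATA uses 8b/10b encoding, so only 8 of every 10 line bits carry data, and a 3.0 Gb/s link delivers 300 MB/s of payload. A quick check, with a hypothetical helper name:

```python
def sata_payload_mb_s(line_rate_gb_s):
    """SATA line rate to payload bandwidth: 8b/10b encoding carries
    8 data bits in every 10 line bits; then convert bits to bytes."""
    return line_rate_gb_s * 1e9 * 8 / 10 / 8 / 1e6

assert sata_payload_mb_s(3.0) == 300.0   # SATA300, as on the ICH8M
assert sata_payload_mb_s(1.5) == 150.0   # first-generation Serial ATA
```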


[Block diagram: the ICH8M connects to the (G)MCH over the DMI interface, with the Intel
Management Engine on the Controller Link; it provides USB 2.0 (10 ports at 480 Mb/s), three
Serial ATA ports (300 MB/s), two ATA-100 ports, High Definition Audio, six PCI Express x1 ports
(ExpressCard controller, Ethernet, Intel Turbo Memory), the PCI bus (CardBus controller, docking
station), SPI BIOS, SMBus 2.0, power management, clock generation, and the LPC interface to the
Super I/O, optional ASICs, optional TPM, and flash BIOS.]

ICH8M Block Diagram

Intel ICH8M on ThinkPad systemboard (red box)


Chipsets:
Mobile Intel 4 Series Express Chipset Family (Notebook)
• GL40, GS45, GM45, PM45
• Intel processor support for notebooks
• Integrated graphics (Graphics Media Accelerator 4500M or 4500MHD)
or discrete graphics (via PCI Express x16)
or switchable graphics
• (Graphics) Memory Controller Hub
- Dual-channel DDR2 or DDR3 memory
• I/O Controller Hub 9M (ICH9M)
- ICH9M
- ICH9M-Enhanced
- ICH9M-SFF-Enhanced
• Announced July 2008

[Block diagram: the processor connects to the memory and graphics controller (MCH or GMCH host
bridge) with system memory attached; the I/O Controller Hub (ICH9M) provides PCIe, PCI, SATA
(4 ports), IDE, and USB controllers, USB 2.0, High Definition Audio, and a Low Pin Count
interface to the Super I/O; Intel Turbo Memory and the Ethernet controller attach over PCI
Express x1.]

The Mobile Intel GM45 Express Chipset is used in select Lenovo notebooks such as the ThinkPad R500.

Mobile Intel 4 Series Express Chipset Family (Notebook)


In July 2008, Intel announced the Mobile Intel 4 Series Express Chipset family (code-named
Cantiga) for notebook systems. The specific chipsets included the GL40, GS45, GM45, and PM45.
For the GL40, the G stands for graphics and the L stands for light. The GL40 has integrated
graphics and is positioned for value notebooks.
For the GS45, the G stands for graphics and the S stands for small form factor. The GS45 has
integrated graphics and is positioned for small form factor notebooks.
For the GM45, the G stands for graphics and the M stands for mobile. The GM45 has integrated or
switchable graphics and is positioned for mainstream notebooks.
For the PM45, the P stands for discrete graphics with high performance and the M stands for
mobile. The PM45 has discrete graphics and is positioned for high performance notebooks.
These chipsets work with the I/O Controller Hub 9M (ICH9M, ICH9M-Enhanced, or ICH9M-SFF-
Enhanced).


Mobile Intel GM45 Express Chipset


The Mobile Intel GM45 Express Chipset consists of the GM45 Graphics Memory Controller Hub
(GMCH) and an I/O Controller Hub 9M (ICH9M).
The GM45 supports the following features:
• Supports the latest Intel notebook processors such as the Intel Celeron, Core 2 Duo, and Core 2
Extreme processor.
• Supports a system bus at 667, 800, or 1066 MHz.
• Supports DDR2 memory at 667 or 800 MHz.
• Supports DDR3 memory at 667, 800, and 1066 MHz.
• Supports either a single-channel (64-bit wide) or dual-channel (128-bit wide) memory channel
with one SO-DIMM per channel. Supports up to two SO-DIMMs. Dual-channel support requires
matched pairs of memory DIMMs with identical memory chips. Dual-channel operation should
result in slight performance improvements.
• Supports only non-ECC, non-parity, unbuffered memory with up to 8 GB maximum
addressability (8 GB addressability requires a 64-bit operating system).
• Supports the Direct Media Interface (DMI) private interconnect between GMCH and I/O
Controller Hub 9M (ICH9M) with a 100 MHz reference clock and 2 GB/s bandwidth.
• The GMCH has an integrated graphics controller called Graphics Media Accelerator 4500MHD
(which greatly reduces system cost because a discrete graphics adapter is not needed) with a 533
MHz core render clock at 1.05 volts. The GMA 4500MHD has support for an analog display
port, TV out, dual-channel LVDS interface (for a notebook screen), two Serial Digital Video Out
ports, HDMI, and three DisplayPorts. It uses the main system memory as graphics memory
(called Dynamic Video Memory Technology (DVMT)).
• Supports a PCI Express x16 port for discrete graphics; used with switchable graphics
configurations.
• Supports Intel Active Management Technology (IAMT) 4.0 and Intel Anti-Theft Technology
(AT-p).
The I/O Controller Hub 9M (ICH9M) features are covered in an earlier page under I/O Controller
Hub in this topic.
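The note above that 8 GB addressability requires a 64-bit operating system follows from address width: a flat 32-bit address reaches only 2^32 bytes (4 GB). A quick check; the 36-bit case models PAE-style addressing and the helper name is hypothetical.

```python
def max_addressable_gb(address_bits):
    """Bytes reachable with a flat address of the given width, in GB."""
    return 2 ** address_bits / 2 ** 30

# A 32-bit OS can address 4 GB, so the GM45's full 8 GB is only
# reachable from a 64-bit operating system.
assert max_addressable_gb(32) == 4.0
assert max_addressable_gb(36) == 64.0   # PAE-style 36-bit addressing
```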

Mobile Intel GM45 Express chipset


[Block diagram: the Intel mobile processor connects over a 667/800/1066 MHz bus to the GM45
GMCH, which drives two DDR2 or DDR3 memory channels (667/800/1066 MHz), CRT, LVDS, TV out, two
SDVO, HDMI/DVI, and DisplayPort outputs, and a PCI Express x16 link to discrete graphics; a DMI
link connects to the ICH9M, which provides USB 2.0 (12 ports at 480 Mb/s), four 3.0 Gb/s Serial
ATA ports, Intel High Definition Audio, six PCI Express x1 ports (ExpressCard controller, Intel
Turbo Memory, WLAN/WiMax), LAN (GLCI and 10/100 LCI), the PCI bus, Controller Link, and the LPC
interface to the Super I/O, optional ASICs, and TPM.]

Mobile Intel GM45 Express Chipset Block Diagram


Mobile Intel PM45 Express Chipset


The PM45 Express Chipset consists of the PM45 Memory Controller Hub (MCH) and an I/O
Controller Hub 9M (ICH9M).
The PM45 Memory Controller Hub (MCH) is the same as the GM45 chipset (see the GM45 chipset
details on a previous page) except for the following:
• Does not have an integrated graphics controller (the GMA 4500MHD); instead it requires a
high performance discrete (external) graphics controller using a PCI Express x16 link.

[Block diagram: identical to the GM45 diagram except that the PM45 MCH has no integrated display
outputs; discrete graphics attach over the PCI Express x16 link.]

Mobile Intel PM45 Express Chipset Block Diagram


Mobile Intel GL40 Express Chipset


The GL40 Express Chipset consists of the GL40 Graphics Memory Controller Hub (GMCH) and
the I/O Controller Hub 9M (ICH9M).
The GL40 Graphics Memory Controller Hub (GMCH) is the same as the GM45 chipset except for
the following:
• The GL40 supports the Intel Celeron processors only.
• Supports a system bus at 667 MHz only.
• Supports DDR2 and DDR3 memory at 667 MHz only.
• Supports up to 4 GB maximum addressability (not 8 GB).
• The GMCH has an integrated graphics controller called Graphics Media Accelerator 4500M
(which greatly reduces system cost because a graphics adapter is not needed) with only a 400
MHz core render clock at 1.05 volts.
• Does not support the ICH9M-Enhanced which supports Intel Active Management Technology
(supports the ICH9M only).
• Does not support Intel Active Management Technology (IAMT) 4.0 and
Intel Anti-Theft Technology (AT-p)

Mobile Intel GS45 Express Chipset


The GS45 Express Chipset consists of the GS45 Graphics Memory Controller Hub (GMCH) and
the I/O Controller Hub 9-Enhanced-SFF (ICH9M-Enhanced-SFF).
The GS45 has the same features as the GM45 except it is a smaller physical size (smaller package
size, package height, and ball pitch) than the GM45. Its GMA 4500MHD has a slower 266 MHz
core render clock.

The Mobile Intel GM45 Express Chipset is used in select ThinkPad T400 notebooks.

Chipsets:
Mobile Intel 4 Series Express Chipset Versions (Notebook)

GL40 (value notebooks):
- Celeron processors
- 667 MHz system bus
- DDR2 and DDR3 memory at 667 MHz
- Up to 4 GB memory
- Integrated graphics at 400 MHz (GMA 4500M)
- ICH9M
- July 2008

GS45 (small form factor):
- Core 2 processors
- System bus up to 1066 MHz
- DDR2 and DDR3 memory at 667, 800, 1066 MHz
- Up to 8 GB memory
- Integrated graphics at 266 MHz (GMA 4500MHD)
- ICH9M-SFF-Enhanced
- July 2008

GM45 (mainstream notebooks):
- Core 2 processors
- System bus up to 1066 MHz
- DDR2 and DDR3 memory at 667, 800, 1066 MHz
- Up to 8 GB memory
- Integrated or switchable graphics at 533 MHz (GMA 4500MHD)
- ICH9M, ICH9M-Enhanced
- July 2008

PM45 (high performance):
- Core 2 processors
- System bus up to 1066 MHz
- DDR2 and DDR3 memory at 667, 800, 1066 MHz
- Up to 8 GB memory
- Discrete graphics via PCI Express x16
- ICH9M, ICH9M-Enhanced
- July 2008

(Photo: Mobile Intel GM45 Express Chipset)
© 2008 Lenovo

Mobile Intel 4 Series Express Chipset Versions


The key differences among the chipsets are the following:
• The GL40 and GS45 support an integrated graphics controller.
• The GM45 supports an integrated or switchable graphics controller.
• The PM45 supports discrete graphics via a PCI Express x16 graphics controller.

The Mobile Intel PM45 Express Chipset is used in select ThinkPad W500 notebooks.

Chipsets:
I/O Controller Hub 9M (ICH9M) for Series 4 chipset (Notebook)

• Chip used with Mobile Intel 4 Series chipsets for notebooks


• ICH9M-Enhanced supports Intel Active Management Technology
• ICH9M-SFF-Enhanced is a smaller package than the ICH9M-Enhanced for the GS45

ICH9M:
- No Intel AMT support
- Used with GL40, GM45, PM45
- Six PCI Express x1 ports
- Four Serial ATA ports (SATA300)
- Integrated security chip (TPM)
- 12 USB 2.0 ports
- Gigabit Ethernet controller (MAC layer)
- No integrated RAID
- July 2008

ICH9M-Enhanced:
- Intel AMT support
- Used with GM45, PM45
- Six PCI Express x1 ports
- Four Serial ATA ports (SATA300)
- Integrated security chip (TPM)
- 12 USB 2.0 ports
- Gigabit Ethernet controller (MAC layer)
- Integrated RAID 0 or 1
- July 2008

(Photo: I/O Controller Hub 9M (ICH9M))

I/O Controller Hub 9M (ICH9M)


The I/O Controller Hub 9M (ICH9M) is a chip that consolidates many I/O functions needed by
a notebook. This chip is positioned for notebook systems as the M stands for mobile. The
ICH9M and ICH9M-Enhanced are supported on the Mobile Intel 4 Series Express chipsets such
as the GL40, GM45, and PM45 chipsets.

New ICH9M features from the ICH8M:
• Intel Active Management Technology 4.0 (Management Engine added)
• Integrated security chip (TPM)
• One additional SATA 3.0 Gb/s port
• Two additional USB ports

Legacy support removed from the ICH8M:
• EIDE support (ATA-100)

(Figure: ICH9M Block Diagram — the ICH9M connects to the 4 Series chipset through the Direct Media Interface (DMI) link, and the Intel Management Engine in the chipset communicates over the Controller Link.)


The Intel I/O Controller Hub 9M (ICH9M), ICH9M-Enhanced, and ICH9M-SFF-Enhanced were
announced in July 2008. Features of the chip include the following:
• ICH9M-Enhanced and ICH9M-SFF-Enhanced support Intel Active Management Technology
4.0. The Intel Management Engine via the controller link between the Memory Hub and ICH
provides communication for this function.
• ICH9M-Enhanced and ICH9M-SFF-Enhanced support RAID 0 and 1.
• Six PCI Express x1 ports (OEM can configure between one and six x1 slots including one x4
slot). ExpressCard slots are supported through this interface.
• Four Serial ATA controllers to support four Serial ATA (SATA 3.0Gb/s) ports.
• Six USB 2.0 controllers, each supporting two ports and with a maximum of 12 total ports.
• Integrated Gigabit Ethernet controller (LAN Connect Interface (LCI)) MAC layer; it requires
an Intel family PHY layer or compatible chip for complete Ethernet functionality.
• PCI 2.3-compliant controller with support for 32-bit, 33 MHz PCI operations.
• Interface to the memory controller (Intel 4 Series chipset) via the Direct Media Interface
(DMI) link.
• Integrated security chip (Trusted Platform Module); it can be disabled via a strapping option.
• Intel High Definition Audio (Intel HD Audio) interface with support for three external codecs.
• Low Pin Count (LPC) interface for support of a Super I/O chip (security chip, serial, parallel,
keyboard, mouse functions), optional ASICs, and flash BIOS.
• ACPI 2.0 power management logic support.
• Systems Management Bus (SMBus) 2.0 with additional support for I2C devices.
• Enhanced DMA controller, interrupt controller, and timer functions.
• Serial Peripheral Interface (SPI) as an alternative for the BIOS flash device; an SPI flash
device can be used as a replacement for the Firmware Hub.

The ICH9M-SFF-Enhanced is the same as the ICH9M-Enhanced except it is a smaller physical size
(smaller package size, package height, and ball pitch).


(Figure: Block diagram — the processor connects to the (G)MCH, with memory and discrete graphics attached; the DMI interface and Controller Link connect the (G)MCH and its Intel Management Engine to the ICH9M, which provides the PCI Express x1, USB 2.0, Serial ATA, Intel High Definition Audio, LAN, PCI, and LPC interfaces. Photo: Intel ICH9M on a ThinkPad systemboard (red box).)


Summary:
Bus Architecture

• ISA, EISA, and Micro Channel are historical bus architectures.
• PCI has been the common bus architecture of subsystems and adapters since 1993.
• PCI Express has generally replaced the PCI bus and slots.
• ExpressCard modules have generally replaced PC Cards in notebooks.
• USB is the main peripheral interface on notebooks and desktops today.
• PC Cards, including the faster CardBus, provide notebooks with small, removable adapters.
• Desktop and notebook chipsets determine key features and the performance of systems.

(Photo: Lenovo ThinkCentre A62)

Summary: Bus Architecture


Chipsets have a large impact on performance. If all subsystems were equal (memory, graphics,
processor, etc.), a different chipset alone could affect performance by up to 20 percent.


Review Quiz

Objective 1

1. What 16-bit bus was common a decade ago and is considered the original PC bus?
a. ISA (or AT bus)
b. Micro Channel
c. EISA
d. PCI

Objective 2

2. What is the most common implementation of PCI in desktop and notebook systems?
a. 32-bit at 33 MHz for 132 MB/s
b. 64-bit at 66 MHz for 528 MB/s
c. 64-bit at 100 MHz for 800 MB/s
d. 64-bit at 133 MHz for 1064 MB/s

3. How can a PCI Express adapter be added to a ThinkPad notebook?


a. A ThinkPad notebook supports PCI Express adapters in its docking station.
b. A ThinkPad notebook supports PCI Express adapters via the IDE conversion adapter.
c. A ThinkPad notebook can never support PCI Express adapters.
d. A ThinkPad notebook supports PCI Express adapters in its port replicator.

4. What is the latest version of PCI that also includes support for low profile PCI adapters?
a. PCI 1.0
b. PCI 2.0
c. PCI 2.3
d. SATA300

Objective 3

5. Which statement is false regarding PCI Express?


a. PCI Express is a serial, point-to-point, full-duplex link.
b. PCI adapters are supported in PCI Express slots.
c. PCI Express is scalable starting at 2.5 Gb/s with x1.
d. PCI Express supports hot-pluggability and hot-swap of devices.

6. What is the extension of PCI Express to notebook systems?


a. PCMCIA
b. CompactFlash
c. Secure Digital Card
d. Mini PCI Express


Objective 4

7. ExpressCard slots require what type of interconnects?


a. Both USB 2.0 and PCI Express (x1 link)
b. Both PCI and PCI Express (x1 link)
c. Both PCI Express Mini Card and PCI Express (x4 link)
d. Both USB 2.0 and PCI Express Mini Card

Objective 5

8. What standard for external devices supports transfers at up to 480 Mb/s or 60 MB/s using small
cables and requires no IRQ settings?
a. Universal Serial Bus (USB)
b. PCI 2.2
c. Point Enabler
d. JEIDA

9. What interface uses ultra-wideband (UWB) radio technology for the connection of home
consumer products to a PC or TV?
a. Wireless USB
b. FireWire 800
c. CardBus Plus
d. Serial Peripheral Interface

Objective 6

10. A CardBus PC Card is based on what specification?


a. ISA
b. PCI
c. Ethernet
d. Universal Serial Bus

11. What is the name of the CardBus slot with a USB 2.0 signal passed to it so that USB-based
ExpressCard modules can be used in a PC Card slot with an adapter?
a. ExpressCard/54
b. CardBus Plus
c. CardBus ExpressCard
d. Serial Peripheral Interface

Objective 7

12. What is the name of the chip that consolidates many I/O functions needed by a system and
interfaces with the Memory Controller Hub?
a. South Bridge
b. XA-32
c. Super I/O
d. I/O Controller Hub (ICH)


13. Which Intel desktop chipset is used in select ThinkCentre desktops?


a. XA-32
b. 852GM
c. 810
d. Q45

14. Which desktop-based Intel chipset only uses discrete graphics through a PCI Express x16
adapter?
a. G41
b. G45
c. Q45
d. P45

15. What I/O Controller Hub is used with the desktop-based Intel Q45 Express chipset?
a. ICH8
b. ICH9
c. ICH9M
d. ICH10

16. Which desktop chipset with support for a high-performance PCI Express x16 graphics adapter
would be the best choice for the corporate stable segment?
a. Intel X48 Express Chipset
b. Intel Q45 Express Chipset
c. Intel G45 Express Chipset
d. Intel P45 Express Chipset

17. What mainstream desktop chipset has support for integrated graphics controller with Intel Clear
Video Technology and an available PCI Express x16 graphics slot?
a. Intel X48 Express Chipset
b. Intel Q45 Express Chipset
c. Intel G45 Express Chipset
d. Intel P45 Express Chipset

18. What is an important difference between the older notebook I/O Controller Hub 8M (ICH8M)
and the new ICH9M?
a. The ICH9M includes an integrated security chip
b. The ICH9M supports authentication to a WiMax network
c. The ICH9M is integrated in the processor
d. The ICH9M supports two PCI Express x16 slots


Answer Key
1. A
2. A
3. A
4. C
5. B
6. D
7. A
8. A
9. A
10. B
11. B
12. D
13. D
14. D
15. D
16. B
17. C
18. A



Topic 5 - Storage Architecture

PC Architecture (TXW102)
Topic 5:
Storage Architecture


Objectives:
Storage Architecture

Upon completion of this topic, you will be able to:

1. Define disk subsystem functions and terminology


2. List the types of devices that used Enhanced IDE (EIDE)
3. Differentiate between the Serial ATA specifications of SATA150
and SATA300
4. List the various factors that affect disk performance


Storage Architecture
IBM invented the disk drive in 1956. The first disk drives were the size of two large refrigerators,
held 5 MB, and cost $10,000 per MB.
The platters of disks typically spin at about 100 miles per hour.
1 GB of data is equivalent to 1,500 paperback novels. An areal density of 1.44 billion bits per
square inch is equivalent to 87 college textbooks (1 inch = 25.4 mm). 5 GB is equivalent to 50 years
of a typical daily newspaper, one million printed pages, and a stack of paper 62 stories tall.
A 500 GB disk holds 125 hours of high-definition video, 178 feature-length non-HD movies, or
125,000 four minute songs.


Disk Subsystem

• A disk subsystem consists of:
- Disk
- Disk controller
• Common disk controllers
- Enhanced IDE (EIDE)
- Serial ATA (SATA)
- Serial Attached SCSI (SAS)
- Fibre Channel
- iSCSI over 10G Ethernet

(Figure: System block diagram — the IDE and SATA controllers in the I/O Controller Hub (ICH) attach two EIDE disks, four SATA disks, and EIDE optical drives.)

Disk Subsystem
The disk subsystem consists of a disk and a disk controller (also called a disk interface). Common
disk controllers today include Enhanced IDE (EIDE), Serial ATA (SATA), Serial Attached SCSI
(SAS), Fibre Channel, and iSCSI over 10G Ethernet.

Early Disk Interfaces


Three early disk controllers or interfaces used in PCs were ST506, ESDI (Enhanced Small Device
Interface), and IDE.
• ST506 has a maximum transfer speed of 1 MB/s and supports a maximum of two disks. The
adapter contains the formatting, head select, and error detection functions.
• ESDI has a maximum transfer speed of 3 MB/s and supports a maximum of two disks. The
adapter card contains the formatting, head select, and error detection functions.
• IDE (Integrated Drive Electronics) was replaced with EIDE (Enhanced IDE). The original IDE
supported only disks (not CD-ROMs or tapes).


Disk Subsystem:
Disk Functions

• Disk controller translates a logical address into a physical address (cylinder, track, sector)
• Seek (head moves to correct track)
• Latency (correct sector to spin under head)
• Read/write data (transfer rate)
• Data goes into disk buffer

Average Access Time (6 to 18 ms) = Average Seek Time + Average Latency

(Figure: Flow of a disk access — move seek arm, wait for disk rotation, read/write sector(s) into the buffer; rotation speed is measured in rpm.)

Disk Functions
The main storage of data in a computer is on a magnetic disk (hard disk or hard drive). A magnetic
disk stores the on and off bits of binary data as microscopic magnetized needles on the surface of
the disk. This data can be recorded and erased any number of times. When computer power is
turned off, the data remains stored on the disk.
Disk speeds are often measured in average access time, which is the sum of the average seek time
and average latency.
Seek time is the time required for the read/write head to be positioned on the correct track of the
disk. Measurements are in milliseconds (ms). A faster seek time yields better performance.
Sometimes the term track-to-track seek time is used. It defines the time that it takes the device to
move the read/write head from one track to another. Typical seek times today vary from 5 ms to 12
ms. For living room entertainment disks, seek time is not critical. A slower seek time will reduce
noise and power.


Latency is the time (measured in milliseconds) required for the intended sector on a track to come
under the read/write head. Average latency is determined by rotational speed. Usually, the figure for
latency is the time it takes the drive to do half of a rotation. A shorter latency is better for
performance. Today, seek times are almost at the same level as latency, but because latency occurs
more often than seeks, it has become the factor with the biggest influence on total performance.
Rotation speed is the speed at which the disk rotates. The higher the rotation speed, the faster the data
can be moved to and from the disk. Measurements are in revolutions per minute (rpm). The rotation
speed determines the latency of the drive. Common rotation speeds of a disk are 5400, 7200, 10,000,
and 15,000 rpm. An increase in rotation speed reduces latency and improves seek time. The
maximum rotation speed of a device of a certain size is limited. For example, a 3.5-inch device can
run at about 10,000 rpm at the most before the reliability of the device becomes questionable. To get
a higher speed, a smaller device form factor is needed. A 2.5-inch disk is able to have a much higher
speed than a larger disk.
• 4200 rpm disks have a 7.1 ms latency
• 5400 rpm disks have a 5.6 ms latency
• 7200 rpm disks have a 4.17 ms latency
• 10,000 rpm disks have a 2.99 ms latency
• 15,000 rpm disks have a 2.00 ms latency
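The latency figures above follow directly from the rotation speed: average latency is the time for half a rotation. A quick sketch of the arithmetic (10,000 rpm works out to 3.0 ms here versus the 2.99 ms quoted above, a rounding difference):

```python
def avg_latency_ms(rpm):
    """Average rotational latency: the time for half a rotation, in ms."""
    ms_per_rotation = 60_000 / rpm   # 60,000 ms per minute
    return ms_per_rotation / 2

for rpm in (4200, 5400, 7200, 10_000, 15_000):
    print(f"{rpm} rpm -> {avg_latency_ms(rpm):.2f} ms")
```

Adding an average seek time to this value gives the average access time defined earlier.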
Sustained transfer rate (STR) is the average rate at which the hard disk makes data available to the
host system. For example, the IBM Ultrastar 36Z15, which is an Ultra160 SCSI disk, has an STR
between 36.6 and 52.8 MB/s. The IBM Ultrastar 146Z10, which is an Ultra320 SCSI disk, has an STR
between 33.9 and 66.7 MB/s. STR assumes the disk is always available. STR is sometimes called the
internal formatted transfer rate.
Media transfer rate (MTR) is faster than sustained transfer rate. It determines how fast a device can
move data between the media and its sector buffer. The MTR depends on the rotation speed and on
the density with which data is stored. (If a sector takes up less space on the disk, it can be read
faster.) It is usually measured in megabytes per second (MB/s). MTR is usually the bottleneck in the
data transfer rate of the disk subsystems of today. Compared to the speed of the system bus of the
host system and the speed for transporting data across the cabling, this rate is the lowest.
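The relationship between rotation speed, sector density, and media transfer rate can be sketched as follows; the 1000-sectors-per-track geometry is invented for illustration, not taken from a real drive:

```python
def media_transfer_rate_mb_s(sectors_per_track, rpm, bytes_per_sector=512):
    """Bytes passing under the head per second on one track, in MB/s."""
    rotations_per_second = rpm / 60
    return sectors_per_track * bytes_per_sector * rotations_per_second / 1_000_000

# Hypothetical 7200 rpm disk with 1000 sectors on an outer track:
print(media_transfer_rate_mb_s(1000, 7200))  # 61.44
```

Because zone-bit recording places more sectors on outer tracks, the same drive transfers data faster from its outer zones than from its inner ones.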


Disk Components
A disk is made up of several components, each with parts of its own. These components can include:
• Head disk assembly (HDA), which consists of:
– A number of disks on which data is stored
– A motor to rotate the disks
– A head to read and write data (The heads "fly" over the surface of a disk at a height of two to
three micro inches; a human hair is 3000 micro inches.)
– An actuator to move the heads over the disks
• Printed circuit board (PCB), which consists of:
– A microprocessor unit to control the operation of the HDA and to communicate with the disk
subsystem controller
– Servo circuitry to position the actuator exactly
– A read/write channel to transform the electrical signals between a format that can be accepted by
the host system and a format that can be accepted by the disk
– A disk subsystem cable connector

(Figure: Disk geometry — platter, track, cylinder, and sector.)

Each disk is segmented into tracks. Tracks are about 300 millionths of an inch apart. Each track is
divided into sectors. A sector is the smallest addressable unit on a direct access storage device and
usually contains 512 bytes of data. The figure shows the layout of a disk; the example shows a disk
with five tracks. Each track usually contains at least 32 sectors, but the number of sectors per track
varies.
A cylinder is made up of all the tracks that are at the same location of each platter in the device. The
figure shows a device that has four disks. The white rings in each platter indicate a specific track.
The white rings compose a cylinder.
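The cylinder/head/sector geometry maps to the flat logical block addresses that software uses by a simple convention; the sketch below assumes the classic ordering (sectors numbered from 1, heads and cylinders from 0) with an illustrative 16-head, 63-sector geometry:

```python
def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Convert a logical block address to a (cylinder, head, sector) triple."""
    cylinder = lba // (heads_per_cylinder * sectors_per_track)
    head = (lba // sectors_per_track) % heads_per_cylinder
    sector = (lba % sectors_per_track) + 1   # sectors count from 1
    return cylinder, head, sector

def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Inverse mapping: (cylinder, head, sector) back to a logical block address."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

print(lba_to_chs(0, 16, 63))    # (0, 0, 1)
print(lba_to_chs(63, 16, 63))   # (0, 1, 1): next head, same cylinder
```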
When data that is larger than one track is read or written to a device that has multiple platters, a head
change is required to continue reading or writing. Although changing to another head can be done
quickly, the time required may be long enough to miss the first sector of the track on the next platter.
If the first sector is missed, the head has to wait almost a complete rotation before it can continue. To
prevent this situation, a technique called track skewing can be implemented. Today's devices are fast
enough to be able to use track skew of one or two.


When data that is larger than one cylinder is read or written, the read/write head(s) need(s) to be
moved to an adjacent cylinder to continue reading or writing. Although moving to another cylinder
can be done quickly, the elapsed time might be long enough to miss the first sector of the track on the
next cylinder. If the first sector is missed, the head will have to wait almost a complete rotation before
it can continue. To prevent this situation, cylinder skewing can be implemented.
Areal density is defined as how tightly information is packed together on a medium. Increasing
capacity per platter results in fewer parts, lower power consumption, lower heat, and lower sound
generation. Increasing areal density increases performance in that the head reads bits quickly as more
pass under the head in the same amount of time; a lower speed disk could outperform a higher speed
disk.

(Figure: Left — a track skew of 1, where track 1 on platters 1, 2, and 3 each begins one sector later than the previous platter; right — a cylinder skew of 2 between track 1 and track 2.)

As shown in the diagram, when track skewing is implemented, the first sector of the corresponding
track on the next platter is offset further around the platter. When sector 8 of track 1 on platter 1 has
been read, the next sector to be read is sector 1 of track 1 on platter 2. In the example, the disk
subsystem has as much time to perform the head change as it takes the disk to rotate one sector.
When cylinder skewing is implemented, the first sector of the next cylinder is located at a further
location on the disk. When sector 8 of track 1 has been read, the next sector to be read is sector 1 of
track 2. In the example, the disk subsystem has as much time to move the head to track 2 as it takes
the disk to rotate two sectors. Today's devices are fast enough to be able to use a track skew of one or
two.
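Skewing can be modeled as rotating the starting position of each successive track forward by the skew amount; this small sketch uses the 8-sectors-per-track layout of the figures, not a real drive geometry:

```python
def start_position(track_index, skew, sectors_per_track):
    """Physical position (0-based) at which logical sector 1 of a track
    is placed when each successive track is shifted by `skew` sectors."""
    return (track_index * skew) % sectors_per_track

# Track skew of 1: each track starts one sector later than the previous one.
for track in range(4):
    print(track, start_position(track, skew=1, sectors_per_track=8))
```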
The following terms are often used to describe availability:
Mean time data loss (MTDL) defines the average time until data loss occurs. When data loss occurs,
the lost data is no longer available to the end user.
Mean time between failures (MTBF) is the average time (measured in hours) between consecutive
failures of a component. It is a term often misinterpreted as a figure indicating the average lifetime of
a component.


Servo information is magnetic patterns on the disk platter that the disk uses to position the read/write
heads accurately at the proper location on the platter for read/write operation.
Disks with encoded servo design place servo information inside the data sectors (on all the tracks).
Doing so eliminates constraints of storing servo information between data sectors (sometimes called
embedded servo). The older dedicated servo technique reserves an entire side of a platter for only
servo information.
Zone-bit recording (or zoned recording) allows the device to better use the available disk space by
adding more sectors to the outer tracks of a device in order to store more data.

(Figure: Zone-bit recording has more sectors on the outer tracks than on the inner tracks.)


Disk Subsystem:
Disk Terminology

• Perpendicular magnetic recording
• ThinkPad Roll Cage
• Hard Disk Drive Shock Absorber
• Drive fitness test (DFT)
• Automatic defect reallocation
• Predictive Failure Analysis
• Adaptive Battery Life Extender
• No-ID

(Chart: Hard disks are the main cause of hardware failure — hard drives 50%, power supply 25%, with the fan, memory, CPU, and other components accounting for the rest. Photo: Serial ATA disk for desktop.)

Disk Terminology
Perpendicular magnetic recording (PMR) – Introduced in 2006, perpendicular magnetic recording
(PMR) is a new recording method that aligns the magnetized bits perpendicular (vertically at 90
degrees) to the surface in contrast to the longitudinal (horizontal) parallel method. PMR essentially
aligns the magnetized particles like dominoes standing on end. PMR increases the areal density by
as much as 10 times so disks can store more data on every platter. PMR requires the development
of new disk media, heads, and electronics.

(Figure: Conventional longitudinal (horizontal) recording aligns the magnetized bits parallel to the media surface; perpendicular (vertical) magnetic recording stands them on end, using an additional layer beneath the media surface.)


• ThinkPad Roll Cage – The ThinkPad Roll Cage is a protective magnesium frame featured on select
ThinkPad notebooks that provides shock protection to the disk and other notebook components.

(Figure: ThinkPad Roll Cage Protection — the magnesium frame surrounds the hard drive, which is also protected by an HDD Protection Pack, and the system planar.)

• Hard Disk Drive Shock Absorber – The Hard Disk Drive Shock Absorber was introduced in select
ThinkPad notebooks in 2003. It helps to absorb some shock to the disk by allowing the connector
to float in a range of 0.5mm.

(Figure: A metal protection plate on every hard disk drive helps protect against physical shock as well as static electricity and dirt that can damage the drive electronics. The floating upper half of the connector attaches to the HDD, while the lower half is soldered on the planar; the connector can float up with 0.5 mm of clearance and down to 0.0 mm.)


• Drive Fitness Test (DFT) is a technology that uses a PC-based program that accesses special disk
microcode and enables OEM system manufacturers and service providers to accurately diagnose
the proper operation of hard disks. DFT is designed to address problem situations in which end
users suspect hard drive malfunction. The DFT program can be integrated into the system
diagnostic package and preloaded by system OEMs into a special, protected partition on the hard
drive. DFT can then be invoked by the end user (for example, by pressing Ctrl+Alt+X), possibly at
the direction of the system OEM telephone support staff. DFT is supported in both IDE and SCSI
disks.
DFT microcode automatically logs significant error events, such as hard errors and a history of all
reassigned sectors; this log is kept in a reserved area of the drive. Also, DFT microcode performs
mechanical analysis of the disk in real time. Parameters such as disk shift, servo stability, and
repeatable runout (RRO) can be calculated dynamically by reading the position error signal (PES)
of the servo and analyzing the patterns in the PES. It also uses SMART to predict imminent failure.
DFT software is stand alone in that it runs under DOS in a manner independent of the end-user
operating system.
• Automatic defect reallocation identifies and remaps defective sectors with good sectors in real
time.
• SMART (Self-Monitoring, Analysis, and Reporting Technology) is an industry-standard
specification for disks that allows the monitoring of disks for reliability and impending disk drive
failures. (It is similar to Predictive Failure Analysis; PFA is a superset of SMART). Disks meeting
this specification monitor such factors as spindle performance and error rates. Software can
interrogate the disk at any time to see if any error conditions exist. SMART cannot predict disk
failures caused by electronic connectors and integrated circuits. Approximately 60 percent of disk
failures are predictable.
• The look-ahead buffer reads additional data ahead of the data currently requested and stores it in
the fast buffer memory.
• The segmented look-ahead buffer divides the total amount of buffer memory into smaller buffers
so that data from more than one read can be stored at a time.
• Adaptive buffering allows the disk to adjust the number and size of the buffer segments when the
disk logic determines that the buffer hit rate can be increased.
• Write caching uses the disk buffer for writes (and reads) in order to increase throughput. The disk
signals completion of the write when it is received in the buffer and before it is written to the disk.
The system then does other work while the disk writes the data.
• An asynchronous device must acknowledge each byte as it comes from the controller. Synchronous
devices may transfer data in bursts, and the acknowledgments happen after the fact. The latter is
significantly faster than the former, and most newer devices support this mode of operation. The
adapters negotiate with devices on the SCSI bus to ensure that the mode and data transfer rates are
acceptable to both the host adapter and the devices. This process prevents data from being lost and
ensures error free data transmission.
• Hot-swap disks can be removed and/or inserted without tools while the system is powered on. An
example of a server with multiple hot-swap bays for hot-swap disks is shown. Most servers have
hot-swap bays for hot-swap disks. Disks can be hot-swappable with appropriate connector,
enclosure, bus, and controller support.
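The look-ahead buffering described above can be illustrated with a toy read buffer; real drive firmware manages segmented, adaptive buffers far more elaborately, so this is only a sketch of the idea:

```python
class LookaheadBuffer:
    """Toy read buffer: on a miss, fetch the requested sector plus
    `ahead` following sectors so that sequential reads hit the buffer."""

    def __init__(self, disk, ahead=4):
        self.disk = disk            # list of sector payloads
        self.ahead = ahead
        self.buf = {}               # sector number -> data
        self.disk_reads = 0         # number of physical disk accesses

    def read(self, sector):
        if sector not in self.buf:  # miss: go to the platter
            self.disk_reads += 1
            end = min(sector + 1 + self.ahead, len(self.disk))
            for s in range(sector, end):
                self.buf[s] = self.disk[s]
        return self.buf[sector]

disk = [f"sector-{i}" for i in range(16)]
cache = LookaheadBuffer(disk)
for s in range(8):                  # read 8 sectors sequentially
    cache.read(s)
print(cache.disk_reads)             # 2 physical accesses instead of 8
```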


• Some commands take a relatively long time to complete (for example, a seek command takes
roughly 10 ms). With this feature, the controller can disconnect from the bus while the device is
positioning the heads (seeking). When the seek is complete and data is ready to be transferred, the
device arbitrates for the bus and reconnects with the controller in order to transfer the data. This
process allows a more efficient use of the available SCSI bandwidth. If the controller held onto the
bus while waiting for the device to seek, the other devices would be locked out. This process is
sometimes referred to as overlapped operations or multithreaded I/O on the SCSI bus.
• Notebook disks are typically 2.5 inches wide (but small notebooks are using 1.8 inch, 1.0 inch, or
0.85 inch disks) and are generally classified by one of the following three heights:
– 17 mm
– 12.5 mm
– 9.5 mm

Lenovo ThinkPad 160 GB 5400 rpm Hard Drive (9.5 mm high)

Lenovo ThinkPad Disk (removable by pulling tab)

A metal plate on the bottom of the hard disk protects the drive from shock, static electricity, and dirt.


A toolless disk caddy is easily removable from select ThinkCentre desktops.

The disk rail on select ThinkCentre desktops uses plastic that is thicker and stronger than flimsy metal for added durability.

ThinkCentre desktop disks use vibration-dampening mounts for reduced noise, added reliability, and longer disk life.

ThinkCentre desktop disks use pins instead of screws to attach disks to rails; no screwdriver is needed.


Disk Subsystem:
Active Protection System

• The Active Protection System is a motion sensor and software utility system that protects the hard disk drive from damage due to a fall or rough handling
- Parks disk within 500 milliseconds
• Designed to prevent data loss and downtime, it monitors system movement to avoid hard disk drive crashes
• Hardware component
- Motion detector (accelerometer) embedded in systemboard
• Software component
- Receives and interprets signals from accelerometer
- Signals hard disk drive to stop when rapid system motion or vibration detected
• Utilized on select ThinkPad notebooks

(Images: The system protects the hard disk from a fall; a pop-up window appears when the Active Protection System is activated.)

Active Protection System


Introduced in 2003, the ThinkVantage Active Protection System is an integrated, user-configurable
motion sensor that continuously monitors select ThinkPad notebooks. The Active Protection
System temporarily parks the drive head and shuts down all disk activity to help prevent some hard
disk drive crashes when a fall or similar event is detected. When repetitive vibration is detected, as
in trains or autos, the system automatically adjusts sensitivity, so your work is uninterrupted. Active
Protection System provides up to four times greater impact protection than systems without this
feature.

Pop-up window from the applet icon in the system tray


The Active Protection System contains both a hardware and a software component. The hardware
component (an accelerometer) detects acceleration of the system and signals the software component
when movement occurs. The accelerometer chip measures the tilt of the system and uses that
information to try to predict when a shock event will occur. The accelerometer has an absolute
maximum shock rating of 3000 Gs.
The software component is the thinking portion of the protection system. This software makes a
decision on whether the movement is potentially harmful to the system or if it is normal, repetitive
motion. It decides whether or not to turn off the hard disk drive.
If a ThinkPad system is about to fall off a table, it will tilt before it falls. The accelerometer
detects the tilting and falling, and the software then protects the hard disk drive by parking the
drive head within 500 milliseconds.
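The decision logic described above — detect a fall from accelerometer readings and park the drive head — can be sketched in Python. This is a hypothetical toy model for illustration only; the thresholds, function names, and structure are assumptions, not Lenovo's actual firmware:

```python
import math

FREE_FALL_G = 0.4  # assumed threshold: below this total acceleration (in g), the system is falling

def magnitude(ax, ay, az):
    """Total acceleration in g from the three accelerometer axes."""
    return math.sqrt(ax**2 + ay**2 + az**2)

def should_park(samples):
    """Decide whether to park the disk head.

    `samples` is a list of (ax, ay, az) readings in g. A system at rest
    reads about 1 g (gravity alone); a near-zero magnitude means free fall,
    so the head must be parked within the 500 ms window.
    """
    return any(magnitude(*s) < FREE_FALL_G for s in samples)

# A system sitting on a desk reads gravity only: no parking needed.
print(should_park([(0.0, 0.0, 1.0), (0.01, 0.0, 0.99)]))   # False
# A falling system reads close to 0 g on all axes: park immediately.
print(should_park([(0.0, 0.0, 1.0), (0.02, 0.01, 0.05)]))  # True
```

A real implementation would also filter out repetitive vibration (trains, autos) before deciding, as the course text notes.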
At any time, you can see the status of the Active Protection System through the real-time on-screen
status under the Properties window. This feature allows the customer to see a graphical picture of
the state of their Active Protection System.

[Images: Configuration options for Active Protection System; view of real-time
status for Active Protection System]


Disk Subsystem:
Flash Memory with Disks

• Flash memory starting to appear with disks as a disk-cache accelerator
• Flash memory is faster than the slower disk
  - Booting the OS could be 15 seconds faster
• Provides power savings since disk rotates less often
  - Battery life in notebooks could be extended 8%
  - Overall performance gain (especially for Windows Vista)
• Two methods
  - Integrated on disk (hybrid disk drive)
  - External to disk (Intel Turbo Memory uses Mini PCI Express)
• Windows Vista
  - ReadyBoost (read cache flash memory)
  - ReadyDrive (write cache flash memory)

[Image: Flash memory on a Mini PCI Express adapter (Intel Turbo Memory)]

Flash Memory with Disks


To address the slow performance of disk drives, flash memory (primarily NAND flash) is starting
to become part of disk subsystems. The disk drive is the slowest component in a system because of
its rotating platters (it is the only major subsystem with moving parts). Flash memory can
average about 3 millisecond (ms) access times, while a typical disk may need up to 20 ms.
One implementation is to have the flash memory incorporated on the disk itself (often called hybrid
disk). The hybrid disk drive can have a 1 GB NAND-based flash memory used as both a boot
buffer and write buffer. In hybrid write mode, the mechanical drive is spun down the majority of
the time, while data is written to the flash memory. When the write buffer of the flash memory is
full, the rotating disk spins and data is written to the disk.
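The hybrid write mode described above can be sketched as a toy model. The class, numbers, and structure below are illustrative assumptions, not an actual drive's firmware:

```python
class HybridDrive:
    """Toy model of a hybrid disk's write buffering (illustrative only).

    Writes land in a small flash buffer while the platters stay spun down;
    the disk spins up only when the buffer fills and must be flushed.
    """

    def __init__(self, flash_capacity_mb):
        self.flash_capacity_mb = flash_capacity_mb
        self.flash_used_mb = 0
        self.spin_ups = 0  # each spin-up costs power and latency

    def write(self, size_mb):
        if self.flash_used_mb + size_mb > self.flash_capacity_mb:
            self.flush()
        self.flash_used_mb += size_mb

    def flush(self):
        self.spin_ups += 1      # spin the platters up once
        self.flash_used_mb = 0  # commit the buffered data to the rotating disk

drive = HybridDrive(flash_capacity_mb=1024)  # 1 GB flash buffer, as in the text
for _ in range(100):
    drive.write(50)            # 5000 MB written in 50 MB chunks
print(drive.spin_ups)          # 4 spin-ups instead of 100 trips to the platters
```

The point of the sketch is the power saving: most writes never touch the mechanical drive, which stays spun down between flushes.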
Another implementation is to have the flash memory incorporated external to the disk. For
example, Intel promotes having flash memory on a Mini PCI Express card (in notebook systems)
called Intel Turbo Memory. This requires an Intel control ASIC chip, software driver, and Intel
NAND flash. It can be used with any Serial ATA disk.
Other benefits of flash memory with disks are the reduction of the possibility of shock and impact
damage. Also the disk rotates less often so power is saved which is extremely important for
notebook systems.


A Windows Vista feature, ReadyBoost, is a read cache that allows Windows to cache memory pages
onto flash memory (such as on a USB drive, Secure Digital Card, Compact Flash, Intel Turbo
Memory) that will not fit into main memory. Because a removable flash device could be removed at
any time, unique data cannot be stored on it, and data is encrypted for security reasons.
ReadyBoot uses the ReadyBoost services to speed up the boot processes and recover from
hibernation by building a temporary cache of the main files needed during boot.
The final feature is ReadyDrive, a write cache that can cache portions of Windows Vista to enable
faster boot and resume times. A 30% boot-time saving can be expected using ReadyDrive, and during
normal operations, data retrieved from the cache is transferred two to three times as fast as
from a disk.


Disk Subsystem:
Full Disk Encryption (FDE)

• Hardware-based encryption built on the disk
• Better performance than software-based encryption
• Secures all data by encrypting every bit of data, including the OS,
  swap space, and temporary files
• Encryption transparent to user
• Encryption keys are bound to hard disk password
• Requires system to have BIOS support

[Images: Lenovo ThinkPad 200GB FDE Disk; security with encryption]

Full Disk Encryption


A Full Disk Encryption (FDE) disk encrypts all data on the disk via hardware that is integrated
on the disk. This contrasts with software-based encryption, which consumes processor cycles for
encryption; hardware FDE can deliver up to 20% better performance than software-based encryption.
A Full Disk Encryption (FDE) disk uses a government-grade security protocol to encrypt all stored
data, even temporary files. The encryption cannot be turned off, so a user cannot bypass
encryption. The read/write performance is equivalent to a non-encrypted disk.
The encryption keys are bound to a hard disk password which can be linked to a fingerprint reader.
Even in systems with a Trusted Platform Module (TPM) or security chip, the FDE encryption keys
are not stored on the security chip.
A Full Disk Encryption (FDE) disk uses the same disk space for data as non-encrypted data.
Lenovo offers various FDE disks for select Lenovo ThinkPad systems. These disks require
appropriate BIOS support so the disks will not function in every notebook. Pre-boot authentication
is supported, so a fingerprint reader can be used as the authentication method instead of a
password.
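Two of the properties above — every sector encrypted transparently, with the key bound to the hard disk password — can be illustrated with a toy sketch. This uses a SHA-256 counter-mode keystream purely for illustration and is not real cryptography; actual FDE drives use dedicated AES hardware on the disk:

```python
import hashlib

def keystream(key, sector, length):
    """Counter-mode keystream from SHA-256 (toy stand-in for the drive's AES hardware)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + sector.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return out[:length]

def crypt_sector(key, sector, data):
    """Encrypting and decrypting are the same XOR pass -- transparent to the user."""
    ks = keystream(key, sector, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

# The encryption key is derived from (bound to) the hard disk password.
key = hashlib.sha256(b"hard-disk-password").digest()
plain = b"temporary file contents"
stored = crypt_sector(key, sector=42, data=plain)  # what lands on the platters
assert stored != plain                             # ciphertext on disk
assert crypt_sector(key, sector=42, data=stored) == plain  # transparent read-back
```

Note that the ciphertext is the same length as the plaintext, matching the text's point that an FDE disk uses the same disk space for data as a non-encrypted disk.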


Disk Subsystem:
Solid State Drives

• Disk storage using NAND flash memory
• Benefits:
  - Faster reads than disks
  - Lighter and smaller
  - More shock resistance
  - Less power and heat
  - Enables longer battery life
• Currently costly, but prices will fall
• Used in Lenovo ThinkPad X300
• Lenovo offers 128 GB SSD as accessory for select ThinkPad notebooks

[Image: 128 GB Solid State Drive with and without 2.5" cover (part number 43N3406)]

Solid State Drives


Solid State Drives (SSDs) are disk drives with no moving parts to spin up, drain the battery,
break, make noise, or add weight. SSDs do not have rotating platters like traditional disks.
Flash memory is a type of physical storage that has no moving parts and is very fast, yet keeps
its contents when electrical or battery power is off. It is widespread in cell phones and consumer
gadgets of all types, and PC users know it from USB-based memory keys. Solid State Drives use
NAND ("NOT AND") flash memory, which excels at reading, writing, and erasing data.
There are many advantages to Solid State Drives. In real-world Windows-based applications, SSDs
are 25% faster than disks with rotating platters. Weight and size are also important differentiators,
with a 1.8" wide SSD weighing less than half that of a typical 2.5-inch notebook drive. SSDs have
increased shock resistance, with the ability to take up to 1500Gs compared to around 200Gs by a
traditional disk drive (an 8 foot drop versus a 1 foot drop). Solid State Drives draw less power and
produce less heat than common disks, using 0.5 watt while active and 0.1 watt at idle. Using these
drives, overall battery power consumption in notebooks can be reduced up to 9% once other
power-draining devices in a notebook are factored in.
Cost is the main drawback with SSDs. However, this cost differential between SSD and traditional
disk drives is expected to decrease rapidly over time. The silent operation, light weight, increased
shock resistance, and low level of power consumption of SSDs are making them increasingly
desirable.


                           2.5" SATA HDD                2.5" SATA SSD

Mechanism type             Magnetic rotating platters   Solid NAND flash-based
Density                    80 GB                        64 GB
Weight                     365 g                        73 g
Performance                Read: 59 MB/s                Read: 100 MB/s
                           Write: 60 MB/s               Write: 80 MB/s
Active power consumption   3.86 W                       1 W
Operating vibration        0.5G (22-350 Hz)             20G (10-2000 Hz)
Shock resistance           300G/2.0 ms; 160G/10 ms      1,500G/0.5 ms
Operating temperature      5ºC to 55ºC                  -25ºC to 85ºC
Acoustic noise             0.3 dB                       0 dB
Endurance                  MTBF < 700k hours            MTBF > 2M hours

Comparison of Hard Disk Drive to Solid State Drive

SSD Performance
SSD drives are good at random read operations, which is the basis for claims that boot times and
program loads for some applications are faster. However, the write process for SSD drives is
complicated because large blocks must be erased before data can be written. SSD cells wear out
after many writes, so the controller in an SSD must move data around so that all memory locations
see similar numbers of writes.
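The wear-leveling behavior just described can be sketched as a toy controller. The class and scheme below are illustrative assumptions; real SSD controllers use far more sophisticated mappings and garbage collection:

```python
class WearLevelingController:
    """Toy wear-leveling scheme (illustrative, not a real SSD controller).

    Each logical-block write is redirected to the physical block with the
    fewest erases so far, spreading wear evenly across the flash.
    """

    def __init__(self, num_physical_blocks):
        self.erase_counts = [0] * num_physical_blocks
        self.mapping = {}  # logical block -> physical block currently holding it

    def write(self, logical_block):
        # Pick the least-worn physical block as the target for this write.
        target = min(range(len(self.erase_counts)),
                     key=self.erase_counts.__getitem__)
        self.erase_counts[target] += 1
        self.mapping[logical_block] = target

ctl = WearLevelingController(num_physical_blocks=4)
for _ in range(100):
    ctl.write(0)              # hammer a single logical block 100 times
print(ctl.erase_counts)       # wear is spread evenly: [25, 25, 25, 25]
```

Without this remapping, the single hot logical block would wear out one physical block 100 times faster than the rest.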

SSD Penetration Rate Projection

Year   Total notebook shipments (millions of units)   SSD penetration rate
2007   112                                            1%
2008   146                                            5%
2009   175                                            15%
2010   201                                            30%
2011   229                                            38%

Source: American Technology Research
Penetration of solid state drives (SSDs) in notebook PCs is expected to take off.


Disk Subsystem:
Serial Attached SCSI (SAS)

• Serial (not parallel) link
• Used in workstations and servers, mostly with disks
  - Large capacity 15K rpm disks
• Supports SATA and SAS devices
• Uses the SCSI command set
• Many cabling options
• Used in select Lenovo ThinkStation systems

                 SATA                        SCSI/SAS
Speed            1.5 Gb/s, 3 Gb/s            3 Gb/s
Porting          No dual port                Dual port
Duplex           Half duplex                 Full duplex
Spindle speed    7200 rpm                    10,000 - 15,000 rpm
Duty cycle       Low duty cycle              Highest duty cycle
Operation        10x5 operation              24x7 operation
Usage            3.5 inch and mobile SATA    Server and networked storage;
                                             3.5 and 2.5 inch

[Image: SAS PCIe x4 Adapter]

Serial Attached SCSI (SAS)


Serial Attached SCSI (SAS) is a serial communication protocol primarily designed for transferring
data to and from devices such as disks (other devices are also supported). It is designed for the
corporate and enterprise market as a replacement for parallel SCSI, allowing much higher speed
data transfers than previously available, and it is backwards compatible with SATA drives. Though
SAS uses serial communication instead of the parallel method found in traditional SCSI devices,
it still uses SCSI commands for interacting with SAS end devices. It is primarily used in
workstations and servers.

Parallel SCSI SAS


Architecture Parallel, all devices connected to shared bus Serial, point-to-point, discrete signal paths

Performance 320 MB/s (Ultra320 SCSI); performance degrades 3.0 Gb/s, roadmap to 12.0 Gb/s; performance
as devices added to shared bus maintained as more drives added

Scalability 15 drives Over 16,000 drives

Compatibility Incompatible with all other drive interfaces Compatible with Serial ATA (SATA)

Max. cable length 12 meters total (must sum lengths of all cables 8 meters per discrete connection; total domain
used on bus) cabling thousands of feet

Cable form factor Multitude of conductors adds bulk, cost Compact connectors and cabling save space, cost

Hot pluggability No Yes

Device identification Manually set, user must ensure no ID number Worldwide unique ID set at time of manufacture;
conflicts on bus no user action required

Termination Manually set, user must ensure proper installation Discrete signal paths enable devices to include
and functionality of terminators termination by default; no user action required


Enhanced IDE (EIDE)

• Enhanced IDE is a controller in the I/O Controller Hub
• Supports disks and non-disk devices, such as optical and tape drives
• Supports 4 EIDE devices
  - EIDE devices require 2 cables connected to 2 connectors
  - Do not mix slow and fast devices on the same cable
• Typically 100 MB/s data transfer speed and called ATA-100

[Image: 2 connectors for IDE devices, each cable supporting Device 1 and Device 2]

Enhanced IDE (EIDE)


Enhanced IDE (Integrated Drive Electronics or EIDE) is the current standard for inexpensive, high
performance disks in PCs. EIDE is also used interchangeably with the term ATA or AT
Attachment (Advanced Technology Attachment).
EIDE is a bus with a chipset controller and devices (like disks) which each have a controller to
interface with the bus. The devices could be disks, CD-ROMs, DVD drives, CD-RW drives, some
tape drives, and other devices. Diskette drives do not use this interface. The EIDE controller
supports two EIDE buses of up to two devices each. The EIDE controller is typically in the South
Bridge or I/O Controller Hub chipset. This chipset has multiple functions in addition to the EIDE
controller functions.
The two devices off a single EIDE cable should be the same speed. If a slower device (supporting a
slower mode) is mixed with a faster device (supporting a faster mode), both devices will transfer at
the slower speed.
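The slowest-device rule on a shared EIDE cable can be expressed as a one-line sketch. The function name and rates below are hypothetical, for illustration only:

```python
def cable_transfer_mode(device_modes):
    """Devices sharing an EIDE cable all transfer at the slowest device's mode.

    `device_modes` maps each device on the cable to its maximum transfer
    rate in MB/s; the whole cable falls back to the minimum.
    """
    return min(device_modes.values())

# An ATA-100 disk paired with a 33 MB/s optical drive: both run at 33 MB/s.
print(cable_transfer_mode({"disk": 100, "cdrom": 33}))  # 33
```

This is why the slide above advises against mixing slow and fast devices on the same cable: the fast device pays the penalty.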
EIDE employs two error detection and correction facilities:
• Data is stored on a disk using error correction code (ECC) bytes. During a read of the data, the
ECC bytes are used to check the integrity of the data.
• Physical disk addresses are verified by cyclic redundancy code (CRC) bytes that are stored as
part of each sector. During a read of the data, the CRC bytes are used to verify that the correct
data is read.


The ATA Packet Interface (ATAPI) is a standard designed for devices such as CD-ROMs and tape
drives that plug into an ordinary ATA (IDE) port. ATA-4 was released in 1998 and added a queuing
function that allows disks to take multiple commands and perform them in the sequence that is most
mechanically efficient.
The maximum cable length for IDE and Enhanced IDE is 18 inches. This restriction confines IDE
devices to be internal only.
On each IDE connector, one IDE device is the primary (master), and the other IDE device is the
secondary (slave). The primary/secondary designation is determined by switch or jumper settings on
each IDE device.
There is no performance impact between a master and slave of the same type on the same IDE
connector. A bootable IDE disk can be a master or a slave.

[Diagram: Typical EIDE Configuration — the host system's primary IDE interface connects
via the primary IDE cable to a disk as master (device 0) and a disk as slave (device 1);
the secondary IDE interface connects via the secondary IDE cable to a CD-ROM as master
(device 0) and a tape drive as slave (device 1). Each device carries its own disk
subsystem controller.]


Enhanced IDE (EIDE):
EIDE in Notebooks and Desktops

• EIDE fastest rate is ATA-133 with up to 133 MB/s transfers
• Desktops
  - 2006: opticals migrated from EIDE to Serial ATA
  - 2005: disks migrated from EIDE to Serial ATA
• Notebooks
  - 2008: opticals migrated from EIDE to Serial ATA
  - 2006: disks migrated from EIDE to Serial ATA

[Image: Lenovo ThinkPad 80GB 5400 rpm Enhanced IDE Disk]

EIDE in Notebooks and Desktops


New desktops and notebooks have migrated from EIDE to Serial ATA for both disks and opticals.
However, EIDE devices are common in the existing installed base of desktops and notebooks.


ATA-66
In 1998, Quantum announced the ATA-5 (ATA/ATAPI-5, Ultra ATA/66, ATA-66, and Ultra DMA
Mode 3 or 4) interface, which supports up to 66 MB/s transfers. Chipset support began in 1999. The
interface provides enhanced data integrity with improved timing margins and use of CRC data
integrity. It uses a 40-pin, 80-conductor cable instead of the previous 40-pin, 40-conductor cable to
lower electromagnetic interference. The additional 40 conductors in the 80-conductor cable are
dedicated to "signal ground" functions to reduce signal cross-talk. Yet it retains the 40-pin connector,
which has been used on PCs since the original IDE connector. Aside from the different cable, the
new standard is fully backward compatible with previous IDE, EIDE, or ATA drives. A system with
an ATA-66 interface can read and write any IDE, EIDE, or ATA drive that has been made. Similarly,
a drive with an integrated ATA-66 controller can be used on older IDE interfaces, though drives
attached to slower controllers will run at the slower speed. Older cables may still be used with an
ATA-66 controller; however, only Ultra DMA/33 mode or slower operation is then supported; the
cable type is sensed through a new cable-detect pin read by systemboard logic.
80-conductor cables have color-coded connectors. A two-drop cable has a black connector for the
master device, a gray connector for the slave device, and a blue connector for the systemboard. A
one-drop cable has a black connector for the master device and a blue connector for the systemboard.
Because ATA-66 has a faster transfer rate over the cable, having the master and slave in optimum
position reduces the risk of errors from reflection and delays for two devices on the cable.
The specification implements two modes: Ultra DMA mode 3, providing up to 44 MB/s transfer rate,
and Ultra DMA mode 4, providing 66 MB/s.

Systemboard Connector for ATA-66 Cable

ATA-100
In 2000, the ATA-100, ATA/100, or Ultra ATA/100 specification was released which supports 100
MB/s transfers through the IDE interface. It requires the same 40-pin 80-conductor cabling as ATA-
66. It basically has the same features as ATA-66, and it is backwards compatible with ATA-33 and
ATA-66. Both the disk and the controller (such as the ICH2) need to support ATA-100 for the 100
MB/s speed.


ATA-133
In 2002, Maxtor introduced ATA-133 which provides 133 MB/s data transfer rates. ATA-133 is
not expected to gain wide acceptance because Intel and many leading disk and chipset vendors are
focusing on the expected successor to ATA-100, which is Serial ATA. ATA-133 disks became
available in 2002, but Intel-based chipset systems require an add-in PCI adapter with an ATA-133
controller. Via Technologies supports ATA-133 in some of its chipsets.

[Images: Tool-less EIDE disk removal on ThinkCentre S50 — the drive removes from its
cradle without tools; drive cages are also tool-less and can be removed from the chassis
in the ThinkCentre small desktop mechanical.]

Serial ATA:
SATA150 or SATA 1.5 Gb/s

• Replacement for EIDE for disks, optical drives, etc.
• Serial ATA 1.0 is 150 MB/s or 1.5 Gb/s (called SATA150 or SATA 1.5 Gb/s)
• Supported in Intel ICH5 and higher controllers
• Serial (1-bit data path), point-to-point dedicated link
• Thin, 1-meter cable length; one device per cable
• Fixed or hot-plug support
• Advanced Host Controller Interface (AHCI) is the formal spec that
  standardizes Serial ATA controllers

[Diagram: Serial ATA controller connected by a Serial ATA cable and connector to a
Serial ATA disk, with a separate power cable]

Serial ATA - SATA150


A specification, called Serial ATA 1.0, Serial ATA (SATA), SATA150, or SATA 1.5 Gb/s, was
introduced in 2001 as an evolutionary replacement for Enhanced IDE or Parallel ATA (such as
ATA-100). SATA controllers are implemented as PCI adapters or in I/O Controller Hub
chipsets. SATA supports any compatible device including disks, CDs, DVDs, other optical
drives, and tape.
Serial ATA Features (not in EIDE)   Benefits
Point-to-point configuration        Eliminates bus sharing overhead
Road map starts at 150 MB/s         Performance now; up to 600 MB/s in future
Additive device performance         Full bandwidth to each drive
1 meter cable length                Enables scalability
Hot-plug drives                     Quick and easy drive replacement
Inexpensive disks                   Lower system cost
Command queuing                     Quick access to data
CRC                                 Strong data integrity

The base SATA interface transfers at 150 MB/s or 1.5 Gb/s over a 7-pin interface. SATA is
clocked at 750 MHz with two samples per clock, transferring at double data rate on the rising
and falling edges of the clock; with a one-bit data path, this yields a 1.5 Gb/s line rate.
Because SATA uses 8b/10b encoding (ten bits on the wire for every data byte), that line rate
corresponds to 150 MB/s of data bandwidth. The specification also defines a 300 MB/s or 3 Gb/s
(3,000 MHz clocking) rate followed by a 600 MB/s or 6 Gb/s (6,000 MHz clocking) rate. SATA
supports full duplex operation (it can send and receive data at the same time); IDE supports
only half duplex transmission.
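SATA line rates map to usable bandwidth by dividing by ten, because 8b/10b encoding puts ten bits on the wire for every data byte. A short sketch of the arithmetic (hypothetical function name, for illustration):

```python
def sata_data_rate_mb_s(line_rate_gb_s):
    """Usable SATA bandwidth in MB/s from the raw line rate in Gb/s.

    SATA uses 8b/10b encoding, so every data byte costs 10 bits on the wire:
    divide the bit rate by 10 to get bytes per second.
    """
    bits_per_second = line_rate_gb_s * 1_000_000_000
    return bits_per_second / 10 / 1_000_000  # convert to MB/s

for gen, rate in [("SATA 1.5 Gb/s", 1.5),
                  ("SATA 3.0 Gb/s", 3.0),
                  ("SATA 6.0 Gb/s", 6.0)]:
    print(gen, sata_data_rate_mb_s(rate), "MB/s")  # 150.0, 300.0, 600.0 MB/s
```

This is why 1.5 Gb/s, 3 Gb/s, and 6 Gb/s appear throughout this topic as 150 MB/s, 300 MB/s, and 600 MB/s respectively.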


To ensure rapid adoption, SATA products are 100% software compatible with the existing
ATA protocol and current operating systems, so no changes or new drivers are needed for
existing operating systems. SATA devices can be mixed with EIDE (parallel ATA) devices in
the same system.
SATA is primarily for inside-the-box drive connections with a maximum cable length of one
meter. The cables are thinner (about 8 mm wide) than EIDE cables so they are smaller and
more flexible. The smaller cables allow better ventilation, access, and visibility inside a system.
SATA uses point-to-point connectivity for significant performance and reliability advantages
over the shared connectivity approach employed by both the ATA and SCSI parallel interfaces.
Each port on a Serial ATA controller serves just one device; that is, the controller
communicates with a given drive only through the port where it is connected. Any SATA
device is treated as a master device, so there are no jumper settings or slave devices. Because
there is no sharing of the bus, each drive can communicate directly with the system at any time.
As a result, the entire available interface bandwidth is dedicated to each device. This dedicated
link approach eliminates the arbitration delays sometimes associated with shared bus
topologies. With a shared bus approach, overhead increases as drives are added to the shared
bus. This means that, in a typical ATA or SCSI RAID system, adding a disk will increase the
total system throughput by some amount less than the throughput of the disk. With Serial ATA,
on the other hand, each added disk can deliver its maximum throughput. Point-to-point
connectivity offers the added benefit of simpler configuration. Dedicated links make a Serial
ATA RAID system easy, fast, and relatively inexpensive to set up. Less complex trace runs on
systemboards permit smaller systemboards.
Cyclical Redundancy Checking (CRC) error detection is standard in SATA as each protocol
layer has the capability to identify errors and can perform recovery and control actions as well
as forward information to the next higher layer in the stack.
SATA supports hot-plugging, the ability to swap out a failed disk drive without having to
power down the system or reboot. This capability contributes to both data availability and
serviceability, without any associated downtime. The Serial ATA 1.0 specification requires
staggered pins for both the hard disk drive and drive receptacles. Staggered pins mate the power
signals in the appropriate sequences required for powering up the hot-plugged device. These
pins are also specified to handle in excess of the maximum allowed inrush current that occurs
during drive insertion.

[Diagram: Serial ATA Device Connector Sizes and Locations — 2.5" and 3.5" Serial ATA
drives share the same power and data connector layout, compared with a 3.5" EIDE ATA-100
drive's ATA data connector and 4-pin power connector.]


[Images: Serial ATA power cable (left) and signal cable (right); Serial ATA hot-plug disk]
The basic Serial ATA connector design is a remarkably efficient and practical design offering a
number of notable features/benefits:
• Plugs are blind-mated (can plug them in blindfolded without making an error).
• The “L” shaped Serial ATA data and power connectors make plug orientation very obvious
to the end user, and prevent incorrect mating.
• The extrusion has “ears” which guide and align the plug during the mating process.
• The conductors are engineered for hot-plugging; they connect in three stages–first pre-charge,
then ground, then power.
• The connector locations on the back of 2.5” devices are the same as for 3.5” devices,
allowing design of backplanes that can accommodate either size device.
The Serial ATA standard is a simplified packet switching network between a systemboard and
a disk drive (or any SATA device). It employs balanced voltage (differential) amplifiers and
four wires/two pairs (transmission line) to connect transmitters to receivers in a manner similar
to the 100BASE-TX Ethernet. The pins in the spec are labeled TX+, TX-, RX+, and RX- just
as they are in the twisted-pair Ethernet. There is no specification for a standard Serial ATA
cable (just electrical requirements it must meet), but each pair of wires is usually parallel and
shielded.
SATA devices use a separate power cable. They require power connectors with +12, +5, and
+3.3 volts. Conventional 4-pin ATX-type power supplies provide only drive power connectors with
the traditional +12 and +5 volt signals, so a PC may need a power supply fitted with SATA
drive power connectors or an adapter.
Serial ATA is supported in RAID implementations.
See the Serial ATA International Organization (SATA-IO) Web site at www.sata-io.org for
more information.


[Images: EIDE cable (left) vs Serial ATA cable (right), showing the Serial ATA interface
connector and Serial ATA power connector. EIDE cables are bulky and get in the way; SATA
cabling is thinner and allows cleaner design.]

Advanced Host Controller Interface (AHCI)


Advanced Host Controller Interface (AHCI) is a specification released by Intel in April 2004 to
provide a standard for Serial ATA controllers (called host controllers or host bus adapters in the
spec). Platforms supporting AHCI may take advantage of performance features such as no
master/slave designation for SATA devices—each device is treated as a master—and hardware-
assisted native command queuing. AHCI also provides usability enhancements such as hot-
plug. AHCI requires appropriate software support (e.g., an AHCI driver) and for some features,
hardware support in the SATA device or additional platform hardware.
AHCI provides a unified Serial ATA controller interface that is optimized for advanced new
features. It provides a single controller interface and enables operating system vendors and
device suppliers to design to a common interface and focus their efforts on bringing to market
such advanced features as native command queuing (NCQ).
AHCI features which a vendor can implement include the following:
• Native Command Queuing (NCQ) – NCQ supports issuing multiple commands to a disk and
allowing reordering of commands by the disk. For example, if S-F-T-A is issued, a non-NCQ
device would execute S-F-T-A while an NCQ device would queue and reorder as F-A-S-T.
With NCQ, Serial ATA devices can provide sophisticated command optimizations for multi-
threaded applications previously only available with high-end storage interfaces.
• Aggressive device/bus power management – The host can initiate requests for lower power
states when idle. The I/O Controller Hub, device, or software can initiate partial/slumber
requests.
• Staggered spin-up – The BIOS or OS can control when to spin-up a Serial ATA device.
AHCI support is included in the many Intel I/O Controller Hubs that have integrated RAID
support.
Information on the AHCI v.1.2 final specification can be found at
www.intel.com/technology/serialata/ahci.htm.
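The S-F-T-A to F-A-S-T reordering described for NCQ can be sketched as a greedy nearest-first scheduler. This is a toy model with hypothetical names and track numbers; real NCQ firmware also accounts for rotational position, not just seek distance:

```python
def reorder_ncq(queue, head_position):
    """Toy NCQ model: service queued commands in the order that minimizes
    head movement (greedy nearest-first), rather than arrival order.

    `queue` is a list of (name, track) pairs.
    """
    pending = list(queue)
    ordered = []
    pos = head_position
    while pending:
        # Greedily pick the command whose track is closest to the current head.
        nxt = min(pending, key=lambda cmd: abs(cmd[1] - pos))
        pending.remove(nxt)
        ordered.append(nxt[0])
        pos = nxt[1]
    return ordered

# Commands arrive as S-F-T-A but are serviced in track order F-A-S-T.
queue = [("S", 70), ("F", 10), ("T", 90), ("A", 30)]
print(reorder_ncq(queue, head_position=0))  # ['F', 'A', 'S', 'T']
```

The mechanical win is fewer long seeks: the head sweeps across the platter once instead of jumping back and forth in arrival order.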

Busmaster EIDE                                   Serial ATA AHCI

2 channels                                       32 ports
  - Up to 4 devices                                - Up to 480 devices via port multipliers
Master/slave within channel                      All devices masters
Commands programmed through CPU I/O cycles       Commands programmed through system memory
No hardware command queuing support              Controller optimized queuing
Proprietary power management                     Native link/host power management
Proprietary hot plug                             Native hot plug
Proprietary error handling                       Native error handling


[Images: Lenovo 160 GB Serial ATA Hard Disk Drive (part number 09N4254); easy removal of
the Serial ATA disk in the Lenovo ThinkCentre M57 Small by pressing two blue buttons
(without tools)]

[Image: Four Serial ATA connectors on the systemboard of a Lenovo ThinkCentre system]


Serial ATA:
SATA300 or SATA 3.0 Gb/s

• Second-generation Serial ATA technology
• Doubles speed of Serial ATA from 150 MB/s to 300 MB/s (or 3.0 Gb/s)
• Same cables and connectors as original Serial ATA
• Supports new optional cable and connector variants
• Supports new higher power version for box-to-box application
• Supported in the Intel ICH7 and higher controllers
• Disks released in 2006

                SATA 1.5 Gb/s                    SATA 3.0 Gb/s
Availability    Products in 2003                 Products in 2006
Speed           150 MB/s (1.5 Gb/s)              300 MB/s (3.0 Gb/s)
Cabling         Serial, point-to-point cabling   Additional cable and connector options

Serial ATA - SATA300


In 2004, the Serial ATA International Organization announced the SATA300 or SATA 3.0 Gb/s
specification which is the second generation Serial ATA technology. The specification includes
doubling the signaling speed for Serial ATA and introducing new cable and connector solutions to
support additional applications and usage models.
The specification for the second-generation Serial ATA signaling speed of 3.0 Gb/s was
introduced. The second-generation speed of 3.0 Gb/s (300 MB/s) [hence the name SATA300 or
SATA 3.0 Gb/s] is double that of the first-generation Serial ATA speed which was 1.5 Gb/s (150
MB/s).
Notably, no new cables or connectors are required to support the higher signaling speed.
In addition to doubling the speed for the internal physical layer (PHY) originally defined in the
SATA 1.0 specification, the new specification also defines a higher power version of it for longer
haul external datacenter use. The external PHY version defined in the specification only impacts
box-to-box applications (not used as a direct disk drive connection) and has been defined to match
the electrical parameters for the Serial Attached SCSI (SAS) PHY.
The SATA 3.0 Gb/s cable and connector specification adds several new cabling options:
• An internal multi-lane cable and connector assembly for streamlining connections between
multiple internal host ports and internal devices or short backplane
• An external consumer cable and connector solution that accommodates use of Serial ATA with
external storage devices


• External multi-lane datacenter cable and connector solution for connecting multiple Serial ATA
channels between chassis in a datacenter
The Intel ICH7 family was announced in May 2005 with full support for the SATA specification.
Later Intel I/O Controller Hubs continue to support both the original SATA 1.5 Gb/s speeds and the
new SATA 3.0 Gb/s speeds.
The Serial ATA International Organization (SATA-IO) is an independent, non-profit organization
developed by and for leading industry companies. Officially formed in July 2004 by incorporating
the previous Serial ATA Working Group, the SATA-IO provides the industry with guidance and
support for implementing the SATA specification.
For more on SATA technologies and the SATA-IO, visit www.sata-io.org.


Serial ATA:
External Serial ATA (eSATA)

• Support for external SATA devices via eSATA
• External SATA port provides external SATA device support with a standard or optional
eSATA cable/bracket
• Faster external disk interface (300 MB/s) than FireWire (100 MB/s) and USB 2.0 (60 MB/s)
• Introduced in 2007 on select Lenovo desktops

(Figure: eSATA port on the rear of a Lenovo ThinkCentre M57 Ultra Small, linking the
systemboard to external SATA storage)

© 2008 Lenovo

External Serial ATA (eSATA)


Serial ATA devices may be supported external to a notebook or desktop via the external Serial
ATA (eSATA) specification. eSATA uses different cable and connector construction and different
electrical requirements from internal SATA.
Key eSATA features are the following:
• Same speed as SATA300 (300 MB/s)
• 2 meter maximum cable length
• Cable connector uses spring clips to provide retention
• External devices need independent power connector (an external power source). The Power Over
eSATA specification is expected in late 2008 to eliminate the need for a separate power
connector; a single cable will provide power and data transfer to a single external disk or optical
drive. The cable will enable support of most existing external SATA connectors and 300 MB/s
transfers.
• One device per channel
Lenovo supports a standard eSATA connector on select desktops such as the ThinkCentre M57 and
M57p Ultra Small and Small. On the ThinkCentre M57 and M57p Desktop and Tower, eSATA is
optional: an optional cable/bracket plugs into a port on the systemboard that supports only the
eSATA option.


(Figures:
- External SATA port cable, with both a Low Profile bracket for Desktop and a Full Height
bracket for Tower; Full Height bracket shown
- Rear view showing the bracket with eSATA in one slot on a supported Lenovo ThinkCentre Tower
- The cable plugs into a unique SATA connector (not a regular SATA connector) on the systemboard
- The external SATA device plugs into the back of the slot bracket, which fits a PCI,
PCI Express x1, or PCI Express x16 slot)


Disk Performance:
Cache Position

• Disk speed is slow compared to memory

(Diagram: data flows from the software disk cache in main memory, used by the OS and
applications, through the FIFO buffer on the disk controller, to the hardware disk cache or
disk buffer on the disk itself)

Disk Speeds in Comparable Terms:

              CPU of 1 GHz   L2 Cache   Memory   Disk
Actual time   1 ns           20 ns      60 ns    10 ms
Scaled time   1 sec          20 sec     60 sec   120 days


Disk Cache Positions


A transfer from memory to disk (or disk to memory) could involve three totally independent
buffers or caches:
• Software disk cache (loaded at boot time of the operating system)
• Hardware disk cache (implemented on some disk controllers)
• Disk buffer (most disks have at least an 8 MB buffer)
These three caches are always used for reads (up to 90-percent hit rate) and can be used for writes
if the lazy write or write-back cache management is selected; it is usually the default.
All operating systems support disk cache. Some operating systems support parameters to customize
the size and features of the disk cache.
Most current IDE and Serial ATA controllers do not have hardware disk caches. However, most
have FIFO buffers (usually 128 bytes).
All disks have at least an 8 MB hardware buffer; newer disks typically have a 32 MB buffer. These
buffers are also known as read ahead buffers, because they always read a complete track of data,
whether or not it is immediately required. The data, which might be needed later, is in the buffer.
These buffers adhere to different architectures such as look-ahead buffer, segmented look-ahead
buffer, and adaptive buffering.
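The read-ahead behavior can be modeled with a toy sketch (not actual drive firmware;
SECTORS_PER_TRACK is an assumed geometry, since real drives vary sectors per track by zone):

```python
SECTORS_PER_TRACK = 63          # assumed geometry for this sketch

class ReadAheadBuffer:
    """Toy look-ahead buffer: a miss pulls in the entire track."""

    def __init__(self):
        self.buffered_track = None
        self.track_reads = 0    # physical reads from the platter

    def read_sector(self, sector):
        track = sector // SECTORS_PER_TRACK
        if track != self.buffered_track:
            # Miss: read the complete track, whether or not the rest
            # of it is needed yet; later sequential reads will hit.
            self.buffered_track = track
            self.track_reads += 1
        return track

buf = ReadAheadBuffer()
for s in range(10):             # ten sequential sector reads
    buf.read_sector(s)
print(buf.track_reads)          # 1: only the first read touched the platter
```

Sequential workloads benefit most: after the first miss, every following sector on the same
track is served from the buffer.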


If the speed of a processor, L2 cache, memory, and disk were equivalent in units related to each
other, a 1 ns processor cycle (a single cycle for 1 GHz) would be equivalent to 120 days for a single
access (10 ms) of a disk, because there is a difference of a million between a nanosecond (ns) and
millisecond (ms). A nanosecond is a billionth of a second, and a millisecond is a thousandth of a
second.
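This scaling can be verified with a quick calculation; the sketch below (illustrative only)
scales a 1 ns CPU cycle up to one second and converts a 10 ms disk access at the same ratio:

```python
# Scale time so that one 1 ns CPU cycle becomes one full second,
# then express a single 10 ms disk access at the same scale.
SCALE = 1_000_000_000            # 1 ns -> 1 s means multiplying by 10^9

disk_access_s = 10e-3            # 10 ms in seconds

scaled_disk_s = disk_access_s * SCALE          # 10^7 scaled "seconds"
scaled_disk_days = scaled_disk_s / (60 * 60 * 24)

print(round(scaled_disk_days))   # 116 days, i.e., roughly the "120 days" above
```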


Disk Performance:
Cache

• EIDE and Serial ATA disks do not differ much in performance


• Caching affects performance the most
• Software caches are faster than hardware
(except in high-end servers), because the memory
bus is faster than the I/O bus
• Disk caching increases performance
- Up to four times faster than raw disk performance
- 35% application performance gain going from no cache to 2 MB
- 6% application performance gain going from 2 MB to 8 MB


Disk Performance - Cache


Disk cache with write-back caching that caches both reads and writes is the best implementation
(versus read-through caching, which caches only reads). With write-back caching disabled, the
application performance is reduced by 10 percent.
For servers, write-back caching is usually faster for light loads, so if the server is properly
configured, select write-back. Write-through is usually faster for heavy loads, since I/O
operations do not have to wait for available cache memory. If you cannot configure an optimal
disk configuration, select write-through.
With the exception of video, image, or audio streaming, it is unlikely that a system will ever use the
full speed of an IDE or Serial ATA link due to the mechanical motion (seek time and latency) of
the disk(s).
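The write-back versus write-through trade-off can be illustrated with a toy model (a sketch
only, not any vendor's cache code; the backing_writes counter stands in for physical disk I/O):

```python
class DiskCache:
    """Toy block cache illustrating write-through vs. write-back."""

    def __init__(self, write_back=True):
        self.write_back = write_back
        self.cache = {}          # block number -> (data, dirty flag)
        self.backing_writes = 0  # physical disk writes issued

    def write(self, block, data):
        if self.write_back:
            # Lazy write: update the cache only; the disk is written later.
            self.cache[block] = (data, True)
        else:
            # Write-through: every write goes straight to the disk.
            self.cache[block] = (data, False)
            self.backing_writes += 1

    def flush(self):
        # Write-back caches must flush dirty blocks eventually.
        for block, (data, dirty) in self.cache.items():
            if dirty:
                self.backing_writes += 1
                self.cache[block] = (data, False)

wb = DiskCache(write_back=True)
wt = DiskCache(write_back=False)
for i in range(10):              # ten writes, all to the same block
    wb.write(0, i)
    wt.write(0, i)
wb.flush()
print(wb.backing_writes, wt.backing_writes)  # 1 vs. 10 physical writes
```

Repeated writes to the same block collapse into one physical write under write-back, which is
why it wins under light load; under heavy load, the cost of managing dirty blocks and waiting
for cache memory can favor write-through.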


Disk Performance:
Factors in Disk Performance

• Seek time and latency are the key bottlenecks


• Need large quantity of disks for good server performance

(Diagram: stages of a disk request, with typical times for a 4500 rpm drive)

Disk request to controller   Command decode   Seek time   Latency   Media transfer rate
.002 ms                      1 ms             12 ms       6.6 ms    .073 ms

Notes from the diagram:
- Higher RPM improves latency.
- If the data is already in the disk buffer (128 KB), the drive starts reading from the
buffer immediately (.023 ms).
- The 16-bit EIDE controller puts two 16-bit words into a 32-bit packet if it needs to
transfer across the 32-bit PCI bus (.046 ms).

Factors in Disk Performance


The biggest bottleneck in disk operations is the seek and latency times of the disk, i.e., the
mechanical part of disk operations.
Media transfer rate is typically from 32 to 80 Mb/s, which translates to 4 to 10 MB/s. If the math is
done to transfer 512 bytes (one sector), it translates to .073 ms.
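That .073 ms figure checks out arithmetically; the sketch below assumes a 7 MB/s rate, a
mid-range value within the 4 to 10 MB/s span:

```python
# Time to transfer one 512-byte sector at a mid-range media transfer rate.
sector_bytes = 512
media_rate = 7_000_000           # bytes per second (assumed 7 MB/s)

transfer_ms = sector_bytes / media_rate * 1000
print(f"{transfer_ms:.3f} ms")   # 0.073 ms
```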
There is actually EIDE controller circuitry both on the disk and in the system chipset (such as
the I/O Controller Hub).
The meaning of a fast disk subsystem depends on whether one is referring to a single-user
desktop system or a multi-user server system. On a PC, latency is the most important measure of
how fast a disk is, but on a server, throughput is most important.
For high server performance, the most important item is to have a large quantity of disks. This
allows many disks to be accessed simultaneously with overlapped I/O.


Disk Performance:
Many Factors of Disk Throughput

(Diagram: factors of disk throughput)

Factors shown in the diagram include: type of controller, device driver, type of bus, width of
bus, data transfer rate, amount of cache/FIFO, size of disk buffer, type of buffer, amount of
memory, memory cache, L2 cache, speed of local bus, speed of processor, amount of I/O, OS file
system, random or sequential read/write, average seek time, average latency, RPM, type of head,
No-ID sector formatting, data per track, and track-to-track seek time.

Many Factors of Disk Throughput


Note that hard disk capacity alone can enhance the performance of a PC. A half-filled
2 GB hard disk can find, read, and write data faster than a full 1 GB disk, because the head does not need
to move as far. Large-capacity hard disks frequently have faster rotation rates, which boost performance.
Speed, measured in revolutions per minute, affects two aspects of disk performance:
• The drive with a higher speed will have a proportionally higher data transfer rate, given two disks with
similar recording densities.
• The drive with the higher speed will have faster access to data by the differences in latency. (Latency is
inversely proportional to speed.)
High-speed disks (such as 10,000 rpm and higher) generate much heat and noise.


It is important to address specific measures of performance. Transaction processing often consists


of random accesses to blocks of data that are as short as 2 KB. However, some applications need to
access large sequential data streams. In the two extremes, performance for the shortest transfer is
measured by access time. Latency, affected directly by speed, is about a third of access; therefore, a
30-percent improvement in speed should yield about a 10-percent improvement in access time for
short random data transfers. For the longest data transfers, an improvement in speed will usually
directly improve data rate by the same proportion.
Typical performance characteristics are listed in the table below.
RPM    Typical Seek   Latency   Access   % Drop in Access   % Drop in Sequential Throughput
7200   8.0            4.2       12.2     0%                 0%
6300   8.5            4.8       13.3     9%                 12%
5400   9.0            5.6       14.6     18%                25%
4500   11.0           6.7       17.6     44%                37%

The throughput figures assume constant numbers of sectors per track. (This assumption tends to be
a somewhat unrealistic expectation.)
Drive geometry, data density, speed of electronics, and innovative technique in the drive controller
and firmware can significantly influence performance. More data per track (more sectors per track)
yields better performance. Track-to-track seek time affects performance (0.9 ms track-to-track seek
time outperforms 2.2 ms).
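The latency column in the table above follows directly from rotational speed: average latency
is half a revolution, so latency in milliseconds is 30,000 / rpm. The sketch below reproduces
the table's latency and access values to within rounding (the seek times are the table's
measured values, not derived):

```python
def avg_latency_ms(rpm):
    # One revolution takes 60,000 / rpm milliseconds;
    # on average the desired sector is half a revolution away.
    return 30_000 / rpm

seek_ms = {7200: 8.0, 6300: 8.5, 5400: 9.0, 4500: 11.0}  # from the table

for rpm, seek in seek_ms.items():
    latency = avg_latency_ms(rpm)
    print(f"{rpm} rpm: latency {latency:.1f} ms, access {seek + latency:.1f} ms")
```

This is also why latency is inversely proportional to speed: doubling rpm halves the average
latency.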


Summary:
Storage Architecture

• The disk subsystem consists of a disk


controller and hard disk drive.
• Enhanced IDE was the common interface
used by disks and optical drives prior to
2006.
• Serial ATA is a serial, point-to-point
dedicated link with SATA150 providing
1.5 Gb/s bandwidth; SATA300 doubles
the bandwidth to 3.0 Gb/s.
• Disk performance is influenced by many
factors, especially cache.

(Figure: Serial ATA disk for desktop)


Review Quiz

Objective 1

1. How is average access time computed?


a. Sustained transfer rate + rotational speed
b. Average seek time rate + average latency
c. Media transfer rate + average latency
d. Rotational speed + average latency

2. Many Lenovo ThinkPad notebooks utilize a ThinkVantage Technology that has a motion sensor
and software utility system that protect the hard disk drive from damage due to a fall or rough
handling. What is the name of this ThinkVantage Technology?
a. Access Connections
b. Active Protection System
c. Embedded Security Subsystem
d. ThinkVantage System Update

3. What disk technology provides a new recording method that aligns bits at 90 degrees to the
surface so that disk capacities are increased significantly?
a. Femto slider
b. Hard Disk Drive Shock Absorber
c. Perpendicular magnetic recording
d. Active Protection System

4. What type of technology is used as a disk-cache accelerator?


a. Flash memory
b. DVI-I connector
c. Active Protection System
d. Encoded servo

5. What type of disk uses flash memory instead of a rotating platter?


a. Solid State Drives
b. Intel Turbo Memory
c. Full Disk Encryption
d. Active Protection System


Objective 2

6. What type of devices used Enhanced IDE?


a. Diskette drives and memory
b. Disks and optical drives
c. Adapter cards and graphics controllers
d. Graphics controllers and processors

Objective 3

7. Desktops such as Lenovo ThinkCentre desktops primarily use what disk interface?
a. SCSI
b. Fibre Channel
c. Enhanced IDE
d. Serial ATA

8. Which is the second-generation Serial ATA technology and what is the transfer speed?
a. PCI Express, 200 MB/s
b. USB 2.0, 300 MB/s
c. SATA 3.0 Gb/s, 150 MB/s
d. SATA 3.0 Gb/s, 300 MB/s

Objective 4

9. What affects disk performance the most?


a. Type of disk (Enhanced IDE or SCSI)
b. Caching
c. Capacity of disk
d. Latency

10. What difference in disk performance does software and/or hardware disk cache on EIDE and
SCSI disks make?
a. Reduction of disk performance because of overhead caused by looking through cache
b. Little difference in performance
c. 10 percent increase in disk performance
d. Two to four times faster than if no cache is available

11. What is the biggest bottleneck in disk operation?


a. Latency
b. Media transfer rate
c. Command decode
d. Buffer read


Answer Key
1. B
2. B
3. C
4. A
5. A
6. B
7. D
8. D
9. B
10. D
11. A



Topic 6 - Graphics Architecture

PC Architecture (TXW102)
Topic 6:
Graphics Architecture



Objectives:
Graphics Architecture

Upon completion of this topic, you will be able to:

1. Identify important graphics subsystem features and functions, including the


type and amount of graphics controller memory
2. List a benefit of the Intel Graphics Media Accelerator integrated graphics
controller
3. Identify the role of PCI Express x16 graphics implementation
4. Define the features of CRT analog monitors and TFT flat panel monitors
5. Recognize implementations of analog and digital monitor interfaces
6. Describe the latest monitor connectors


Graphics Subsystem Features:


Overview

(Diagram: browser, Acrobat, or other applications in memory → graphics controller → graphics
memory → DAC → monitor)
- PC with application windows and icons stored in main memory
- The graphics controller converts graphics calls from the processor into the proper format
- Graphics memory contains color information for each pixel
- The DAC converts from a digital to an analog signal (except on a digital monitor)
- The image is displayed on the monitor with up to 16 million colors, 85 times a second

Different kinds of monitors:
- Cathode ray tube (CRT)
- Flat panel

Monitors have different features:
- Addressability
- Video memory
- Color depth


Graphics Overview
This topic will cover the many parts that make up the graphics subsystem. The diagram shows a
high-level overview of what is covered in this topic.
The main memory of a PC contains the applications and associated code for the appropriate
images. To view the interface of applications, the appropriate calls must be transferred from
memory to the graphics controller. The graphics controller utilizes its own memory or the main
system memory to create the exact color of each pixel on the monitor in a digital binary format.
The graphics controller also drives the digital-to-analog converter (DAC), so the signal is converted
to an analog signal that a CRT monitor can understand. Some monitors accept digital signals, so the
graphics controller does not have a DAC.
Many characteristics of monitors vary from product to product.


Graphics Subsystem Features:


Graphics Processing

• There are three elements to graphics processing:
  - Graphics controller
  - Monitor
  - Video device driver
• All three elements must support the specific resolution and refresh rate desired.
• The terms video controller and graphics controller are often used interchangeably.
• Video more accurately reflects full motion video.

(Diagram: the graphics controller, the monitor, and the device driver)


Graphics Processing
There are three elements to graphics processing: the graphics controller, the monitor, and the video
device driver. All three of these elements must support the specific resolution and refresh rate
desired. Examples of the limitations inherent in these graphics processing elements include:
• Graphics controller limitation – Say that a given monitor supports a resolution of 1280x1024,
but the system has an older graphics controller that supports only a maximum resolution of
1024x768. The monitor can only be driven at the lower resolution of the graphics controller.
Some graphics controllers may only have the newer DVI connector in one of its forms, which
could limit attachment to older monitors.
• Monitor limitation – Say that an older monitor supports a resolution of only 1024x768 at
72 Hz, but the system has a newer graphics controller capable of supporting a maximum
resolution of 1600x1200 at 85 Hz. The system can only run at the monitor's lower resolution in
this case as well. Some monitors may only accept a digital signal instead of the traditional
analog signal, in which case it will be physically impossible to connect them. Monitors are
migrating from the 15-pin D-sub connector to the newer DVI connectors.
• Device driver limitation – It is also possible for a monitor's resolution to be limited by
its device driver. Newer device drivers can be installed to support greater resolution, more
simultaneous colors, and better performance.
The terms video controller and graphics controller are often used interchangeably, but video more
accurately reflects full motion video. Video actually means television in the form of NTSC or PAL
video.


Graphics Subsystem Features:


Resolution

• A picture element (pixel or pel) is the smallest individually addressable part of an image.

• Resolution refers to how many pixels can be addressed on the display horizontally and
vertically.

• 1024×768 means that the display is divided into 1024 horizontal units by 768 vertical units.

• PCs originally used a 4:3 aspect ratio; widescreen systems use a 16:10 aspect ratio.

(Figure: a 1024×768 pixel grid at 4:3 aspect ratio, compared with a 16:10 aspect ratio panel
giving a wider view)



Addressability
The terms resolution and addressability are often used interchangeably. The correct technical
definitions are as follows:
• Addressability is the number of pixels that can be addressed on the display horizontally and
vertically.
• Resolution is the clarity of the final image. It is a function of addressability and memory quantity.
However, the industry uses the term resolution to mean addressability, so the term addressability is
rarely used.
All monitors have the same minimum resolution of VGA (640 by 480). Monitors support different
resolutions. Better quality premium units have a clearer, sharper picture than cheap, entry-level
units.
Video information always goes to the display on a pixel-by-pixel basis as the raster lines are
scanned. The distinctions between text mode and graphics mode are in the system and the software,
not in the display.


Aspect Ratio
The ratio of width to height of an object. It states the relationship of one side to the other. For
example, the aspect ratio of the screen of a standard computer monitor is 4:3, which is a rectangle
that is somewhat square. The "4" means four units wide, and the "3" means three units high.
Another way of expressing this ratio is 1.33:1.

5:4 aspect ratio (such as a 19" standard monitor); 16:10 aspect ratio (such as a 19" wide
monitor)

Viewing Angle
Viewing angle is the maximum angle at which a display can be viewed with acceptable definition.
The viewing angle is measured from one direction to the opposite, giving a maximum of 180° for a
flat, one-sided screen.

Example of a 130-degree viewing angle: beyond 65 degrees horizontal or vertical is blurry.
Example of a 178-degree viewing angle: 89 degrees horizontal or vertical is readable.


Graphics Subsystem Features:


Common Resolutions
Type     Description                    Aspect Ratio   Wide Format   Resolution   Number of Pixels
VGA      Video Graphics Array           4:3            No            640x480      307,200
SVGA     Super VGA                      4:3            No            800x600      480,000
XGA      Extended Graphics Array        4:3            No            1024x768     786,432
WXGA     Wide XGA                       16:10          Yes           1280x800     1,024,000
QVGA     Quad VGA                       4:3            No            1280x960     1,228,800
SXGA     Super XGA                      5:4            No            1280x1024    1,310,720
SXGA+    Super XGA+                     4:3            No            1400x1050    1,470,000
WXGA+    Wide XGA+                      16:10          Yes           1440x900     1,296,000
WSXGA    Wide Super XGA                 16:9           Yes           1600x900     1,440,000
WSXGA    Wide Super XGA                 16:10          Yes           1600x1024    1,638,400
WSXGA+   Wide Super XGA+                16:10          Yes           1680x1050    1,764,000
UXGA     Ultra XGA                      4:3            No            1600x1200    1,920,000
WUXGA    Wide Ultra XGA                 16:10          Yes           1920x1200    2,304,000
QXGA     Quad Extended Graphics Array   4:3            No            2048x1536    3,145,728
WQXGA    Wide Quad XGA                  16:10          Yes           2560x1600    4,096,000
QSXGA    Quad Super XGA                 5:4            No            2560x2048    5,242,880
QSXGA+   Quad Super XGA+                4:3            No            2800x2100    5,880,000
QUXGA    Quad Ultra XGA                 4:3            No            3200x2400    7,680,000
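The pixel counts and aspect ratios in the table can be derived from the resolution alone. A
sketch (note that the reduced ratio for wide panels comes out as 8:5, which the industry writes
as 16:10):

```python
from math import gcd

def describe(width, height):
    """Return total pixels and the reduced aspect ratio for a resolution."""
    pixels = width * height
    g = gcd(width, height)
    return pixels, f"{width // g}:{height // g}"

print(describe(1024, 768))    # (786432, '4:3')
print(describe(1280, 800))    # (1024000, '8:5'), i.e., 16:10
print(describe(1920, 1200))   # (2304000, '8:5'), i.e., 16:10, not 16:9
```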

Common Resolutions
This table shows commonly used resolutions of monitors, also known as graphics mode. Graphics
mode is when the software generates the image on a pixel-by-pixel (not character-by-character) basis.
Less common resolutions are the following:
• 320x240 QVGA (Quarter VGA); for cell phones
• 3840x2400 QUXGA-W (Quad UXGA Wide)
• 1280x768 or 1366x768 or 1440x900 WXGA variants (1280x800 is the 16:10 aspect ratio)
• 1600x900 WSXGA (1600x1024 is the 16:10 aspect ratio)

High Definition (HD)


• HD ready – Resolutions advertised as "HD ready" mean that the most generic resolution of
1280x800 meets the 720p criteria
• Full HD or Full High Definition – High definition at exactly
1920x1080 resolution with progressive scan (called 1080p).
A display that is 1080i (interlaced) is not Full HD.


• 16:9 and 16:10 – The most accurate representation of video content for Blu-ray, games, and
digital TV is 16:9 (not 16:10). A 16:9 image is vertically narrower than 16:10; a 16:10 panel
makes the image taller.

Text Mode
Software uses two visual modes: text mode and graphics mode.
• Text mode
– PC software generates the image of the display on a
character-by-character basis rather than a pixel-by-pixel
basis.
– This mode is used only for character-based applications (DOS, full
screen DOS applications, diagnostic programs). The screen
is divided into 25 rows by 80 columns, so each character
occupies a 9-by-16 pixel cell.
– Common addressability of text mode:
• 720x350 (25 rows by 80 columns)
• 720x400 (25 rows by 80 columns)
• 720x400 (50 rows by 80 columns)


Modes and Colors with Memory


The following table lists the required memory for specific color depths. This is not an issue today
because PCs all have over 8 MB of graphics memory.

Resolution    16 Colors   256 Colors   65,536 Colors          16.7 Million Colors
              (4-bit)     (8-bit)      (High Color, 16-bit)   (True Color, 24-bit)

640×480       0.5 MB      0.5 MB       1 MB                   1 MB
800×600       0.5 MB      0.5 MB       1 MB                   1.5 MB
1024×768      0.5 MB      1 MB         1.5 MB                 2.5 MB
1280×1024     1 MB        1.5 MB       2.5 MB                 4 MB
1600×1200     1 MB        2 MB         4 MB                   6 MB
1800×1440     2 MB        4 MB         8 MB                   8 MB

For 1 MB of video memory, the best configurations are:


• 640 by 480 = 307,200 pixels by 24 bits per pixel (3 bytes) = 921,600 bytes = 921 KB, which
yields 16.7 million colors
• 800 by 600 = 480,000 pixels by 16 bits per pixel (2 bytes) = 960,000 bytes = 960 KB, which
yields 65,536 colors
• 1024 by 768 = 786,432 pixels by 8 bits per pixel (1 byte) = 786,432 bytes = 786 KB, which
yields 256 colors
• 1280 by 1024 = 1,310,720 pixels by 4 bits per pixel (.5 byte) = 655,360 bytes = 655 KB, which
yields 16 colors

At 256 colors:
640×480 = 307,200 pixels (307 KB)
800×600 = 480,000 pixels (480 KB)
1024×768 = 786,432 pixels (786 KB)
1280×1024 = 1,310,720 pixels (1.3 MB)

Text mode (720 by 350 and 720 by 400) requires only 4 KB per frame.
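Every entry above comes from one formula: width × height × bits-per-pixel ÷ 8. A sketch:

```python
def frame_buffer_bytes(width, height, bits_per_pixel):
    # Each pixel needs bits_per_pixel bits of video memory.
    return width * height * bits_per_pixel // 8

print(frame_buffer_bytes(640, 480, 24))    # 921600 bytes (16.7 million colors)
print(frame_buffer_bytes(800, 600, 16))    # 960000 bytes (65,536 colors)
print(frame_buffer_bytes(1024, 768, 8))    # 786432 bytes (256 colors)
print(frame_buffer_bytes(1280, 1024, 4))   # 655360 bytes (16 colors)
```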


Memory and Color Depth


A key point to understand is that the quantity of video memory determines the number of
simultaneous colors for a particular addressability. More memory supports more colors. Color
depth is how many colors are displayed on the screen, and is a function of video memory.
A photographic-like image requires many simultaneous colors, not higher addressability.
Television produces 16.7 million simultaneous colors. NTSC has 525 lines at 60 Hz, while PAL
has 625 lines at 50 Hz. VGA requires four bits (half a byte) of video memory per pixel to display
16 colors.

Color Table

16 colors 4 bits per pixel

256 colors 8 bits per pixel

65,536 colors 16 bits per pixel

16.7 million colors 24 bits per pixel
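Each row of the color table is simply two raised to the bits-per-pixel, which is easy to
verify:

```python
# Number of simultaneous colors a given pixel depth can encode.
for bits in (4, 8, 16, 24):
    print(f"{bits:2d} bits per pixel -> {2 ** bits:,} colors")
# 4 -> 16, 8 -> 256, 16 -> 65,536, 24 -> 16,777,216 (16.7 million)
```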


Graphics Subsystem Features:


Graphics Controller

• The graphics controller functions as an accelerator
  - Performs a specific image calculation to offload the processor
  - Less data is passed over the bus (versus an older, dumb frame buffer)
• Includes a DAC (digital-to-analog converter) to convert digital coding in the system to
analog (the signal understood by the monitor)
• Also called a Graphics Processing Unit (GPU)
  - A processor that acts as a coprocessor to perform tasks previously done by the main
processor

(Diagram: bus → graphics controller → graphics memory → DAC → analog signal)

Graphics Controller
The functions of a graphics controller (sometimes known as a graphics processing unit [GPU]) are
summarized as follows:

1. Accepts mode control commands from processor


2. Produces timing pulses for vertical and horizontal scans
3. Scans video memory for pixel information
4. Decodes pixel information into three colors and intensity
5. Drives the three DACs to produce the three color signals (unless it is a digital flat panel monitor)

The GPU processes commands from the main processor, converting graphics calls into a data
stream to be written into graphics memory (the frame buffer) prior to being sent out to the monitor.
In the frame buffer, the screen image is laid out and stored according to the horizontal and vertical
grid, depicting the 2D resolution on the screen. Then, the DAC (or RAMDAC) converts the digital
pixels to the RGB (red, green, blue) analog signal needed by the CRT monitor.
The speed of the DAC determines the maximum refresh rate possible at a given screen
addressability. For example, a 250 MHz DAC can drive an 80 Hz maximum refresh rate at 1920 by
1080 addressability, while a 220 MHz DAC would drive the same addressability
at 75 Hz.
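The relationship between DAC speed, addressability, and refresh rate can be approximated as
pixel clock ≈ width × height × refresh × blanking overhead. The sketch below assumes a 1.4
overhead factor for horizontal and vertical blanking (an assumption; real overhead varies by
timing standard), which lands close to the figures above:

```python
def required_dac_mhz(width, height, refresh_hz, blanking_overhead=1.4):
    # The DAC emits one pixel per clock tick; blanking intervals add
    # extra ticks per frame (modeled by the assumed overhead factor).
    return width * height * refresh_hz * blanking_overhead / 1e6

print(round(required_dac_mhz(1920, 1080, 80)))  # ~232, within a 250 MHz DAC
print(round(required_dac_mhz(1920, 1080, 75)))  # ~218, within a 220 MHz DAC
```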


Graphics controllers have had increased performance over time because they have migrated from
32-bit to 64-bit to 128-bit to 256-bit graphics accelerators. In other words, the internal processing of
the commands and data is done with a 32, 64, 128, or 256-bit wide engine. Generally, the wider the
engine, the better the performance. For the graphics engine to utilize its 128- (or 256-) bit capability
fully, it needs to obtain data from memory at the same bit width. Increasing the width of this data
path increases both the cost and complexity of the processor, as well as the amount of memory that
is required.
A 128-bit graphics controller needs a 128-bit data path, while a 256-bit graphics controller needs a
256-bit data path; otherwise, multiple transfers are needed. For example, a 128-bit controller with a
32-bit memory path would need four transfers before it received its 128 bits of data. The type of
memory architecture (SDRAM, DDR2) and packaging (DIMM, embedded) play a role in the
transfer, and, therefore, the performance.
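The transfer arithmetic in that example is straightforward (a sketch):

```python
def transfers_needed(engine_bits, memory_path_bits):
    # A wide graphics engine fed by a narrower memory path must make
    # several back-to-back transfers to assemble one engine-width word.
    return engine_bits // memory_path_bits

print(transfers_needed(128, 32))   # 4 transfers, as in the example above
print(transfers_needed(128, 128))  # 1 transfer with matched widths
print(transfers_needed(256, 64))   # 4 transfers
```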
The graphics processing unit (GPU) acts as a second processor to a system because it plays an
extremely important role in system performance. The GPU has become a coprocessor to the main
processor by taking on more tasks and relieving some of the tasks from the main processor. GPUs
provide tremendous floating-point performance and are highly programmable and parallelizable;
parallelizable means it can break up tasks into smaller pieces and run each piece simultaneously.
GPUs handle 3D acceleration, physics calculations, and video transcoding (converting from one
video format to another). Video transcoding on a GPU can process from four to eight times faster
than a high-end, dual-core processor.


Graphics Subsystem Features:


Graphics Controller Implementation

• Typical implementation
  - Notebook
    - Integrated (in Graphics Memory Controller Hub)
    - Discrete (chip using PCI Express x16)
    - Switchable (both integrated and discrete)
  - Desktop
    - Integrated (in Graphics Memory Controller Hub)
    - Discrete (PCI Express x16 adapter)
• Integrated graphics uses main system memory
  - Low cost and widely used
• Discrete graphics uses its own memory
  - Higher cost with best performance

(Figures: integrated graphics for desktop, in the Intel G45 Graphics Memory Controller Hub;
discrete graphics for desktops, a PCI Express x16 adapter such as the ATI Radeon HD 2600 XT)

Graphics Controller Implementation


Graphics controllers are implemented as integrated, discrete, or switchable. Integrated means the
graphics controller is inside the Graphics Memory Controller Hub (GMCH) chipset; this provides
good performance at the lowest cost. Discrete means that the controller and its memory are external
to the GMCH. Discrete graphics controllers have better performance because of the dedicated
circuitry for this function, but are higher cost, generate more heat, and require more power and
physical space. Switchable graphics means the system has both an integrated graphics controller
and a discrete graphics controller; the system switches between each depending on the power and
performance requirements.
For the lower-cost, mainstream segments, the integrated graphics that is part of the GMCH
offers good performance without the additional cost, heat, power, or space.
system memory (memory used for the OS) and is often known by the term Dynamic Video
Memory Technology (DVMT), which is covered later in this topic. DVMT is also sometimes
called Unified Memory Architecture or Shared Memory Architecture.


Notebooks utilize either integrated graphics or discrete graphics. Notebooks using the Intel 8xx
chipsets with discrete graphics typically use the AGP 4X bus with graphics controllers from
typically either NVIDIA or ATI. Notebooks using the Intel 9xx chipsets with discrete graphics use
the PCI Express x16 bus with graphic controllers from NVIDIA or ATI.
Desktops utilize either integrated graphics or discrete graphics also. Desktops using the Intel 8xx
chipsets with discrete graphics typically use the AGP 8X bus with graphics controllers from either
NVIDIA or ATI. Desktops using the Intel 9xx or later chipsets with discrete graphics typically use
the PCI Express x16 bus with graphics controllers typically from NVIDIA or ATI.

(Figure: discrete graphics controller using the PCI Express x16 bus on a Lenovo ThinkPad
notebook)


Graphics Subsystem Features:


Switchable Graphics

• Hardware/software feature to switch dynamically between integrated


graphics and discrete graphics
• Dynamic (such as a notebook switching from AC power to battery)
or user initiated
• No reboot required
• Notebooks have better battery life with integrated graphics, but
better graphics performance with discrete graphics
• AMD calls this hybrid graphics
• First used with some Mobile Intel
GM45 Express Chipset notebooks
• Used in select Lenovo notebooks
such as ThinkPad R400, T400,
T500, and W500

Screen before switching graphics controllers


© 2008 Lenovo

Switchable Graphics
Switchable graphics is a hardware/software feature that allows the user to maximize battery life or
maximize performance by switching between integrated graphics or discrete graphics based on
AC/DC state or user initiation.
The switch is made dynamically (such as switching from AC power to a battery on a notebook) or
manually (via a hot-key or software dialog). This feature only works on Windows Vista.
Exceptions:
• Running an OpenGL application
• An application in exclusive mode
• System is connected to a DisplayPort or DVI monitor
Switch time:
• Switch time from discrete graphics to integrated graphics: 2 seconds
• Switch time from integrated graphics to discrete graphics: 5 seconds
No reboot is required to make the switch. A unified driver is used in this environment, since there
are no separate drivers for each controller.
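The switch-eligibility rules above can be sketched as a small predicate. This is a hypothetical illustration only — the function name and parameters are invented, and the real driver logic is more involved:

```python
# Hypothetical sketch of the switchable-graphics exceptions listed above.
def can_switch(running_opengl, exclusive_mode, digital_external_monitor):
    """True if none of the documented exceptions blocks a graphics switch."""
    return not (running_opengl or exclusive_mode or digital_external_monitor)

print(can_switch(False, False, False))  # True: switch allowed
print(can_switch(True, False, False))   # False: an OpenGL application is running
```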


Screen for setting Switchable Graphics


Graphics Subsystem Features:


Graphics Controllers in ThinkPad and ThinkCentre Systems

Product          Vendor   Chip                     Type              Std/Max Memory
ThinkPad X61     Intel    GMA X3100 in GM965       Integrated        Uses main memory
ThinkPad T61p    NVIDIA   NVIDIA Quadro FX 570M    PCI Express x16   256 MB
ThinkCentre A61  ATI      ATI Radeon X1250         Integrated        Uses main memory
ThinkCentre M57  Intel    GMA 3100 in Intel Q35    Integrated        Uses main memory

• Desktop and mobile systems use PCI Express x16 for graphics controllers

ATI Radeon HD 2400 XT PCI Express x16 adapter,
supported in select ThinkCentre desktops

Graphics Controllers in ThinkPad and ThinkCentre Systems


ThinkPad notebooks and ThinkCentre desktops continue to adopt the latest graphics controller
technology.
ThinkPad notebook systems and ThinkCentre desktop systems use either value-oriented graphics,
such as the integrated graphics in the Graphics Memory Controller Hub (GMCH), or high-
performance discrete graphics, depending on the market segment of the product.
Servers do not need the fastest graphics controllers, because graphics is not a performance-critical
subsystem for them.

Back of Lenovo ThinkCentre M57 Eco USFF with both VGA and DVI-D connectors


Graphics Subsystem Features:


Video Device Drivers

• Device drivers determine the following:


- Addressability
- Number of colors
- Performance
• Utilities and/or DDC to set refresh rate
- DAC determines maximum
refresh rate
• Each requires a different configuration
- 800×600×65,536 colors at 72 Hz
- 1024×768×256 colors at 85 Hz
- 1280×1024×16.7 million colors at 85 Hz
• Under Windows:
- Configures addressability and color
depth in Control Panel\Display Properties\Settings
- Vendor utility usually determines the refresh rate


Video Device Drivers


All software supports VGA (640x480), but device drivers determine addressability, number of
colors, and performance differences. Most operating systems have multiple video device drivers
included when the operating system is installed. If a new graphics adapter is added, device drivers
are included on a diskette or CD-ROM and are installed when the adapter is installed. Device
drivers affect graphics performance. For example, installing newer drivers from the graphics
vendor can yield up to a 50% performance increase in that subsystem.
Device drivers can have many different configurations; a few are shown below. Each of the
following requires a different driver and/or configuration:
• 800×600×65,536 colors at 72 Hz
• 1024×768×256 colors at 85 Hz
• 1280×1024×16.7 million colors at 85 Hz
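As a rough illustration of why each configuration makes different demands, a 2D mode fits in video memory only if width × height × bytes-per-pixel is within the available VRAM. This is a simplified sketch under that assumption; real drivers also reserve memory for other uses:

```python
# Hypothetical helper: which display modes fit in a given amount of video RAM?
MODES = [
    (800, 600, 2),    # 800x600 at 65,536 colors (2 bytes/pixel)
    (1024, 768, 1),   # 1024x768 at 256 colors (1 byte/pixel)
    (1280, 1024, 3),  # 1280x1024 at 16.7 million colors (3 bytes/pixel)
]

def fits(vram_bytes, width, height, color_bytes):
    # A 2D mode needs one frame buffer: width * height * bytes per pixel.
    return width * height * color_bytes <= vram_bytes

VRAM = 2 * 2**20  # a 2 MB graphics controller
for w, h, b in MODES:
    print((w, h, b), fits(VRAM, w, h, b))  # only the 1280x1024 mode does not fit
```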
Under Windows operating systems, the user can change the addressability and color depth by
selecting the Settings tab in Control Panel\Display Properties. To set refresh rates, there is
usually a separate tab in Display Properties (created when the specific device driver of the
graphics controller is installed). The highest refresh rate supported by a monitor can be
automatically set if the monitor and graphics controller support the VESA display data channel
(DDC) protocol.


DOS had only a 640×480, 16-color driver; each DOS application needed its own driver. Under
Windows, however, one driver supports all Windows applications (i.e., a separate driver is not
needed for each application).
Color depth and addressability can be adjusted in the Display Properties window in Windows OSes,
as shown below:

Display Properties Window in a Windows OS


Graphics Subsystem Features:


3D Graphics

• 3D graphics display images from all sides, providing an illusion of depth with motion.
• 3D APIs allow 3D applications to run across
compatible chips.
- Microsoft Direct3D has become the industry standard.
- Silicon Graphics OpenGL is used more in the high-end
market.
• Direct3D is now supported on all mainstream
graphics controllers.
• If the application is written to a 3D API, it can use
a hardware-assisted engine (otherwise, software
acceleration is slower).
• 3DMark03 and 3DMark05 are common 3D benchmarks.


2D versus 3D Colors
The difference between 2D and 3D colors is that in 3D some memory is used up for a Z buffer.
The maximum number of colors is determined by the resolution and how much memory the
graphics card has. For 2D, there is no Z buffer. When 3D is enabled, Z information must be
kept for each pixel, so 3D reduces the number of colors that can be shown at each resolution.
The calculation that determines how much memory is needed is as follows:
memory = (resolution x * resolution y) * (bytes of color depth + bytes of Z buffer depth).
So, for 1024 by 768 at 16.7 million colors (3 bytes) with a 16-bit (2-byte) Z buffer:
memory = (1024 * 768) * (3 bytes + 2 bytes) = 3.9 MB.
Another feature seen on high-end 3D cards is double buffering. If double buffering is enabled,
there are two frame buffers in memory, i.e., the bytes-of-color-depth term in the equation above
doubles, cutting down on the maximum colors available at a given resolution. Double buffering
increases the number of frames drawn per second.
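The memory calculation above, including the optional double-buffering factor, can be checked with a short sketch (the function name is illustrative, not a real API):

```python
def framebuffer_bytes(width, height, color_bytes, z_bytes=0, double_buffered=False):
    # memory = (x * y) * (bytes of color depth + bytes of Z buffer depth);
    # double buffering doubles the color-depth term.
    color = color_bytes * (2 if double_buffered else 1)
    return width * height * (color + z_bytes)

# 1024x768 at 16.7 million colors (3 bytes) with a 16-bit (2-byte) Z buffer:
mem = framebuffer_bytes(1024, 768, 3, z_bytes=2)
print(mem, round(mem / 1e6, 1))  # 3932160 bytes, about 3.9 MB
```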
Running 3D applications requires more memory than running 2D applications. When a user starts
a 3D application (usually a game), the addressability and color depth are reduced automatically
without an error message. Most 3D applications support 640 by 480 with 256 colors to maintain
compatibility with 3D controllers that have a minimum of 2 MB of memory. When the application
exits, the original settings are restored.


When an application utilizing 3D is loaded, the software queries the system to see if the appropriate
level of 3D support is present. If the proper support is present, the application utilizes hardware-
assisted acceleration; otherwise, slower software acceleration is used (the processor itself must
recalculate each 3D object). The application cannot mix hardware acceleration for some functions
and software emulation for others; it is all one way or the other.

3D Graphics
The term 3D graphics implies displaying an image from all sides, providing an illusion of depth
with motion. With the emergence of Microsoft Direct3D API and many new 3D-capable, low-cost
chipsets, developers of games, business software, and VRML Web browser plug-ins are using 3D
graphics with increased frequency. 3D graphics bring advanced workstation graphics to the PC,
speeding both two-dimensional and three-dimensional graphics. The result is more realistic
graphical representations; pictures are more detailed, and animation is more fluid. Using 3D
graphics, a software program may allow you to walk through a room or race a car. The faster the
PC processes the graphics, the more smoothly and realistically things appear as you view the
changes.
3D acceleration is necessary for the computation-intensive graphics found in today's games. There
are few business applications that use 3D (even the 3D chart function in Microsoft Excel is handled
by software). Business applications run smoothly with 2D support, e.g., waiting for a screen to
refresh or Microsoft Word to scroll down a page is 2D.
3D combines geometry engines, texturing, and shading with motion. Graphics controllers may have
3D acceleration, which is not the same as a 3D hardware engine. A 3D hardware engine is what
gives high-performance 3D. 2D graphics refers to tasks such as scrolling, moving windows, and
opening dialog boxes (including 3D bar charts and DVD decoding). CAD and solid modeling
programs are not really 3D in that their goal is to produce a single instance of an object, although
the underlying geometry, surface textures, and colors of that object may be included in an
animation program.
Some of the more important features of 3D chips include Z buffering (defined in the definitions
section), lighting, texture mapping, and rendering. These basic tasks can be further enhanced with
various texture-mapping options and special rendering features. 3D chips vary as to what functions
they support.
Lighting is an important part of the 3D process because you have to tell the computer how an object
or scene is going to be lit – there is no automatic or inherent lighting. And it would be difficult if
not impossible to make an object look three dimensional without it. The shadows, color shading,
reflections and other effects that result from the virtual light source are essential to making it look
realistic.
Texture mapping involves triangles. In order to make pictures on a PC seem to have three
dimensions (height, width, and depth), the screen is divided into triangles. The triangles may be of
any size, and the sides may be of the same lengths or different lengths. The more triangles, the more
detailed the scene can be. A big, flat wall, for example, may be drawn as two huge triangles, while
a picture on that wall may consist of many tiny triangles. Texture mapping is the way the creator of
the scene paints each triangle with a pattern or texture.
Rendering is the process by which an object, after it has been designed and defined geometrically,
is actually represented on the monitor. In other words, the 3D representation


of the object or scene must be turned into a 2D set of pixels that are “painted” onto a display screen.
The rendering process takes into account all the (virtual) lights that affect the particular image being
rendered. Three common rendering processes are ray tracing, radiosity, and scanline rendering.
The creation of a 3D image starts with a wireframe skeleton, which comprises polygons and is
stored as a complex mathematical model. Because these models are three-dimensional, an object
can be rotated to any point of view and can be manipulated in many ways. To make the wireframe
appear solid, it is then dressed with color, texture, and light. Each of these stages requires additional
processing power, because each time the model is changed, the calculations need to be redone.
In late 1999, mainstream 3D graphics chips began to integrate transform and lighting (T&L)
calculations from the main processor to the graphics chip. Commonly called T&L acceleration, this
acceleration increases the performance of the 3D images.
There are several components to the 3D pipeline. (The pipeline is what assembles all the
information and transfers it to the monitor.) These components include: transforms, lighting, setup,
and rendering. Transforms consist of the mathematical calculations used to determine translation
(movement), rotation, and other changes in objects. Lighting calculations determine how a scene
and its various objects are illuminated. (When we speak of lighting in this context, we mean
geometrically calculated light sources, not light maps, which are special texture maps that simulate
lighting effects.) Texture coordinates are then assigned, and the objects are deconstructed into
triangles and vertex data (vertices are the corners of polygons – typically triangles) and sent to the
setup engine. Setup is the process by which vertex data generated by the transform and lighting
steps is translated into data formats suitable for pixel generation. The final step is rendering, in
which pixels are actually created, properly shaded, and colored for display in the frame buffer. It's
at this point that the actual texels (texture elements) from the various textures are blended into the
colors of the base object's pixels to create each pixel's final color.
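The transform step of the pipeline above can be illustrated with the classic rotation of a single 2D vertex. This is a toy sketch of the underlying math, not how a real geometry engine is implemented:

```python
import math

def rotate(vertex, degrees):
    # Standard 2D rotation: each vertex is multiplied by a rotation matrix.
    x, y = vertex
    a = math.radians(degrees)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))

# Rotating the vertex (1, 0) by 90 degrees moves it to (0, 1):
x, y = rotate((1.0, 0.0), 90)
print(round(x, 6), round(y, 6))
```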

3D Graphics APIs: OpenGL


There are many 3D APIs; each has different strengths and purposes. The most common of these
APIs are Silicon Graphics OpenGL, Apple QuickDraw 3D, Apple 3D RAVE API, and Microsoft
Direct3D.
3D APIs provide a standard platform through which 3D software can interact with 3D hardware;
developers no longer have to write individual drivers for each chipset on the market. Without 3D
APIs, 3D chip vendors pay game developers to port exclusively to their hardware, thereby
preventing users from mixing and matching products.
OpenGL excels at cross-platform work. OpenGL is optimized for technical work (few games use
OpenGL) and does not handle some high-level functions (e.g., file formats) as both QuickDraw 3D
and Direct3D do. OpenGL is a precise 3D-rendering technology that is important for CAD software
and 3D visualization (as opposed to 3D for games, in which speed is critical). OpenGL is a 2D and
3D graphics API. In 2003, OpenGL introduced a high-level shading language. QuickDraw 3D is
good for developers who program for both the PC and the Mac, while Direct3D and RAVE
(Rendering Acceleration Virtual Engine) are good for games, because they take full advantage of
PC hardware.


3D Graphics APIs: Direct3D


In March 1996, Microsoft announced the Direct3D API as a component of the DirectX series of
Windows I/O APIs. Architecturally, Direct3D provides high- and low-level APIs for software
developers and a hardware abstraction layer that interfaces with drivers written by the 3D chip
vendors. Direct3D is tightly integrated into DirectDraw and allows combined, bitmapped 3D and
video effects in a manner similar to playing back video on the faces of a spinning cube.
Functionally, the high-level API includes 3D functions such as geometry engines and object
database routines that developers would otherwise have to write or license. The low-level API lets
developers of fast-switch games use their own optimized code and call more specific hardware
functions in order to maximize overall program speed. Also, Direct3D provides software emulation
of 3D functions, querying the graphics chipset to determine its capabilities and dynamically
allocating work between the processor and 3D chip in order to maximize overall performance.
Operating System                   3D APIs
Windows 95 and 98                  Direct3D
Windows NT 4.0, 2000, and XP       Direct3D and OpenGL
Windows Vista                      DirectX 9 or 10

3D Effects for Games


In the past, a major roadblock to developing 3D effects for games was the need for API support.
Programmers had to wait for an effect to be embraced by the latest version of, say, DirectX before
they could embed it into their games. Both ATI and NVIDIA have come up with solutions. Though
the vendors’ GPUs work with the latest API revisions (OpenGL 1.3 and DirectX 9), neither chip is
limited by the APIs.
ATI’s workaround comes in the form of Smartshader, which lets mathematical operations be
performed on texture addresses and color values of individual pixels. These operations result in on-
the-fly vertex and pixel shading respectively, and are the reason the high-end ATI Radeon
Controllers can make something like a leather jacket appear naturally bumpy when illuminated by,
for example, multiple gunfire flashes.
To accomplish just about the same thing, NVIDIA relies on nfiniteFX. Vertex and pixel shading
are handled in nfiniteFX, along with Shadow Buffers, which render shadows from both a light’s
point of view and that of the user, providing a realistic composite.

3D Graphics Definitions
• Alpha blending allows one object to show through another and provides the illusion of
transparency. Color keying is a different way of providing transparency.
• Anisotropic filtering is a technique to improve the look of texture images that are viewed at an
angle.
• Anti-aliasing helps clean up jagged edges at seams between mapped textures by using
transitional pixels of blended colors. It works by blending the color of each pixel with the
color of others around it.


• Bilinear and trilinear filtering average the nearest points in the texture to calculate a point in the
triangle, rather than selecting the single nearest point, resulting in smoother pictures and less
jagged lines. (Textures must be scaled to fit the triangle to which they are being mapped.)
• Bump mapping is giving a surface relief texture by changing its colors and shadows, depending
on the implied intensity and direction of the light hitting the surface.
• Double buffering – The Windows XP interface is single-buffered, meaning that one screen
update is painted on top of the last. But 3D graphics use a rendering method called page flipping,
in which graphics memory is allocated so it contains two full screens. The first is called the front
buffer. The back buffer contains the screen information for the next frame or 3D animation.
When content in the back buffer is ready to be displayed onscreen, the 3D card does a page flip,
swapping the buffers. Now what was the back buffer is onscreen as the front buffer. And what
was the front buffer becomes the back buffer, where it is cleared for the next frame of animation.
This technique also goes by the term double buffering.
• Fogging is used to make objects appear fuzzy and adds realism to scenes with fog or those scenes
in which an eerie feeling is implied. It can also be used for blurring distant objects.
• MIP mapping allows several versions of a texture to be used, depending on the nearness of the
object. Closer objects appear sharper to the eye and need more detailed texture. More distant
objects appear fuzzier and need less detailed texture.
• Perspective corrected texture mapping automatically maps the surface texture of each triangle on
an object to the perspective. As an object turns or moves forward or backward in a scene, the
perspective drawing of the object needs to change for visual correctness. Closer points on the
object must appear larger, and further points become increasingly smaller.
• A polygon is a shape defined by lines; a polygon must have at least three lines (sides). On a
polygon, the point where the lines connect is called a vertex.
• Reflection mapping is accurately drawing reflections of objects in glass, water, metal, and other
reflective surfaces.
• Shaders – Small programs that operate on individual pixel and vertex data to create impressive
3D effects. DirectX 8 and DirectX 9 give developers commands for building custom shader
programs.


• Shader Model 3.0 – In 2005, the Microsoft DirectX 9 interface introduced an advanced
programming model called Shader Model 3.0. This model permits a simpler programming style
for game developers and allows better performance. Games written to this model need DirectX 9
version 9.0c or later. Games utilizing Shader Model 3.0 will still run on graphics controllers that
do not support this interface; these games normally have many ways of rendering animations and
use the one that works best for the controller. The NVIDIA 6800 line of graphics controllers
supports this interface.
• Smooth shading (or Gouraud shading) allows color shading and brightness to change gradually
across each triangle. Across real objects, colors and brightness change gradually. Without smooth
shading, there would be a sharp transition between triangles, giving a blocky appearance (called
flat shading).
• Specular highlighting shows the reflection from a light source or the surface of an object.
• Transparency allows two surfaces to be blended for effects such as looking through window glass
or seeing through smoke.
• T-Buffer – A T-Buffer (developed by 3dfx) holds multiple rendered frames before they are
displayed on your monitor. (An ordinary graphics chip stores only one image in its frame buffer,
swapping that screen with the one that is currently being displayed.) When enough screens are
stored up, the T-Buffer blends them together to eliminate the jagged edges that can sometimes
appear. The new image is then rendered on-screen.
• Z-buffering tracks the depth of the vertices of each triangle from the perspective of the viewer
and sorts the triangles to ensure that only the objects that fall in front are drawn. 24-bit Z-
buffering provides 2^24 distinct depth levels for each pixel, and 32-bit Z-buffering provides
2^32. The greater the bit depth, the more precise the indication of depth.
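Z-buffering, defined above, can be illustrated with a toy software sketch that keeps only the nearest fragment per pixel. Real hardware performs this test in dedicated silicon; the function below is purely illustrative:

```python
import math

def zbuffer_render(fragments, width, height):
    # One depth value and one color per pixel; nearer fragments (smaller z) win.
    depth = [[math.inf] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z < depth[y][x]:
            depth[y][x] = z
            color[y][x] = c
    return color

# Two fragments land on pixel (0, 0); the nearer blue one (z=2.0) is kept.
frame = zbuffer_render([(0, 0, 5.0, "red"), (0, 0, 2.0, "blue")], 2, 1)
print(frame[0][0])  # blue
```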

3D Benchmarks
3DMark05 is the first benchmark to require DirectX 9.0-compliant hardware with support for
Pixel Shaders 2.0 or higher. 3DMark05 answers the continuously growing challenge in
benchmarking. Visit www.futuremark.com for more information.
3DMark06 is the latest version in the popular 3DMark series, including advanced SM2.0 and
HDR/SM3.0 Shader graphics tests and now including single, multiple core, and multiple processor
CPU tests as part of the 3DMark score.
3DMarkMobile06 is a robust OpenGL ES 1.0 and 1.1 benchmark that tests 3D graphics
performance of future mobile 3D hardware. High-detail game content generates workloads that tax
OpenGL ES 3D hardware in a realistic way. Visit www.futuremark.com for more information.


Graphics Subsystem Features:


DirectX 10

• DirectX 10 is the latest version of Microsoft’s DirectX application programming interface (API)
• The APIs (for graphics and sound) included in DirectX allow
developers to easily access and use hardware features
• Up to 90% higher performance than DirectX 9
• Up to 30% more energy efficient than DirectX 9
• DirectX 10 is only available in Windows Vista
• Since June 2008, Microsoft has required that Windows Vista Premium solutions support DirectX 10
• Lenovo offers desktop and notebook
products supporting DirectX 10


DirectX 10
• DirectX 9 – Released in 2003, a Microsoft API for Windows used to produce realistic 3D
functions. DirectX 9 includes a High Level Shader Language. As of 2004 only high-end graphics
support DirectX 9; high-end game software also supports DirectX 9.
• DirectX 10 – DirectX 10 was released in 2007 with support only in Windows Vista. In previous
versions, programmers used separate languages to write pixel shaders and vector shaders. In
DirectX 10, there is a common language, reducing a programmer's development time. This
allows for up to 64,000 instructions in a shader program, as opposed to 512 instructions in
DirectX 9. New also are geometry shaders, which allow the graphics processor to create or
destroy geometry programmatically on the fly, something that previously was done on the
processor.

Direct3D 10 is the graphics rendering part of the DirectX 10 API and allows developers to take
better advantage of the DirectX 10 capabilities of hardware such as AMD’s Radeon HD 2000
series of products.


Feature                          DirectX 9               DirectX 10
Shader Model                     2.0/3.0                 4.0
Temporary Registers              32                      4096
Constant Registers               256                     65,536
Multiple Render Targets          4                       8
Textures                         16                      128
Maximum Texture Size             4096 x 4096             8192 x 8192
Resource Validation              Required on every use   Required only on creation
Geometry Shaders                 No                      Yes
Unified Shader Instruction Set   No                      Yes
Stream Output                    No                      Yes
Alpha to Coverage                No                      Yes
Constant Buffers                 No                      Yes
State Objects                    No                      Yes
Texture Arrays                   No                      Yes
Integer & Bitwise Operations     No                      Yes
Comparison Sampling              No                      Yes
Render to Volume                 No                      Yes
Multiple Resource Views          No                      Yes
Shared Exponent HDR Format       No                      Yes

DirectX 9 and 10 Comparison


Graphics Subsystem Features:


Pixel/Character Relationship

• Graphics addressability affects character size.


• As addressability increases, more information
can be displayed in greater detail (more pixels
per inch), but the information shrinks in size.
• Text size on a 17-inch display may be
adequate at 1024×768 but too small with
1600×1200 resolution.
• Good rule to follow:
- 640×480 is good for 14-inch displays
- 800×600 is good for 15-inch displays
- 1024×768 is good for 17-inch displays
- 1280×1024 is good for 19-inch displays ThinkVision L220x Wide
22-inch Flat Panel Monitor
- 1600×1200 is good for 21-inch displays
- 1920x1200 is good for 22-inch displays


Pixel/Character Relationship
As resolution increases, more information can be displayed in greater detail (more pixels per inch),
but the information shrinks in size rather than the same-size image becoming crisper. The text size
on a 17-inch display may be adequate at 1024×768, but it would be too small to read easily at
1600×1200. The Windows Control Panel allows some changing of type size, but most system and
application text cannot be altered easily.
The following characteristics are true, on the same display:
• At 800×600, characters are 20 percent smaller than they are at 640×480.
• At 1024×768, characters are 37 percent smaller than they are at 640×480.
• 1024×768 shows 2.5 times more information than 640×480 (many more spreadsheet numbers).
• A 17-inch display provides almost 50 percent more viewing area than a 14-inch display.
• A 19-inch display provides almost 84 percent more viewing area than a 14-inch display.
The actual viewable area is smaller than the stated size of a display. CRTs do not produce a picture
edge-to-edge, so a 17-inch display typically yields a 15.5-inch viewing area.
• A typical 14-inch display is about 10.4 inches wide; at 800×600 that is 77 dots per inch.
• A typical 17-inch display is about 12.8 inches wide; at 1024×768 that is 80 dots per inch.
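The character-size percentages and the dots-per-inch figure above can be reproduced with simple arithmetic (a quick sketch; `pct_smaller` is an illustrative name, and integer division truncates to whole percentages as the text does):

```python
def pct_smaller(base_w, new_w):
    # Characters shrink in proportion to horizontal pixel count;
    # integer math truncates to a whole percent.
    return (new_w - base_w) * 100 // new_w

print(pct_smaller(640, 800))   # 20 (800x600 vs 640x480)
print(pct_smaller(640, 1024))  # 37 (1024x768 vs 640x480)

# Dots per inch = horizontal pixels / viewable width in inches:
print(round(800 / 10.4))       # 77 dpi on a typical 14-inch display
```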


Graphics Subsystem Features:


Multiple Monitors

• Two types of multiple monitor support on a PC
• Cloning: same image on all monitors
• Multi-monitoring (or dual independent display): unique images on all monitors

Different Image Spanning or Wide Desktop


Multiple Monitors
A PC can support multiple monitors. Typically, in a multiple monitor situation, most users will
have two monitors, but it is possible to have more than two monitors.
There are two different types of multiple monitor support: cloning or multi-monitoring. There are
no formal names for the various ways that multiple displays can be used. Each graphics vendor has
its own unique (and usually trademarked) names. The term multi-monitoring is becoming common
for using multiple monitors.
• Cloning – Cloning (or dual display clone) means that the same image appears on both displays.
Cloning is only really useful if two groups of people need to see the same image, such as in
training situations. The second output can be redirected to a large TV or video projector. Each
display device can be configured independently, allowing each to have a different refresh rate,
color depth, and resolution for optimum display on each device.
• Multi-monitoring – Multi-monitoring means the desktop addressability is increased so that it fills
all monitors. For two monitors, this is often called dual independent display. If each monitor is
set to 1024×768, then two desktops would be 2048×768. Multi-monitoring means that each
monitor can see different applications or windows on each monitor. Spanning or wide desktop is
another term for maximizing a window to fill both displays when the system is set up for dual
display, which is an expanded desktop that stretches across the two monitors. NVIDIA calls this
TwinView, a term that they trademarked. Intel calls this feature Extended Desktop.
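The spanned-desktop arithmetic above (two 1024×768 monitors yielding a 2048×768 desktop) can be sketched as follows; the function name is illustrative and assumes monitors placed side by side:

```python
def spanned_desktop(monitors):
    # Width adds up across side-by-side monitors; height is the tallest one.
    width = sum(w for w, _ in monitors)
    height = max(h for _, h in monitors)
    return width, height

# Two 1024x768 monitors give a 2048x768 extended desktop:
print(spanned_desktop([(1024, 768), (1024, 768)]))  # (2048, 768)
```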


Graphics Subsystem Features:


Multiple Monitor Advantages

• Users get more pixels for the money with two smaller (17-inch) monitors
than with one larger (20-inch) monitor
- Two monitors are less costly (even with cost of additional graphics card).
- A dual-head graphics card or additional graphics card is required.
• Users are more productive and have fewer errors with multiple monitors.

Two 17-inch SXGA monitors side by side: 1280 x 1024 + 1280 x 1024 = 2560 x 1024, or 2,621,440 total pixels
One 20-inch UXGA monitor: 1600 x 1200 = 1,920,000 total pixels


Multiple Monitor Advantages


Industry costs for flat panel monitors have declined, which makes multi-monitoring an increasingly
cost-effective way to improve user productivity. The benefits are clear. For example:
• Two 15-inch monitors provide 20% more displayable content than one 19-inch monitor at a
comparable cost.
• Similarly, two 17-inch monitors boast 35% more displayable content than one 20-inch monitor at
a reduced cost.
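The pixel counts quoted above can be verified with simple arithmetic. Note that raw pixels give roughly 37% more for the dual-monitor setup; the 35% figure in the text refers to displayable content, which is measured slightly differently:

```python
dual = 2 * (1280 * 1024)   # two 17-inch SXGA monitors side by side
single = 1600 * 1200       # one 20-inch UXGA monitor

print(dual)                              # 2621440 total pixels
print(single)                            # 1920000 total pixels
print(round((dual / single - 1) * 100))  # 37 percent more raw pixels
```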
To support two monitors, the PC must have two monitor connectors. Some graphics adapters are
dual-head, which means they have two physical connectors to support two monitors.
systems support a low-cost adapter with a DVI-D or DVI-I connector; these connectors support
digital output. You could even add a second adapter in a PCI slot on desktop systems. (Systems
only have one AGP or PCI Express x16 slot.)
Multi-monitoring can help improve efficiency and quality of work for nearly any level of business
user. A recent productivity study (Anderson, et al. Productivity and Multi-Screen Displays,
University of Utah, 2004, sponsored by ATI Technologies, Inc. and NEC/Mitsubishi) found that:
• Users doing typical office tasks completed them faster and more accurately when using multiple
monitors compared to one monitor.
• Users completed tasks 7% faster.
• Users saw a 33% reduction in errors.


• Nearly one-third of users felt they became proficient in each task faster when using more than
one monitor.
You can even have two monitors on a single stand. Almost all desktop flat panels can be detached
from their stands and attached to stands that use VESA-standard mounting brackets.

Multiple Monitor Configuration


For the ultimate flexibility on your desktop, an optional Radial Arm makes it easy to move the
monitor up, back, and around the work area for effortless placement and comfortable viewing.

Flat Panel Monitor on Radial Arm
Radial Arm, part number 19K4464

The operating system must support multiple monitors. Windows 98, Windows Millennium Edition,
Windows 2000, and Windows XP Professional all support multiple monitors. Windows must be
configured to use two or more monitors. In Windows XP, right-click an open area of your Desktop
and choose Properties from the pop-up menu, then the Settings tab. If you have two monitors,
click the second and check Extend my Windows desktop onto this monitor.
Some dual-head or other graphics adapters have their own software utilities for setting up and
configuring multiple monitors. These utilities normally offer more features than the base function in
Windows XP.


Quad independent display

Multi-monitoring, dual independent display, or quad independent display: Each monitor shows an
independent or unique image.

Clone mode: The same image across all monitors.
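The difference between extended and clone modes can be sketched with a small calculation. The Python helper below is illustrative only (the function name and the side-by-side arrangement are my assumptions, not part of any Windows API): it computes the total addressable desktop for each mode.

```python
def virtual_desktop(monitors, mode):
    """Total addressable desktop for a multi-monitor setup.

    monitors: list of (width, height) tuples, assumed arranged side by side.
    mode: "extended" (each monitor shows a unique region) or
          "clone" (every monitor shows the same image).
    """
    if mode == "clone":
        # Clone mode is limited to the smallest common resolution.
        return (min(w for w, _ in monitors), min(h for _, h in monitors))
    if mode == "extended":
        # Side-by-side extension: widths add up, height is the tallest panel.
        return (sum(w for w, _ in monitors), max(h for _, h in monitors))
    raise ValueError("unknown mode: " + mode)

# Two 1280x1024 panels side by side double the workspace width:
assert virtual_desktop([(1280, 1024), (1280, 1024)], "extended") == (2560, 1024)
assert virtual_desktop([(1280, 1024), (1280, 1024)], "clone") == (1280, 1024)
```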

PC Architecture (TXW102) September 2008 32


Topic 6 - Graphics Architecture

Dual display: The Settings tab of the Display Properties application in the Control Panel enables
dual display.

PC Architecture (TXW102) September 2008 33


Topic 6 - Graphics Architecture

ThinkPad and ThinkCentre Multi-Monitoring Solutions


Multi-monitoring technology has made it easier than ever to view more information and extend the
workspace using ThinkVision monitors attached to a ThinkPad or ThinkCentre system.
Presentation Director makes it simple to configure and use your ThinkPad notebook multi-monitor
configuration with an easy-to-use interface. Using the menus, you can easily switch between single
monitor, multi-monitor, and projector displays, as well as create and manage alternative display
schemes.

Two Lenovo ThinkVision monitors used with a ThinkCentre tower, which would be positioned
beneath the desk. This multi-monitoring configuration provides the user with more viewable area
than one larger monitor, and at a comparable cost.

Multi-monitoring helps improve productivity by providing users with more information at one
time. This ThinkPad notebook on a ThinkPad Advanced Dock is shown with a Lenovo ThinkVision
monitor.

Notebook with VGA in/out Switch


A notebook screen can function as a monitor. For example, on the Lenovo IdeaPad Y710, there is a
switch next to the VGA connector that selects the VGA connector as “in” or “out”. A selection of
“in” means the screen on the notebook functions as a monitor.

PC Architecture (TXW102) September 2008 34



Graphics Subsystem Features:


Windows Vista - Aero Requirements

• Aero
- Optional interface for Windows Vista
- Adds support for 3D graphics,
translucency, window animation,
and visual effects
• Vista Aero requires:
- DirectX 9 (DirectX 10 preferred) 3-D graphics processing unit with a WDDM driver
- 128 MB graphics memory (minimum)
- Support for Pixel Shader 2.0
- Ability to display a color depth of 32 bits per pixel
- 1 GB or more of main memory

[Screen shots: Windows Aero (translucency); use Flip 3D to navigate through open windows
using the scroll wheel on your mouse]
© 2008 Lenovo

Windows Vista - Aero Requirements


Windows Vista supports an optional interface called Aero (sometimes called desktop composition
enabled). Aero adds support for 3D graphics, translucency, window animation, and visual effects.
The following requirements must be met in order to fully enable Windows Vista's Aero user
interface:
• DirectX 9 (DirectX 10 preferred) 3-D graphics processing unit with a WDDM driver
• 128 MB graphics memory (minimum)
Although graphics cards that share main system memory are acceptable, the best approach is a
minimum of 128 MB (the more, the better) of dedicated graphics memory.
• Support for Pixel Shader 2.0
• Ability to display a color depth of 32 bits per pixel
• Integrated graphics chipsets require 1 GB of dual-channel main memory
– Systems must have 512 MB of main memory available after graphics processing
• Uses approximately 22% of processor cycles
• 512 MB + 64 to 256 MB (dependent on monitor resolution) on the graphics processor for
discrete systems:

Graphics Memory    Monitor Resolution

64 MB              1280x1024 or less
128 MB             1280x1024 to 1920x1200
256 MB             more than 1920x1200
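The memory-versus-resolution figures above can be captured in a small lookup. The sketch below uses only the tiers from this document; the function name and the use of total pixel count as the comparison key are my assumptions:

```python
def aero_graphics_memory(width, height):
    """Minimum graphics memory (MB) for Vista Aero at a monitor resolution,
    per the 64/128/256 MB tiers in this document."""
    pixels = width * height
    if pixels <= 1280 * 1024:
        return 64
    if pixels <= 1920 * 1200:
        return 128
    return 256

assert aero_graphics_memory(1280, 1024) == 64   # "1280x1024 or less"
assert aero_graphics_memory(1680, 1050) == 128  # between the two thresholds
assert aero_graphics_memory(2560, 1600) == 256  # "more than 1920x1200"
```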

PC Architecture (TXW102) September 2008 35



Graphics Subsystem Features:


Factors That Affect Graphics Performance

• Factors include (in order of


performance impact):
- Processor speed
- Video drivers
- Color depth
- DAC
- Memory type
- Vertical refresh rate
- Resolution

• Monitor has no impact on video performance

[Photo: ThinkVision L194 Wide Flat Panel Monitor]

© 2008 Lenovo

Factors That Affect Graphics Performance


• Processor speed has the greatest impact on video performance.
• Updated video drivers can increase video performance up to 50 percent.
• Resolution and color depth affect performance. Run at the lowest resolution and color depth that
are sufficient for your needs. Unless you are working with photographs, 256 colors will probably
look just as good as 24-bit 16.7 million color and put one third of the load on the graphics
controller. As a general rule, the more colors generated, the more data to transfer, so this will
have a slight negative impact on performance.
• The DAC (actually, there are three, one for each color) must be able to convert digits to an
analog color signal fast enough to modulate the beam at maximum rate. This rate is the horizontal
scan frequency multiplied by the number of horizontal pixels. The higher the speed (measured in
MHz), the faster the graphics performance.
• Memory type plays a role in performance.
• Vertical refresh rate generally has little impact on video performance.
• Resolution generally has little impact on video performance.
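The two quantitative rules above — the DAC rate equals horizontal scan frequency times pixels per line, and 256 colors carry one third the data of 24-bit color — can be checked with simple arithmetic. A hedged sketch (the function names are mine):

```python
def pixel_clock_mhz(h_pixels, h_scan_khz):
    """DAC conversion rate: horizontal scan frequency times pixels per line."""
    return h_pixels * h_scan_khz / 1000.0

def framebuffer_mb(width, height, bits_per_pixel):
    """Memory one full screen occupies at a given color depth."""
    return width * height * bits_per_pixel / 8 / 2**20

# 1280 pixels per line at a 64 kHz horizontal scan rate needs a ~82 MHz DAC.
assert round(pixel_clock_mhz(1280, 64.0), 1) == 81.9
# Dropping from 24-bit to 8-bit (256-color) cuts the per-frame data to a third.
assert framebuffer_mb(1280, 1024, 24) == 3 * framebuffer_mb(1280, 1024, 8)
```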

PC Architecture (TXW102) September 2008 36



Integrated Graphics Controller:


Intel Graphics Media Accelerator X3100 (GMA X3100)

• Integrated graphics controller in the memory controller of notebook chipset


- GM965 (2007)
• Provides good performance for mainstream business applications without the
cost of a discrete PCI Express x16 graphics controller
• Uses up to 384 MB of main memory (DVMT 4.0)
• GMA X3100 core clocked at 266, 320, 400 or 500 MHz
• Supports Intel Clear Video Technology

[Block diagram: the processor and main memory connect to the Graphics Memory Controller Hub,
which contains the GMA X3100 and links to the ICH]

© 2008 Lenovo

Intel Graphics Media Accelerator X3100 (GMA X3100)


The Intel Graphics Media Accelerator X3100 (GMA X3100) is the integrated graphics controller in
the Intel GM965 mobile chipset. The graphics controller is located within the same physical chip as
the memory controller, comprising what is called the Graphics Memory Controller Hub (GMCH).
The core is clocked at 266, 320, 400, or 500 MHz.
The Intel GMA X3100 family supports Intel Dynamic Video Memory Technology 4.0 (DVMT
4.0). The system memory is shared by both graphics and system usages. Intel Dynamic Video
Memory Technology 4.0 ensures that the graphics applications are allocated memory depending on
the requests and the allocated memory is released to the system once the application is closed. This
technology allows for 64 MB to 384 MB of memory allocated to the graphics engine (see table
below). This provides more memory for intensive applications for maximum system performance.

System Memory Size Windows 2000/XP Windows Vista (32-bit/64-bit)

512 MB 128 MB 64 MB
1 GB 256 MB 251 MB
1.5 GB or more 384 MB 358 MB

The GMA X3100 supports Intel Clear Video Technology (discussed on following pages).
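The DVMT 4.0 allocation table above can be expressed as a small lookup. The dictionary and function below are an illustrative sketch of the table's contents, not an Intel API:

```python
# Maximum DVMT 4.0 graphics allocation (MB) by installed system memory,
# from the table in this document: (Windows 2000/XP, Windows Vista).
DVMT_MAX_MB = {512: (128, 64), 1024: (256, 251), 1536: (384, 358)}

def dvmt_max(system_mb, vista=False):
    """Largest graphics allocation for a given system memory size;
    1.5 GB or more uses the top tier."""
    eligible = [k for k in DVMT_MAX_MB if k <= system_mb]
    if not eligible:
        return 0
    xp, vi = DVMT_MAX_MB[max(eligible)]
    return vi if vista else xp

assert dvmt_max(1024) == 256
assert dvmt_max(2048, vista=True) == 358  # "1.5 GB or more"
```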

PC Architecture (TXW102) September 2008 37



Integrated Graphics Controller:


Intel Graphics Media Accelerator 4500MHD (GMA 4500MHD)

• Integrated graphics controller in the memory controller of notebook chipset


- 4500M for GL40 at 400 MHz (2008)
- 4500MHD for GS45 at 266 MHz and GM45 at 533 MHz (2008)
• Provides good performance for mainstream business applications without the
cost of a discrete PCI Express x16 graphics controller
• Uses main memory (DVMT 4.0)
• GMA 4500MHD core clocked at 266, 400, or 533 MHz
• Supports Intel Clear Video Technology

[Block diagram: the processor and main memory connect to the Graphics Memory Controller Hub,
which contains the GMA 4500MHD and links to the ICH]

© 2008 Lenovo

Intel Graphics Media Accelerator 4500MHD (GMA 4500MHD)


The Intel Graphics Media Accelerator 4500M and 4500MHD are the integrated graphics
controllers in the Intel mobile chipsets (GL40, GS45, and GM45). The graphics controller is
located within the same physical chip as the memory controller, comprising what is called the
Graphics Memory Controller Hub (GMCH). The core is clocked at 266, 400, or 533 MHz.
The Intel GMA 4500M family supports Intel Dynamic Video Memory Technology 4.0 (DVMT
4.0). The system memory is shared by both graphics and system usages. Intel Dynamic Video
Memory Technology 4.0 ensures that the graphics applications are allocated memory depending on
the requests and the allocated memory is released to the system once the application is closed.
All support Intel Clear Video Technology (discussed on the following pages).

PC Architecture (TXW102) September 2008 38



Integrated Graphics Controller:


Intel Graphics Media Accelerator X3500 (GMA X3500)

• Integrated graphics controller in the memory controller of desktop chipset


- G35 chipset (2007)
• Provides good performance for mainstream business applications without the
cost of a discrete PCI Express x16 graphics controller
• Uses up to 384 MB of main memory (DVMT 4.0)
• Supports Intel Clear Video Technology

[Block diagram: the processor and main memory connect to the Graphics Memory Controller Hub,
which contains the GMA X3500 and links to the ICH]

© 2008 Lenovo

Intel Graphics Media Accelerator X3500 (GMA X3500)


The Intel Graphics Media Accelerator X3500 (GMA X3500) is the integrated graphics controller in
the Intel G35 desktop chipset. The graphics controller is located within the same physical chip as
the memory controller, comprising what is called the Graphics Memory Controller Hub (GMCH).
The Intel GMA X3500 family supports Intel Dynamic Video Memory Technology 4.0 (DVMT
4.0). The system memory is shared by both graphics and system usages. Intel Dynamic Video
Memory Technology 4.0 ensures that the graphics applications are allocated memory depending on
the requests and the allocated memory is released to the system once the application is closed. This
technology allows for 64 MB to 384 MB of memory allocated to the graphics engine (see table
below). This provides more memory for intensive applications for maximum system performance.

System Memory Size Windows 2000/XP Windows Vista (32-bit/64-bit)

512 MB 128 MB 64 MB
1 GB 256 MB 251 MB
1.5 GB or more 384 MB 358 MB

The GMA X3500 supports Intel Clear Video Technology (discussed on following pages).

PC Architecture (TXW102) September 2008 39



Intel Graphics Core Comparison
(columns: GMA 950 | GMA 3000 | GMA 3100 | GMA X3000 | GMA X3100 | GMA X3500)

Intel Chipset: 945G, 945GM | 946GZ, Q963, Q965 | G31, G33, Q33, Q35 | G965 | GM965, GL960 | G35
Memory: Up to 256 MB | Up to 256 MB | Up to 256 MB | Up to 384 MB | Up to 384 MB | Up to 384 MB
DirectX Support: DirectX 9.0c | DirectX 9.0c | DirectX 9.0c | DirectX 9.0c | DirectX 10 [1] | DirectX 10 [1]
OpenGL Support: 1.4 + Extensions | 1.4 + Extensions | 1.4 + Extensions | OpenGL 1.5 | OpenGL 1.5 | OpenGL 2.0
Hardware T&L: No | No | No | Yes [2] | Yes [2] | Yes [2]
Shader Model Support: 2.0 | 2.0 | 2.0 | 3.0 [2] | 4.0 [1] | 4.0 [1]
Intel Clear Video Technology: No | No | Yes (G33 chipset only) | Yes | Yes | Yes

HD hardware acceleration

MPEG-2 Decode Supported [3]: Yes (HW MC) | Yes (HW MC) | Yes (G31, Q33, Q35: HW MC; G33: VLD + iDCT + MC) | Yes (VLD + iDCT + MC) | Yes (VLD + iDCT + MC) | Yes (VLD + iDCT + MC)
VC-1 Decode Supported [4]: No (can support through software) | No (can support through software) | No (can support through software) | Yes (MC + In Loop Filter – WMV9 only) | Yes (MC + In Loop Filter – WMV9 only) | Yes (MC + In Loop Filter)

HD video playback

HD-DVD media: No | No | Yes [5] (G33 chipset only) | Yes [5] | Yes [5] | Yes [5]
Blu-ray disc: No | No | Yes [5] (G33 chipset only) | Yes [5] | Yes [5] | Yes [5]
MPEG-2: 1080p, 1080i, and 720p (all cores)
VC-1: 1080p, 1080i, and 720p (all cores)
AVC/H.264 [6]: 1080i and 720p (all cores)

[1] DirectX 10 and Shader Model 4.0 software driver support expected in Q1 2008
[2] Hardware T&L and Shader Model 3.0 software driver support expected in August 2007
[3] HW accelerated MPEG-2 VLD not currently supported on Windows Vista due to OS issue (to be resolved in Vista SP1)
[4] HW accelerated WMV9b/VC-1 not supported on Windows Vista until August 2007
[5] HD-DVD or Blu-ray disc media playback requires the use of a third-party decoder card and appropriate software drivers
[6] Ability to play back AVC/H.264 content is dependent on system configuration, content bitrate, and resolution

PC Architecture (TXW102) September 2008 40



Integrated Graphics Controller:


Intel Graphics Media Accelerator 4500 Family

• Integrated graphics controller in the memory controller of desktop chipset


- GMA 4500 in Q43 and Q45 chipset (2008)
- GMA X4500 in G41 and G43 chipset (2008)
- GMA X4500HD in G45 chipset (2008)
• Provides good performance for mainstream business applications without the
cost of a discrete PCI Express x16 graphics controller
• Uses main memory (DVMT 4.0)
• Supports Intel Clear Video Technology
[Block diagram: the processor and main memory connect to the Graphics Memory Controller Hub,
which contains the GMA 4500 Family controller and links to the ICH]

© 2008 Lenovo

Intel Graphics Media Accelerator 4500 Family


The Intel Graphics Media Accelerator 4500 (GMA 4500) is the integrated graphics controller in the
Intel 4 Series desktop chipsets announced in 2008. The graphics controller is located within the
same physical chip as the memory controller, comprising what is called the Graphics Memory
Controller Hub (GMCH).
The Intel GMA 4500 family supports Intel Dynamic Video Memory Technology 4.0 (DVMT
4.0). The system memory is shared by both graphics and system usages. Intel Dynamic Video
Memory Technology 4.0 ensures that the graphics applications are allocated memory depending on
the requests and the allocated memory is released to the system once the application is closed. This
provides more memory for intensive applications for maximum system performance.
The GMA 4500 Family supports Intel Clear Video Technology (discussed on following pages).

                 4500              X4500                   X4500HD

Chipset          Q43, Q45          G41, G43                G45
Monitor ports    DVI, DisplayPort  DVI, DisplayPort, HDMI  DVI, DisplayPort, HDMI
Hardware decode  No                No                      Full hardware decode acceleration
                                                           of MPEG2, VC1, and AVC

PC Architecture (TXW102) September 2008 41



The Intel Graphics Media Accelerator 4500 (GMA 4500) family supports 3D enhancements for
everyday games and improved realism with support for Microsoft DirectX 10, Shader Model 4.0,
OpenGL 2.0, and HDCP key integration.

                            Q45/Q43     G41         G43         G45

Graphics Media Accelerator  GMA 4500    GMA X4500   GMA X4500   GMA X4500HD
Full Hardware MPEG2
Video Decode                No          Yes         Yes         Yes
Full Hardware Acceleration
VC1 Decode                  Yes         No          No          Yes
Full Hardware Acceleration
AVC Decode                  Yes         No          No          Yes
HD Security PAVP/HDCP       Yes         Yes         Yes         Yes
Display Interfaces          DisplayPort, DVI, SDVO Dual Independent Display (Q45/Q43 and G41);
                            DisplayPort, HDMI, DVI, SDVO Dual Independent Display (G43 and G45)
Hi-Def resolutions          1080p/i, 720p (all)
DirectX Support             DirectX 10 (all)
Windows Vista               Premium (all)
OpenGL Support              2.0 (all)
Intel Clear Video
Technology                  No          Yes         Yes         Yes
HD Post Processing
features                    No          Yes         Yes         Yes

PC Architecture (TXW102) September 2008 42



Intel Clear Video Technology

• Combination of video processing hardware


and software technologies for a wide range
of digital displays
• Benefits
- Enhanced high-definition
video playback
- Sharper images
- Vibrant colors
- Advanced display capability
• Utilized in latest desktop
chipsets with GMA
• Utilized in latest notebook
chipsets with GMA Intel Clear Video Technology uses advanced
de-interlacing technology and ProcAmp color
controls to deliver sharper, more vibrant video
images.

© 2008 Lenovo

Intel Clear Video Technology


Intel Clear Video Technology is available on many Intel integrated graphics chipsets and is
designed specifically to meet the changing needs of the digital home user.
Today's digital home consumers are demanding high-quality video playback and sharp image
quality from their PC. Intel Clear Video Technology is designed to address these needs with
advanced processing capabilities to enable a richer entertainment experience. In addition to stutter-
free playback and vibrant color controls, this technology delivers crystal clear images without the
imperfections and artifacts usually associated with PC-based video content.
Enhanced High Definition Playback – High-definition video playback without the need for add-in
video cards or decoders. Enables smoother video playback and multi-stream playback for picture-
in-picture.
Sharper Image Quality – Maintain sharp image quality even at high resolutions, and reduce artifacts
through both analog and digital video signals.
Vibrant colors – Allows user adjustment of hue, saturation, brightness, and contrast. This enables
more accurate color of video clips under any environmental lighting condition.

PC Architecture (TXW102) September 2008 43



Advanced Display Capability – Allows your PC to connect to a wide range of digital displays by
supporting the latest digital display interfaces, including the High-Definition Multimedia Interface
(HDMI), which carries uncompressed HD video and multi-channel audio in one cable.
For PCs using digital TV tuners, Intel Clear Video Technology adds support for Media Expansion
or ADD2 cards which enable digital display connections such as DVI and TV-out in a single-card
solution.
Visit www.intel.com/go/clearvideo for more information.

Feature                       Benefit

MPEG-2 decode                 iDCT + motion compensation. Up to 2-stream support
                              (1 HD and 1 SD)
De-interlacing                Advanced pixel adaptive (SD/HD-1080i)
Color control                 ProcAmp: brightness, hue, saturation, contrast
Digital display support       Digital Video Interface (DVI), High-Definition
(through SDVO)                Multimedia Interface (HDMI), Unified Display
                              Interface (UDI)
Display support               RGB (QXGA), HDMI, UDI, DVI, HDTV (1080i/p, 720p),
                              Composite, Component, S-Video (via Intel Serial
                              Digital Video Out), TV-out, CRT
Video scaling                 4x4 scaling
Dynamic display modes         Flat-panel, wide-screen, digital TV
Aspect ratio                  16:9, 4:3, letterbox
Maximum resolution support    2048x1536 at 75 Hz, RGB (QXGA)
Operating systems support     Microsoft Windows Vista, Microsoft Windows XP,
                              Windows XP 64-bit, Media Center Edition, Windows
                              2000, Linux-compatible (XFree86 source available)

Intel Clear Video Technology uses advanced hardware and software techniques
to deliver smooth high-definition playback, sharp images with fine detail, and
precise color control, enabling a premium visual experience.

PC Architecture (TXW102) September 2008 44



Integrated Graphics Controller:


Advanced Digital Display 2 (ADD2)

• The Advanced Digital Display 2 (ADD2) adapter is used primarily to


support a digital monitor or a second monitor with a DVI connector.
• It is a low cost solution for adding a second graphics connector to desktop
systems.
• The graphics connector is either a DVI-D connector (digital) or a DVI-I
connector (analog or digital).
• It is low cost because it uses the chipset's integrated Graphics Media Accelerator as the
graphics controller.
• It is used in the PCI Express x16 slot.
• Lenovo markets:
- DVI-I Connection Adapter
- DVI-D Monitor Connection Adapter (HDCP)

Lenovo ADD2 DVI-D Monitor


Connection Adapter (HDCP)
© 2008 Lenovo

Advanced Digital Display 2 (ADD2)


The Intel Graphics Media Accelerator (GMA) in various Intel chipsets supports two multiplexed
Serial Digital Video Out (SDVO) ports that each drive pixel clocks up to 200 MHz.
Serial Digital Video Out (SDVO) is a digital display channel that serially transmits digital display
data to an external SDVO device. The SDVO device accepts this serialized format and then
translates the data into the appropriate display format (i.e., Transition Minimized Differential
Signaling [TMDS], Low Voltage Differential Signaling [LVDS], TV-Out). This interface is not
electrically compatible with the previous digital display channel named Digital Video Out (DVO).
For the latest Intel desktop chipsets, it is multiplexed on a portion of the PCI Express x16 graphics
interface.
The GMA utilizes these two digital display channels via an optional Advanced Digital Display 2
(ADD2) adapter. An ADD2 adapter provides digital display options by plugging into a PCI
Express x16 connector by using the multiplexed SDVO interface. The ADD2 adapter physically
requires a PCI Express x16 slot, but the slot may not be wired to support x16 adapters; for example,
select Lenovo desktops have a PCIe x16 slot that supports the ADD2 adapter, but only supports
PCIe x1 adapters. There is a removable cap on the x16 slot so PCIe x16 adapters will not plug into
the slot (although the cap can be removed); the cap can remain on the slot while an ADD2 adapter
is used in the slot.

PC Architecture (TXW102) September 2008 45



Riser Card of ThinkCentre M52 Small


Top slot: PCI 2.3 slot
Bottom slot: PCI Express x16 slot for ADD2 Adapter
[note the cap to prevent x16 adapter insertion]

It is possible to combine the two multiplexed SDVO ports to drive large (high addressability) digital
displays. When combined with a DVI-compliant external device and connector, the GMCH has a
high-speed interface to a digital monitor (e.g., flat panel or digital CRT).

PC Architecture (TXW102) September 2008 46



SDVO ports in either single/single-combined or dual operation modes are supported with these
features:
• Each SDVO port runs at a pixel rate of 200 MP/s. The two ports can be combined to work together
as a single port with an effective pixel rate of 400 MP/s. The 400 MP/s pixel rate allows for
support of QXGA resolutions (2048×1536 pixels) at refresh rates up to 85 Hz.
• Intel SDVO ports can interface to codecs that enable support for LVDS panels, DVI-I and DVI-D
displays, standard- and high-definition televisions and CRTs.
• Analog display support.
• HDTV 720p and 1080i display resolution support.
• Each SDVO port can support a single channel device.
• If both SDVO ports are active they will have identical display timings and data.
• 400 MHz integrated 24-bit RAMDAC (200 MHz per channel).
• Hardware color cursor support.
• DDC2B-compliant interface.
• Dual Independent Display support with digital display.
• Multiplexed Digital Display Channels (supported with ADD2 Card).
• Two channels multiplexed with PCI Express x16 slot.
• 200 MHz dot clock on each 12-bit interface.
• Can combine two channels to form one larger interface.
• Supports flat panels up to 2048x1536 at 60 Hz or digital CRT/HDTV at 1920×1080 at 85 Hz.
• Supports hot plug and display.
• Supports TMDS transmitters or TV-out encoders.
• ADD2 card utilizes PCI Express graphics x16 connector.
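To see why combining two 200 MP/s ports into a 400 MP/s channel enables QXGA at 85 Hz, a rough pixel-budget check works; the 25% blanking overhead below is an assumed approximation, not a figure from this document:

```python
def supports_mode(port_rate_mps, width, height, refresh_hz, blanking=1.25):
    """True if an SDVO pixel budget (in megapixels/s) covers a display mode.
    `blanking` approximates pixel time spent outside the visible area."""
    needed_mps = width * height * refresh_hz * blanking / 1e6
    return needed_mps <= port_rate_mps

# QXGA (2048x1536) at 85 Hz needs ~334 MP/s: beyond one 200 MP/s port,
# but within two ports combined (400 MP/s).
assert not supports_mode(200, 2048, 1536, 85)
assert supports_mode(400, 2048, 1536, 85)
```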

[Block Diagram of SDVO ADD2: the GMCH (Intel 910GL, 915GV, 915G, or 945G) drives an analog
display through VGA, an SDVO display through an ADD2 card in the PCI Express x16 slot, or —
on 915G and 945G only — a display through a PCI Express graphics x16 card]

PC Architecture (TXW102) September 2008 47



ADD2 Adapters
Lenovo markets the following Advanced Digital Display 2 (ADD2) adapters:
• DVI-I Connection Adapter which was introduced in 2005 and has a DVI-I connector to support
either an analog or digital monitor; an analog monitor requires an optional dongle to convert the
connector to a DB-15 connector

ADD2 Adapter with Dongle (side)


PCI Express DVI-I Connection Adapter
(Low Profile adapter; ships with a full height bracket and Low Profile bracket)

ADD2 Adapter with Dongle (back; DVI-I connector)


PCI Express DVI-I Connection Adapter

ADD2 Adapter (side)


PCI Express DVI-I Connection Adapter

PC Architecture (TXW102) September 2008 48



• ADD2 DVI-D Monitor Connection Adapter (HDCP) which was introduced in 2007 and has a
DVI-D connector to support a digital monitor; it supports High-bandwidth Digital Content
Protection (HDCP) which is digital rights management to control the digital audio and video
content as it travels across the DVI interface. It ships with both a Low Profile and Full Height
bracket.

ADD2 DVI-D Monitor Connection Adapter (HDCP)


Part number 43R1985

PC Architecture (TXW102) September 2008 49



Media Expansion Card or ADD2+


Various Intel chipsets support an adapter that is very similar to the Advanced Digital Display 2
(ADD2) adapter. This adapter is called an ADD2+ (the “+” refers to additional media support such as
a dual TV tuner, TV in, TV out, and audio in). Intel calls these ADD2+ adapters Media Expansion Cards.
Media Expansion Cards are available from many third-party add-in card vendors. Media Expansion
Cards allow the functionality of two or more discrete PCI Express cards to be combined into one
single card. For example, if an OEM wants to use a PCI Express-based TV tuner add-in card and
another PCI Express-based video card with TV-Out capability, then the functionality of these two
discrete add-in cards could be provided by a single Media Expansion Card. This flexibility allows
OEMs to lower overall system costs.
The Media Expansion Card uses the Intel Serial Digital Video Output (SDVO) interface that is
multiplexed on the PCI Express x16 port for video-out capabilities and one PCI Express x1 lane for
video-in capabilities. The SDVO port on the GMCH only uses eight of the 16 PCI Express lanes
available on the PCI Express x16 connector. The Media Expansion Card uses one of the remaining
eight PCI Express lanes to provide video-in capabilities along with the video-out capabilities via a
single add-in card.

[Media Expansion Card (ADD2+) Architecture: an RF tuner feeds a PCI Express video/audio
decoder (tuner in, TV in, audio in) that uses PCI Express lane 0 through the PCI Express x1
interface, while an SDVO DVI transmitter / TV-VGA encoder (DVI out, TV out) uses SDVO lanes
8-15 through the SDVO interface; both share the PCI Express x16 connector]

PC Architecture (TXW102) September 2008 50



Integrated Graphics Controller:


SurroundView

• AMD/ATI term for a desktop supporting two monitors using integrated graphics controller
• Requires a unique DVI-D Cable that plugs into a DVI header on systemboard
• Desktop can support four monitors with only one dual-head ATI PCI Express x16 adapter
• Supported on select Lenovo ThinkCentre A61 desktops

[Figure captions: DVI-D Cable (low profile or full height bracket); DVI-D Cable bracket blocks
the PCI Express x1 slot; DVI-D Cable connects to DVI header on systemboard; Lenovo ThinkCentre
A61 supporting four monitors]
© 2008 Lenovo

SurroundView
SurroundView is the AMD/ATI technology that provides low-cost multiple monitor support.
SurroundView allows the integrated graphics controller (such as the ATI Radeon X1250 or Xpress
200) to support two monitors with the addition of a single cable; a separate graphics adapter is not
required.
A supported desktop or tower can support two monitors without the use of a PCI Express x16
adapter. The first monitor uses the standard VGA connector. The second monitor is connected to a
DVI-D connector through only a cable; the integrated graphics controller drives both monitors.
The second monitor is supported by plugging a DVI-D cable into a DVI header on the systemboard.
The DVI-D cable has a DVI-D connector on either a low profile or a full height bracket. The
installed cable prevents the use of the PCI Express x1 slot.
SurroundView also allows a desktop or tower to support four monitors with only one ATI PCI
Express x16 graphics adapter via the following:
• An analog monitor with the standard VGA connector
• A digital monitor with an optional DVI-D Cable that plugs into a DVI header on the systemboard
and has a DVI-D connector on a bracket
• Two additional monitors via a dual-head adapter in the PCI Express x16 slot. Only ATI adapters
are supported to get four monitor support; if a non-ATI adapter is installed, the integrated
graphics is disabled.
SurroundView also allows a dual-head adapter in the PCI Express x16 slot to operate without
disabling the integrated graphics controller, but only if an ATI adapter is used (regardless of
whether a DVI-D Cable is installed).

PC Architecture (TXW102) September 2008 51



DVI-D

VGA

Back of Lenovo ThinkCentre A61

DVI header


Lenovo ThinkCentre A61 Systemboard with


AMD 690G chipset

Lenovo ThinkCentre A61 Systemboard with DVI-D Cable installed in DVI header to support two
monitors

Lenovo ThinkCentre A61 Systemboard with dual head graphics adapter installed in PCI Express
x16 slot for four monitor support

PC Architecture (TXW102) September 2008 52



PCI Express for High-End Graphics

• PCI Express has replaced AGP


and PCI for high-end
graphics adapters
• Desktops: a PCI Express x16 slot
is included on some desktop
systemboards
• Notebooks: a graphics controller
using the PCI Express x16 bus is
used in higher end notebooks
PCI Express x16 Adapter (Low Profile)
ATI Radeon HD 2400 XT
(part number 43R1962)

              PCI 2.3              AGP 8x                PCI Express x16

Bus type      32-bit parallel bus  32-bit parallel link  16-lane serial link
Bandwidth     132 MB/s             2.1 GB/s              5 GB/s
Introduced    199x                 2002                  2004

© 2008 Lenovo

PCI Express for High-End Graphics


In mid-2004, PCI Express became available on desktop systems. PCI Express has replaced AGP as
the interface for high-end graphics adapters because newer chipsets only support PCI Express.
Mid- to low-end graphics will be handled by the integrated graphics controllers in the Graphics
Memory Controller Hub of chipsets.
PCI Express is a serial, point-to-point, full-duplex link. Each lane signals at 2.5 Gb/s, which
after 8b/10b encoding delivers about 250 MB/s of usable data per lane in each direction. PCI
Express x16 means there are 16 independent serial connections transmitting data simultaneously,
providing roughly 250 MB/s x 16 = 4 GB/s of bandwidth in each direction, or 8 GB/s duplex.
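The lane arithmetic can be sketched as follows. PCI Express 1.x signals at 2.5 Gb/s per lane, and its 8b/10b encoding spends 10 transmitted bits per data byte, which is why the usable rate per lane is below the raw signaling rate (this calculation is mine, not from the text above):

```python
def pcie1_bandwidth_gbs(lanes, encoded=True):
    """Per-direction bandwidth (GB/s) of a PCI Express 1.x link.
    With encoded=True, account for 8b/10b (10 bits per data byte);
    with encoded=False, report the raw signaling rate."""
    bits_per_byte = 10 if encoded else 8
    return lanes * 2.5 / bits_per_byte

assert pcie1_bandwidth_gbs(16) == 4.0                 # usable data rate, x16
assert pcie1_bandwidth_gbs(16, encoded=False) == 5.0  # raw signaling rate, x16
```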

[Block diagram: the processor connects to the MCH, which drives memory and the PCI Express x16
graphics link; the ICH provides PCI slots, Gb Ethernet, Serial ATA, USB 2.0, Super I/O, a PCI
Express x1 slot, and a PCI Express x4 slot]

PC Architecture (TXW102) September 2008 53



Unlike the AGP slot, the PCI Express x16 slot can be used for other PCI Express adapters if a PCI
Express graphics adapter is not required.
PCI Express x16 graphics adapters are one of the following sizes:
• Low Profile (LP) – Allows the graphics adapter to fit both into Low Profile and full-height
systems. A full-height system requires a full-height bracket to be attached. Allows for customers
to choose the smallest systems and still achieve dual monitor output.
• Full-Height – Allows the graphics adapter to fit in full-height systems. Sometimes referenced as
ATX (based on ATX system motherboard design).

PCI Express x16 Adapter (Low Profile with full-height bracket)


ATI Radeon HD 2400 Pro with 128 MB
(part number 43R1961)

PCI Express x16 Adapter (Low Profile)


ATI Radeon HD 2400 XT with 256 MB
(part number 43R1962)

PC Architecture (TXW102) September 2008 54



PCI Express x16 Adapter (full-height)


ATI Radeon HD 2600 XT with 512 MB
(part number 43R1963)

PC Architecture (TXW102) September 2008 55



NVIDIA Quadro 256 MB NVS 290 NVIDIA Quadro 256 MB FX 370


(part number 43R1765) (part number 43R1766)

NVIDIA Quadro 512 MB FX 1700 NVIDIA Quadro 768 MB FX 4600


(part number 43R1767) (part number 43R1769)

PC Architecture (TXW102) September 2008 56



Power From Power Supply


Starting in 2003, some high-end graphics adapters, especially for gamers, started to use a power
connection from the PC’s power supply to provide additional power to the graphics adapter. For
example, the same power connector that would plug into an optical drive can be plugged into a
connector on the graphics adapter. Normally a power extension cable is required, so a power drop
for the optical device is still available. The graphics adapter is still plugged into a PCI
Express x16 slot.
For example, the Lenovo ThinkStation S10 and D10 have two PCI Express 2.0 x16 slots, each
providing up to 75 watts to a graphics adapter. An additional 75 watts (for a maximum of 150
watts) are provided to each adapter via a power extension cable.

Graphics adapter power connector for the power supply power drop


PCI Express x16 Retention Clip


PCI Express x16 adapter slots have a retention clip. This clip helps to hold the adapter in the slot
during shipment of the computer and while the computer is in operation. Push on the handle portion
to remove the adapter from the slot.

Retention Clip for PCI Express x16 Adapter Slot

PCI Express x16 slot


HyperMemory (ATI) and TurboCache (NVIDIA)


PCI Express x16 graphics cards have their own dedicated graphics memory on the adapter (such as
64 MB, 128 MB, or 256 MB). When that dedicated memory is exhausted, the card can borrow main
system memory with minimal performance impact. ATI calls this feature HyperMemory; NVIDIA calls it
TurboCache. Both use the dedicated graphics controller memory first, then borrow main memory as
needed. This is very common with notebook graphics adapters, and Lenovo notebooks with ATI or
NVIDIA graphics support the feature.

Diagram of a graphics adapter using main system memory after its own adapter memory is maxed out
(ATI HyperMemory): the Memory Controller Hub links the processor, system memory, and the
PCI Express x16 graphics adapter; an auxiliary memory interface and channel let the graphics
adapter use a portion of system memory in addition to its own adapter memory.

Following is an example of TurboCache memory allocation for the:


• NVIDIA Quadro NVS 140M
• NVIDIA Quadro FX 570M
The figures show the total available graphics memory, which consists of the dedicated memory on the
graphics controller plus the memory borrowed from main memory.

Graphics        512 MB         768 MB         1 GB           2 GB
Controller      Main Memory    Main Memory    Main Memory    Main Memory
Memory
64 MB           128 MB         191 MB         319 MB         831 MB
128 MB          192 MB         255 MB         383 MB         895 MB
256 MB          320 MB         383 MB         511 MB         1023 MB

Total Available Graphics (TAG) Memory
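The TAG figures follow a simple pattern: total available graphics memory is the dedicated controller memory plus a fixed amount borrowed from main memory. The sketch below infers those borrowed amounts from the table itself; they are illustrative values, not an official NVIDIA TurboCache formula.

```python
# Borrowed-memory amounts (MB) inferred from the TAG table above;
# illustrative only, not an official NVIDIA formula.
BORROWED_MB = {512: 64, 768: 127, 1024: 255, 2048: 767}

def total_available_graphics_mb(dedicated_mb: int, main_memory_mb: int) -> int:
    """TAG memory = dedicated controller memory + borrowed main memory."""
    return dedicated_mb + BORROWED_MB[main_memory_mb]

# Reproduces the table: a 64 MB controller with 1 GB of main memory
# yields 64 + 255 = 319 MB of total available graphics memory.
```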


PCI Express:
PCI Express Graphics 150W-ATX, 225W, 300W Specifications

• Extension to PCI Express for advanced workstation graphics


• Provides more electrical power (up to 150 watts, 225 watts, or 300 watts)
and more space (blocks adjacent slot[s]) to adapter card
PCI Express x16 PCI Express x16 PCI Express x16 PCI Express x16 PCI Express x16
Low profile card Standard card 150W-ATX card 225W card 300W card

Uses 25 watts Uses 75 watts Uses 75 to 150 watts Uses 151 to 225 watts Uses 226 to 300 watts

Up to two Up to two Up to three


One PCIe slot One PCIe slot
PCIe slots PCIe slots PCIe slots

Component side
Component side

reserved area
reserved area

PCI Express x16 PCI Express x16


Graphics 150W-ATX Graphics 300W
or 225W graphics graphics card
card

PCI Express PCI Express


connector(s) connector(s)
Systemboard Systemboard
© 2008 Lenovo

PCI Express x16 Graphics 150W-ATX, 225W, and 300W Specification


In 2004, the PCI Special Interest Group (PCI-SIG) announced the PCI Express x16 Graphics
150W-ATX Specification 1.0. Then in 2008, the PCI-SIG announced the PCI Express x16
225W/300W High Power Card Electromechanical Specification. These specifications are aimed at
high-end graphics applications that require increased power. They define a standard power
connector to meet the growing needs of power-hungry graphics adapter cards. The specifications
are only written for the ATX chassis implementations (not BTX).
These specifications address graphics power and thermals greater than those supported by the PCI
Express Card Electromechanical Specification 1.1 (CEM). The purpose is to provide additional
capabilities for PCI Express graphics within the existing framework of an evolutionary strategy that
is based on existing systemboard form factors. The specifications are primarily designed to deliver
additional electrical power to a PCI Express graphics add-in card and provide increased card
volume for the management of thermals.


These specifications are intended to support both workstation and consumer graphics. The
specifications do not support the optional hot-plug functionality of PCI Express CEM 1.1.
Together, PCI Express CEM 1.1 and these specifications support five distinct maximum power
levels for graphics:
• 25 watt (low profile card)
• 75 watt (standard size)
• 150 watt (76 to 150 watts)
• 225 watt (151 to 225 watts)
• 300 watt (226 to 300 watts)
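A small sketch can make the five power classes concrete; the class names and wattage boundaries below simply restate the list above (an illustrative helper, not part of any specification).

```python
def pcie_graphics_power_class(watts: float) -> str:
    """Map an adapter's power draw to the five maximum power levels
    defined by PCIe CEM 1.1 plus the 150W-ATX and 225W/300W specs."""
    if watts <= 25:
        return "25 W (low profile card)"
    if watts <= 75:
        return "75 W (standard size)"
    if watts <= 150:
        return "150 W (76 to 150 watts)"
    if watts <= 225:
        return "225 W (151 to 225 watts)"
    if watts <= 300:
        return "300 W (226 to 300 watts)"
    raise ValueError("exceeds the 300 W specification")
```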
These graphics cards may use the space of the adjacent expansion slot(s), thereby providing more
volume for thermal solutions and components on the primary side of the card than the standard PCI
Express add-in card which is constrained to the width of a single expansion slot. For example, a
system that supports a PCI Express x16 Graphics 150W-ATX add-in card is required to ensure that
sufficient power and thermal support exists. In an ATX form factor system, the adjacent expansion
slot can be left vacant allowing for 1.37 inches of clearance for the add-in card as illustrated in the
figure. The area on the add-in card that can utilize this height, as well as the restricted height of the
secondary side, is not defined in this specification; instead, it leverages the general PCI Express
add-in card requirements for these dimensions.

Diagram: PCI Express x16 Graphics 150W-ATX graphics card (I/O bracket and end bracket removed for
clarity). The component side reserved area extends up to 34.8 mm [1.370 inches] above the card;
expansion slot spacing on the systemboard is 20.32 mm [0.800 inches]. All dimensions: mm [inches].


Diagram: PCI Express x16 Graphics 300W graphics card (I/O bracket and end bracket removed for
clarity). The component side reserved area extends up to 55.12 mm [2.170 inches] above the card;
expansion slot spacing on the systemboard is 20.32 mm [0.800 inches]. All dimensions: mm [inches].


Monitor Features:
CRT Monitor Characteristics

• Vertical refresh rate: the speed at which the electron beam scans from the top to the bottom of
  the screen (Hz = cycles per second; 75 Hz = 75 passes per second, so one pass takes 1/75 of a second)
• Horizontal refresh rate is directly related to the vertical refresh rate (kHz)
• A higher refresh rate reduces flicker
• Dot pitch: .22 mm to .39 mm; the distance between adjacent holes in the shadow mask
  (0.28 mm is common)
• Pixel size varies with addressability: a pixel at VGA (640×480) is larger than a pixel at
  SVGA (1024×768)

CRT Monitor Characteristics


The vertical refresh rate is also called frame rate, refresh rate, vertical screen frequency, and
vertical scan rate. The vertical refresh rate is a measure of how quickly the electron guns repaint the
screen vertically, from the top to the bottom (measured in hertz, or cycles per second). So at 75 Hz,
that means the screen is painted from top to bottom 75 times in one second; one pass from top to
bottom would be 1/75th of a second. A monitor can handle lower addressabilities at higher refresh
rates.
The horizontal refresh rate (or horizontal scan rate) is the frequency in hertz at which the monitor is
scanned in a horizontal direction. Higher horizontal scan rates produce less flicker.
Flicker is fluctuating brightness. A low vertical refresh rate is the main reason for flicker, although
an interlaced image contributes to the flicker. 72 Hz is the minimum ISO-compliant refresh rate for
a flicker-free image (85 Hz is VESA standard). Flicker is more common on larger displays because
flicker is more noticeable in peripheral vision than in direct vision. Lighter backgrounds have more
noticeable flicker than darker backgrounds.
It is not always a good idea to run a display at the fastest refresh rate possible, because image
quality may be reduced. To place a black and a white pixel next to each other, the electron beams
in the monitor quickly go from off to on. If there is not enough time for the change, the transition
between the two pixels goes gray. Increasing the refresh rate gives the electron beams less time to
switch on and off between pixels. The best approach is to choose the lowest rate at which no flicker
is visible.


Resolution     Vertical Refresh Rate    So Horizontal Scan
               Should Be                Rate Needs to Be
640x480        85 Hz                    43 kHz
800x600        85 Hz                    54 kHz
1024x768       85 Hz                    69 kHz
1280x1024      75 Hz                    80 kHz
1600x1200      72 Hz                    89 kHz
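The relationship between the two columns can be approximated: the horizontal scan rate is roughly the number of visible lines times the vertical refresh rate, plus a few percent for vertical blanking. The ~5% overhead used below is an assumed typical value, not a figure from this course.

```python
def horizontal_scan_khz(visible_lines: int, refresh_hz: float,
                        blanking_overhead: float = 1.05) -> float:
    """Estimate the horizontal scan rate a display mode requires, in kHz.
    The ~5% vertical-blanking overhead is an assumed typical value."""
    return visible_lines * refresh_hz * blanking_overhead / 1000.0

# 1024x768 at 85 Hz: 768 * 85 * 1.05 / 1000 ~= 68.5 kHz,
# close to the ~69 kHz figure quoted for that mode.
```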

In noninterlaced (or progressive scan) displays, each image is scanned onto the front of the CRT in
a single pass by three electron guns that sweep line by line across the screen in horizontal
strokes. In standard television sets or interlaced monitors, a complete image requires two scans.
Most computer users should consider only noninterlaced displays. People sit close to computer
monitors and look mostly at detailed, static images such as word processing text and numbers,
whereas they typically sit several feet from a television and look primarily at large moving
images.
Scanners and monitors are RGB devices, meaning that they define all colors as mixtures of red,
green, and blue.
Due to the earth’s magnetic field, CRT monitors are manufactured to work in the northern,
southern, and equatorial regions of the earth and may not produce a satisfactory image when moved
between them. The magnetic field does not affect Flat Panel LCD Monitors and ThinkPad LCD
displays.

Monitor Definitions
Black level is the amount of brightness retained when the video signal is set for complete blackness
or zero.
Color temperature is the color tint of the white screen of a monitor (such as red or blue). It is often
measured in 6500ºK or 9300ºK.
Convergence is a measure of how accurately aligned the red, green, and blue electron guns of a
monitor are. Misaligned guns result in misconvergence and unwanted color halos around object
edges.
Degauss is demagnetization of a monitor to reduce picture distortion.
Diagonal pitch or dot pitch is the diagonal or adjacent spacing of same color phosphors, i.e., the
distance between adjacent holes in shadow mask, stripes in aperture grill, ellipses in slot mask, or
the distance between red phosphor dots. .28mm is a common dot pitch. Dot pitch is not the same as
a pixel size. Dot pitch is a physical characteristic of the display and cannot be changed.
Gray-scale shift is a measure of the change in the brightness level of gray areas on screen when
adjacent areas alternate between dim and bright states.
Horizontal dot pitch is the horizontal spacing of same-color phosphors. This term became popular
in 1998 as a way to compare different tube technologies. Previously, dot pitch referred to diagonal
dot pitch. This is a physical characteristic of a monitor and can not be changed. .24mm is a
common horizontal dot pitch.


Interlaced controllers draw images with two passes, scanning every other line on the first pass (1, 3,
5, etc.) and filling in the rest on the second pass (2, 4, 6, etc.). Interlaced is sometimes quoted as
43.5 or 87 Hz. The 43.5 Hz is for both fields (passes) to make a complete frame, and the 87 Hz is
for one field to do so. Interlacing is not used in new displays, but these new displays are interlace-
compatible for use with old graphics controllers.

Interlaced Raster Diagram: the first scan draws every other raster line; the second scan fills in
the lines between them.
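The two scanning schemes can be expressed as a short sketch, using the odd-then-even line numbering from the text above (lines 1, 3, 5 on the first pass; 2, 4, 6 on the second).

```python
def scan_order(total_lines: int, interlaced: bool) -> list:
    """Order in which raster lines are drawn (1-based line numbers).
    Interlaced: odd lines on the first field, even lines on the second."""
    if not interlaced:
        return list(range(1, total_lines + 1))    # single progressive pass
    first = list(range(1, total_lines + 1, 2))    # 1, 3, 5, ...
    second = list(range(2, total_lines + 1, 2))   # 2, 4, 6, ...
    return first + second

# Six lines, interlaced: [1, 3, 5, 2, 4, 6]; noninterlaced: [1, 2, 3, 4, 5, 6].
```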

Luminance is a measure of the brightness of the monitor and is measured in foot-Lamberts (fL).
Luminance allows a monitor to produce a brighter image without bringing the black areas to gray
and thereby lowering the sharpness and contrast.
A moiré pattern is a wavy pattern resulting from a mismatch between the pattern of a shadow mask
or aperture grill and the horizontal line pattern of the image.
Noninterlaced controllers draw all lines in one pass. Noninterlacing graphics controllers and
displays are more expensive than interlaced. Most displays today are noninterlaced.

Noninterlaced Raster Diagram

Pincushion is the curvature of a straight line on the screen. Lines at the edges of the screen tend to
exhibit more pincushioning than lines at the center because of the increased deflection of the
electron beam.


Pixel size is the exact number of phosphor dots composing the pixel; this size will vary with the
resolution and addressability (640x480, 800x600, and 1024x768). Pixel size is also called pel size.
Note: These pixel sizes are for 14-inch, 0.28mm dot pitch

Pixel size for VGA (640×480) versus pixel size for SVGA (1024×768) on a 0.28 mm dot pitch tube

Example:
• In VGA (640×480), each pixel is composed of more than two triads (sets of phosphor dots).
• In SVGA (1024×768), each pixel is made up of only one triad.
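The example can be checked with a rough calculation. The sketch below assumes a 4:3 tube (so the screen width is 0.8 times the diagonal) and treats dot pitch as the linear spacing of triads, which is a simplification.

```python
def triads_per_pixel(diag_inches: float, h_pixels: int, dot_pitch_mm: float) -> float:
    """Rough number of triads covered by one pixel on a 4:3 CRT.
    Treats dot pitch as the linear spacing of triads (a simplification)."""
    width_mm = diag_inches * 25.4 * 0.8    # 4:3 aspect: width = 0.8 x diagonal
    pixel_mm = width_mm / h_pixels         # linear size of one pixel
    return (pixel_mm / dot_pitch_mm) ** 2  # area ratio ~= triads per pixel

# On a 14-inch, 0.28 mm tube: a VGA (640x480) pixel covers more than two
# triads, while an SVGA (1024x768) pixel covers roughly one.
```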


Screen regulation is the stability of the dimensions of the display area of a monitor. The screen
image on a poor quality monitor might be bowed or irregular in size or shape.
The shadow mask is a screen just inside the front glass of the display. It is drilled with small holes;
each corresponds to one triad. (It looks sort of like a window screen, or gauze material.) Its purpose
is to guide electron beams so that each beam only hits one phosphor dot in the triad.

Shadow mask

A triad refers to one of the thousands of triangles that are arranged on a computer screen in order to
produce an image. Color monitors display a combination of red, green, and blue. Each triad is
composed of three phosphor dots – one red, one green, and one blue. One electron gun is dedicated
to each of the three colors, so there are three guns in total. The beams from the electron gun move
together. As the beams scan the screen, each triad produces a single color as each gun controls the
intensity of the red, green, and blue.

Diagram: three electron guns fire beams through the shadow mask onto the inside of the faceplate;
each beam strikes only its own red, green, or blue phosphor dot.
Uniformity is a measure of the variance of luminance and color across the display.
Video bandwidth is the amount of data dedicated to projecting an image to a monitor. The video
bandwidth is allocated in horizontal and vertical scan rates.


Presets
Presets are a bank of memory locations with stored data (for example, picture size, position, shape)
corresponding to the most commonly used modes (addressability and refresh rates). Presets make the
geometry (picture size, position, shape) appear proportionally on the screen with a border.
In 2000, factory loads started to be used on monitors. Factory loads (also called preloads) are not
checked on all monitors, but the information the monitor needs is extrapolated from the preset
information. These have a higher tolerance on size, etc., but should produce a reasonable image. In
contrast, presets are inspected on all units in production and have a tighter tolerance for
size/position/geometry.
The microprocessor of the display detects the signal from the graphics controller and adjusts the
geometry according to the data it finds in the corresponding memory locations (presets).
A number of memory locations are available for users to store geometry settings according to their
preference (for example, a borderless geometry).
The microprocessor always checks the user memory locations first for geometry preference; if none
exists, it checks for presets. If neither exists, the geometry may appear nonproportional (odd shaped)
and need adjustment. A specific resolution and refresh rate (e.g., 800 by 600 by 256 colors at 72 Hz)
can have only one geometry saved by the user.

VESA DDC
VESA established a standard called Display Data Channel (DDC) in late 1994; its current form is
called the Display Data Channel Command Interface (DDC/CI). The DDC specification defines how a
monitor and a system will communicate refresh rates at different resolutions, power conservation
capabilities, model number, serial number, vendor, preset modes, and other capabilities.
For the benefits of this automatic identification to be effective, the graphics hardware, BIOS, and
operating system of the system must be enabled for DDC. There are two main standards: DDC1 and
DDC2B. Most monitors and systems support DDC2B today. (DDC1 is no longer used in current
systems.)

DDC1: continuous data stream from monitor to system

DDC2B: data from monitor to system on request


If a DDC monitor plugs into a DDC system, the system will automatically configure the display to
a high resolution and refresh rate rather than the current but outdated VGA standard. If a monitor is
not automatically detected, make certain that the following two files, which typically ship with a
monitor, are installed on the disk: *.INF (Windows Information for plug and play) and *.ICM (ICC
profile for color calibration).
The original DDC1 specification was later expanded to include two-way communication between
the monitor and graphics adapter using two pins on a standard VGA cable. This specification is
called DDC2B, and it has the added benefits of being much faster than the DDC1 specification and
of allowing the operating system to query the monitor (via the graphics adapter) to find out what
that monitor's features are. A compatible graphics controller allows one to switch addressabilities
on the fly without rebooting. DDC2B allows the monitor to always support the highest refresh rate
that both the monitor and graphics adapter can support.
All major operating systems support DDC2-enabled monitors when used with DDC2-compliant
systems.
The latest version of the specification is "Display Data Channel Command Interface (DDC/CI)
Standard-Version 1.1," October 2004 (VESA document VESA-2004-10); see www.vesa.org for
more information.
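The monitor capability data carried over DDC is formatted as an EDID (Extended Display Identification Data) block. As an illustrative sketch, not production code, the fixed 8-byte EDID header and the packed 3-letter manufacturer ID at bytes 8-9 can be decoded like this; the sample EDID bytes are hypothetical.

```python
def parse_edid_manufacturer(edid: bytes) -> str:
    """Decode the 3-letter PNP manufacturer ID from an EDID block
    (bytes 8-9: three 5-bit letters, 'A' = 1). Illustrative sketch only."""
    if edid[0:8] != b"\x00\xff\xff\xff\xff\xff\xff\x00":
        raise ValueError("not a valid EDID header")
    word = (edid[8] << 8) | edid[9]
    return "".join(chr(((word >> shift) & 0x1F) + ord("A") - 1)
                   for shift in (10, 5, 0))

# A hypothetical 128-byte EDID whose manufacturer field encodes 'LEN':
sample = b"\x00\xff\xff\xff\xff\xff\xff\x00" + bytes([0x30, 0xAE]) + bytes(118)
```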


Monitor Features:
CRT Monitor Tube

Four types of CRT tube technologies

• Conventional: the glass tube is a sphere
• FST (flatter, squarer tube)
  - Used in E54, E74, E74M monitors
• Aperture grill
  - Trinitron by Sony (flat vertically)
  - FD Trinitron (flat display: vertically and horizontally)
  - Used in ThinkVision C220p
• Flat shadow mask
  - Flat monitor with shadow mask
  - Used in ThinkVision C170 and C190

(Figures: Conventional; Flatter, Squarer Tube; FD Trinitron and Flat Shadow Mask)


CRT Monitor Tube


There are several types of CRT tube technologies as follows:
Conventional
• The glass tube is a sphere
• Uses a shadow mask
• Rarely used in current monitors
FST (flatter, squarer tube)
• Radius of curvature is larger, usually twice that of conventional tubes, but the glass tube is
  still spherical, not flat
• Uses a shadow mask
• Considered better looking but more expensive because of problems with geometry, doming, and
  convergence
• Produces less glare, and images at the edges are easier to view
• Preferred for 15-inch or larger displays
• Used by vendors in many monitors

(Figures: Conventional CRT Monitor; Flatter, Squarer Tube)


Aperture grill
• Trademarked Trinitron and developed by Sony Corporation.
• Cylinder is completely flat vertically and only slightly curved horizontally.
• Uses tensioned vertical wires, called an aperture grill, instead of a shadow mask.
• One or two horizontal wires keep the vertical wires in position and may be visible as thin lines
  across a light colored screen.
• Phosphor is laid in stripes (versus triads), so it is measured in stripe pitch instead of the
  dot pitch used on most displays.
• Used today in many monitors.
• In 1999, FD Trinitron was introduced (FD stands for flat display); its screen is flat both
  vertically and horizontally, eliminating distortion of shapes and rendering everything accurately.
• Mitsubishi uses a similar technology named Diamondtron.
• In 1998, Mitsubishi introduced the Diamondtron NF (natural flat) with a screen that is flat
  vertically and horizontally.
Flat shadow mask
• In 2000, Samsung introduced the Full Flat Shadow Mask.
• Uses a shadow mask.
• Flat in the horizontal and vertical axes.
• Used in the ThinkVision C170 and C190.

(Figures: Trinitron/Diamondtron stripe pitch; aperture grill mask, R G B R stripes)

Other CRT Monitor Tubes and Technologies
CromaClear

• Introduced in 1997 by NEC.


• Tube uses an elliptical-shaped electron stream, slot mask, and phosphor dot pattern, which NEC
claims provides a ". . . crisper, sharper, more lifelike image."
• Targeted specifically at the Trinitron but is as yet unproven.
PureFlat
• Introduced in 1997 by Panasonic.
• Tube is flat in the vertical and horizontal dimensions.
• CRT makers Mitsubishi and Sony have announced their own versions.


In 1998, monitors implementing a wide deflection yoke, which uses a 100-degree deflection of the
electron beam, appeared. A typical CRT uses 90 degrees from one side to the other, resulting in a
monitor case that is about as deep as the diagonal screen measurement. A wide deflection yoke saves
about two inches of depth, so a 17-inch monitor has a footprint similar to a 15-inch monitor, and a
19-inch monitor fits a 17-inch monitor footprint.
In 1998, short-neck tube monitors that use smaller components at the electron-gun end of the
picture tube were introduced. This reduces the depth of the CRT by about an inch (slightly less than
that of a wide-deflection yoke design). A short-neck tube is not a wide deflection yoke.

Types of Masks
There are three types of masks used by monitors.
A shadow mask (dot-trio shadow mask) is a thin sheet of metal perforated with holes that align with
the phosphor dots on the front of the tube. The spacing between the phosphor dots varies with the
grade of the monitor. It delivers clean edges and sharp diagonals; these factors are important for
text.
An aperture grill, used by the Sony Trinitron and Mitsubishi Diamondtron, uses an array of thinly
stretched wires with phosphor stripes to create the screen image. The aperture grill typically has
a stripe pitch of .24mm to .28mm. It is the best choice for image and graphics work, because poorer
horizontal definition makes it less suited for text. A .24mm stripe pitch (horizontal dot pitch) is
equivalent to a .28mm dot pitch. Aperture grill tubes generally deliver richer, more saturated
colors, so they are optimized for image editing and gaming.
A slot mask, introduced by NEC in 1996 under the name CromaClear, uses elongated phosphor ovals
rather than dots, delivering, as NEC claims, a crisper image than other designs. Some monitor
makers (for example, NEC) have tried to bridge the gap between shadow masks and aperture grills by
offering slot mask designs. A slot mask is optimized for text and uses an elongated 0.25mm mask
opening instead of dots, which does not require damping wires.

(Figures: Shadow Mask; Aperture Grill; Slot Mask)

An LCD panel, in which a thin layer of material transmits or blocks light, is used in notebooks and
flat panel monitors. Each pixel can be turned on and off as required. It has no flicker and few
emissions. LCDs are direct-address displays, which means that each pixel of the image is defined
and displayed by an individual physical component in the monitor. Three liquid crystal subpixels –
one each for red, green, and blue – are controlled precisely to produce the desired color shade for
that single dot on the screen. The result is a noticeably sharper image (especially with text) than
a CRT can produce.

(Figure: LCD Panel)


Monitor Features:
Flat Panel Monitors Overview

• Flat panel color monitors and notebooks use LCD panels (digital technology, not analog like CRT)
• Advantages over CRT monitors:
  - Smaller size and footprint
  - Low emissions and little flicker
  - Completely flat (reduces glare)
  - Do not attract dust
  - Less heat
  - Sharper image for text
• Disadvantages over CRT monitors:
  - Worse graphics and image rendering
  - Image quality for motion (video or games)

(Figure: top view and side view of a flat panel monitor)


Flat Panel Monitors Overview


LCD panels, used in flat panel color monitors and notebooks (e.g., ThinkPad systems), offer space
savings (a footprint over 60 percent smaller), great picture quality, increased energy efficiency
(uses 75 to 80 percent less energy and gives off a third of the heat), and virtually no
electromagnetic emission. Flat panel monitors have completely flat screens, sharply reduce
reflection from the user's environment, and practically eliminate glare. LCD panels have a sharper
image (especially for text) than a CRT monitor.
LCD panels significantly enhance space savings as they provide a more functional desktop
workspace. They allow a larger screen without a user having to buy a larger desk. They also allow
versatile mounting on stands, wall brackets, or arm assemblies.

Flat Panel Monitors provide significant space savings

Medical Office Cubicles Financial/Trading Environments


Flat Panel Monitors provide significant space savings.

CRT Flat Panel

Flat Panel Monitors allow various adjustments.

Landscape Pivot Portrait

Ergonomic Stand
• Tilt
• Swivel
• Height Adjustment


Multiple User Environment Flat Panel Monitors on adjustable arm


A CRT monitor is a better choice for people working with graphics or who need reliable image
rendering, because the liquid crystal cells of an LCD do not show consistent brightness and color
shades as the viewing angle shifts. Motion is not as clear on an LCD as on a CRT, which matters to
those who watch video or play action games. In an LCD, images are produced by the physical movement
of liquid crystal molecules; it takes time for them to move compared to the nearly instantaneous
response of a CRT's electron beams and associated phosphors. So if you need a display for high-end
graphics processing or work with video, a CRT may be a much better purchase. Many LCD monitors have
viewing-angle limitations that can cause the brightness or even the hue of a color to change as
your point of view changes. This can make a significant difference with larger displays, because
the viewing angles to the corners can be considerable. And even if the panel has excellent
viewing-angle performance, the color gamut typically does not match that of a quality CRT, making
it more difficult to predict what the final product will look like.
Unlike the CRT, the LCD flat panel monitors have a viewable size that is generally the same as its
quoted size, so a 15-inch LCD gives roughly the same viewable area as a 17-inch CRT.
A CRT’s volume typically increases by a factor of eight when the horizontal screen size doubles.
LCD panels are typically one half inch to two inches thick and use a thin layer of material that
transmits or blocks light.
Unlike CRTs, flat panels are built to provide a specific display resolution. The number of pixels in a
particular panel are set when the panel is designed. This differs from a CRT, in which the number
of pixels being displayed is defined by the signal used to drive the display (as long as this signal is
within the bandwidth supported by the analog circuitry of the CRT).
There are techniques that allow a flat panel product to be driven using signals with different
resolutions (addressabilities). The simplest versions use only a portion of the available resolution
and leave a border around the picture. The more sophisticated solutions mathematically expand the
picture to use all of the available resolution. Both cases represent a compromise to the user.
Flat panel monitors are similar to aperture grille CRTs in that they have vertical stripes of color,
which are typically spaced in the range of 0.24 mm to 0.30 mm. One word of caution: although
numerically similar, CRT and flat panel stripe-pitch numbers should not be directly compared,
because of other differences in the technologies.
When buying a monitor, it is important to keep some perspective. Even at smaller pixel sizes,
on-screen images do not come close to the resolution of an inexpensive printer. The 0.25-mm pixel
pitch of a 19-inch UXGA panel is still only slightly more than 100 pixels per inch, while an
inexpensive inkjet printer can produce up to 1,000 dots per inch. So the image on the screen still
won't look as good as it will on paper.
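The pixel-density claim can be verified with a one-line calculation (illustrative arithmetic only).

```python
import math

def pixels_per_inch(h_pixels: int, v_pixels: int, diag_inches: float) -> float:
    """Pixel density: diagonal pixel count divided by diagonal screen size."""
    return math.hypot(h_pixels, v_pixels) / diag_inches

# 19-inch UXGA (1600x1200): 2000 diagonal pixels / 19 in ~= 105 ppi,
# i.e. "slightly more than 100 pixels per inch".
```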


Contrast ratio
Contrast ratio is the ratio between the brightest white and darkest black. The higher the contrast
ratio, the deeper and richer the coloring.

400:1 Contrast Ratio

1000:1 Contrast Ratio
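The ratio itself is simple arithmetic; for example, a 250-nit white and a 0.25-nit black give the 1000:1 figure shown above. The luminance values below are illustrative, not measurements of any particular panel.

```python
def contrast_ratio(white_nits: float, black_nits: float) -> str:
    """Contrast ratio: brightest white over darkest black, quoted as N:1."""
    return f"{round(white_nits / black_nits)}:1"

# 250 nits white over 0.25 nits black -> "1000:1".
```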

Pixel response time


The pixel response time measures in milliseconds the length of time it takes for a pixel to go
from black to white. The lower the number the better, because monitors with faster pixel
response times are generally better at handling fast-motion graphics. A slower pixel response
time tends to produce ghosting or blurring around the edge of moving images.

25ms – noticeable blur 16ms – slight blur 8ms – no blur
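A quick way to reason about these numbers is to compare the response time with the frame period of the refresh rate: a pixel that transitions slower than one frame is still changing when the next frame arrives. The 60 Hz figure below is an assumed typical refresh rate.

```python
def ghosting_expected(refresh_hz: float, response_ms: float) -> bool:
    """True if a pixel transition outlasts one frame period, so the pixel
    is still mid-transition when the next frame arrives (visible ghosting)."""
    frame_ms = 1000.0 / refresh_hz
    return response_ms > frame_ms

# At 60 Hz a frame lasts ~16.7 ms: a 25 ms panel ghosts, an 8 ms panel does not.
```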


Brightness
Brightness is a measurement of light intensity produced by an LCD’s backlight measured in nits
(lumens) or candelas per meter square (cd/m2). Higher brightness produces better images
(greater readability) under high ambient light. Flat panel monitors typically average about 250
nits; CRTs typically have an average of 100 nits.

200:1 cd/m2: 250:1 cd/m2 300:1 cd/m2


Minimal Standard Optimal

Glossy vs. Anti-Glare Panels

Glossy
Physical description:
• The outer surface of the LCD is smooth, with mirror-like surfaces
• Allows light to pass through the display with minimal diffusion
• Shows fingerprints over the screen
Benefits include:
• Sharper contrast
• Better image visibility in bright light (including sunlight)
• Allows lower backlighting for similar perceived brightness and color depth
• Crisper images
• Increased color depth
• Wider viewing angles
Usage cases and environments:
• Intended to optimize the multimedia experience (especially movies)
• Ideal for viewing high-end graphic and video content in dimly lit environments

Anti-Glare
Physical description:
• The outer surface of the LCD is intentionally roughened to diffuse reflections
Benefits include:
• Best for deterring visual fatigue and distraction created by reflections on the display surface
• Reduced intensity of glare from surrounding light sources (light fixtures, windows, etc.)
• Eliminates clearly defined reflections that appear superimposed over the image from the LCD
Usage cases and environments:
• Intended to minimize reflections during long-term use of the display
• Ideal choice for users who want to optimize for typical office work in environments with
  standard ambient lighting
• For typical office work, ergonomists recommend this display type due to its visual ergonomic
  benefits


Monitor Features:
Color Gamut

• Gamut is the entire range of possible visible colors
• Multiple standards exist to standardize the color spectrum
• Standard RGB (sRGB) is the Windows and Internet standard color spectrum
• Adobe RGB (aRGB) is an enhanced standard used in high-end digital content segments
• Different displays are capable of displaying portions of the spectrum
• Displays are rated not only on brightness, but also on what percent of the aRGB standard they cover
• Typical notebook LCD displays cover around 45% of the aRGB gamut
• The digital content creation segment requires high-gamut monitors, as its users work with true-to-life 2D and 3D images in both still frame and motion video

Figures: the color universe (the entire range of possible chromaticities); the relative sizes of the sRGB and aRGB colorspaces within the color universe; a typical display gamut (the colored triangle is the gamut available on a typical monitor; it does not cover the entire space)


Gamut
Color gamut is the range of colors a monitor can display, usually quoted as the percentage of a reference colorspace (such as aRGB) that the panel covers. The higher the percentage, the better.
For example, the Lenovo ThinkVision L193p has a 72% color gamut, and the Lenovo ThinkVision
L220x Wide has a 92% color gamut. The ThinkPad W700 has a 72% color gamut display.
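The relative sizes of the sRGB and aRGB gamuts can be checked numerically. The sketch below computes the areas of the two primary triangles in CIE 1931 xy chromaticity space; the primary coordinates are the published standard values (not figures from this course), and the area ratio is only a rough proxy for "coverage."

```python
def triangle_area(p1, p2, p3):
    """Shoelace formula for a triangle's area in CIE xy chromaticity space."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

# Published CIE 1931 xy coordinates of the red, green, and blue primaries.
SRGB = [(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)]
ARGB = [(0.64, 0.33), (0.21, 0.71), (0.15, 0.06)]  # Adobe RGB: a wider green

srgb_area = triangle_area(*SRGB)
argb_area = triangle_area(*ARGB)
print(f"sRGB triangle is roughly {srgb_area / argb_area:.0%} of the aRGB area")
```

The ratio comes out near three-quarters, which is consistent with sRGB-class panels being rated in the low-70s percent of the aRGB gamut.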


Monitor Features:
Color Calibration

• The digital content segment demands high-quality displays that must be calibrated to be correct
• The ThinkPad W700 is the industry's first notebook with integrated color calibration
• The ThinkPad W700 uses the X-Rite Pantone color sensor and software, delivering a consistent and true display
• Calibration cycles are performed with the lid closed
• The user sees a before/after comparison when calibration completes
• Calibration is usually performed 1-4 times per month


Color Calibration
The aim of color calibration is to adjust the colors of one output device to match that of another.
The device that is to be calibrated is commonly known as calibration source; the device that serves
as a comparison standard is commonly known as calibration target.
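The X-Rite sensor and software are proprietary, but the core idea of display calibration can be sketched: measure the panel's actual response, then build a lookup table (LUT) that remaps drive levels so the output matches a target response. The gamma values and the brute-force search below are illustrative assumptions, not the W700's actual algorithm.

```python
def build_correction_lut(measured, target, levels=256):
    """Build a LUT remapping drive levels so the panel's measured
    response matches the target response.

    measured/target: functions mapping a normalized input level (0..1)
    to normalized luminance (0..1). Brute-force search for clarity;
    real calibration software interpolates measured sample points.
    """
    lut = []
    for i in range(levels):
        want = target(i / (levels - 1))
        best = min(range(levels),
                   key=lambda j: abs(measured(j / (levels - 1)) - want))
        lut.append(best)
    return lut

# Hypothetical panel that is too dark (gamma 2.5), corrected to a 2.2 target.
panel = lambda v: v ** 2.5
target = lambda v: v ** 2.2
lut = build_correction_lut(panel, target)
```

Because the hypothetical panel is darker than the target, the resulting LUT boosts mid-tone drive levels while leaving black and white endpoints fixed.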


Monitor Features:
CCFL vs LED Backlight

• Cold cathode fluorescent lamp (CCFL) backlights are common in flat panel monitors and notebooks
• LED backlights were introduced in notebook screens in 2008
• LED advantages
- Lower power
- Brighter colors
- Excellent contrast
- Thinner screens

Figures: CCFL backlight vs. LED backlight; an LED backlight is used in the Lenovo ThinkPad X300 notebook

CCFL vs LED Backlight


Flat panel monitors and notebooks have traditionally used a cold cathode fluorescent lamp (CCFL)
as the backlight (or backlight unit) for LCD panels.
In 2008, some products started using a light-emitting diode (LED) backlight in LCD panels. LED
backlights are more expensive than CCFL backlights, but offer some advantages:
• Lower power
• Brighter colors
• Excellent contrast (better control of the black level displayed)
• Thinner screens
LED screens are thinner because the light-emitting diodes are strung along the edges of the
screen or arranged as a matrix behind it, whereas CCFLs use thicker glass tubes. Also, the
inverter that powers a CCFL's tubes is replaced by a smaller LED driver board.


Monitor Features:
Widescreen Monitors and Notebooks

• Widescreen
- 9% wider screen
- 16:9 or 16:10 aspect ratio
• Advantages
- See more columns of a spreadsheet/database
- Allows smaller windows on the side
- Most movies use a widescreen format
• Disadvantages
- Text is easier to read and scroll vertically on a standard screen
- Widescreen notebooks are hard to fit in standard carrying cases
• Lenovo offers widescreen monitors and notebooks

Figures: Lenovo ThinkVision L220x Wide; Lenovo ThinkPad T61 Widescreen


Widescreen Monitors and Notebooks


Widescreen monitors and notebooks with a 16:9 or 16:10 aspect ratio have about a 9% wider
screen than the standard 4:3 aspect ratio notebooks, making them better for landscape applications
such as watching DVDs, wide spreadsheets, or showing multi-tasked application windows side by
side. However, wide aspect also reduces the height of the display by 9%. This reduction means that
users with primarily portrait applications such as e-mail, Internet browsing and word processing
will have 9% less area top to bottom, so they will have to scroll more often. Many traveling
workers like widescreen notebooks because the lower height makes them easier to carry in a
briefcase and less likely to be damaged by a reclining airline passenger seat.
Widescreen displays are used in select Lenovo ThinkPad and 3000 Family notebooks. Widescreen
monitors include the Lenovo L192 Wide and Lenovo D221 Wide.
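The width and height behind these percentages follow from the diagonal and aspect ratio alone. A quick check, with the caveat that the exact figure depends on which ratios are compared: at a fixed 14.1-inch diagonal, a 16:9 panel works out to about 9% wider than 4:3, while 16:10 is closer to 6% wider.

```python
import math

def panel_dimensions(diagonal, ratio_w, ratio_h):
    """Width and height (in the diagonal's units) for a given aspect ratio."""
    hyp = math.hypot(ratio_w, ratio_h)
    return diagonal * ratio_w / hyp, diagonal * ratio_h / hyp

std_w, std_h = panel_dimensions(14.1, 4, 3)     # about 11.28" x 8.46"
wide_w, wide_h = panel_dimensions(14.1, 16, 9)  # about 12.29" x 6.91"
print(f"16:9 vs 4:3 at 14.1 in: {wide_w / std_w - 1:.1%} wider, "
      f"{1 - wide_h / std_h:.1%} shorter")
```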


15.4" Wide (width: 331 mm / height: 207 mm)
15.0" (width: 305 mm / height: 228 mm)
14.1" (width: 285 mm / height: 214 mm)

Standard vs. Widescreen Notebook Comparison: a 14.1" widescreen offers 25% more data space and four more columns than a 14.1" non-widescreen.


Widescreen Advantages
Following are the advantages of widescreen monitors and notebooks:
• Users can see two windows side by side which offers productivity advantages.
• Spreadsheet and database users can see more column data at a time which reduces horizontal
scrolling.
• Many movies are produced in widescreen format so viewing movies utilizes more screen.
• The height of the screen is reduced which is advantageous in the tight area of a plane or train
so that the seat in front of you does not bump into a notebook.
• Various chat windows or sidebar applications can be placed on the side of a window instead
of directly on top of the main applications.

Widescreen Disadvantages
Following are the disadvantages of widescreen monitors and notebooks:
• The taller screen of a standard monitor or notebook requires less vertical scrolling.
• Reading text is easier on the eyes when it is more vertically-oriented, as in a standard format.
• Standard screen notebooks fit into a backpack or carrying case better because their shape is
more square.
• Some Web pages or applications are fixed width so are optimized for standard screen sizes.
• Some older games are distorted or may not fill the full screen of a widescreen.


• The view heights of 22" wide and 19" standard flat panel monitors are similar
• View area: the 22" wide view area is a little larger than that of a 19" standard flat panel monitor

Due to shorter system depth and display height, a 14.1" widescreen system offers an increased
chance of having a better display angle and palm rest position when used in airplane seating,
because the widescreen provides 1.5" (3.8 cm) more space.

Figures: space in front of the palm rest with a preferred display angle (14.1" standard vs. 14.1" widescreen: a vs. a + 1.5"); display angle with the same acceptable palm rest position (the 14.1" standard display is angled down; the 14.1" widescreen display sits straight)

CRT and TFT Flat Panel Monitor Features:


Lenovo Monitors Meet International Standards

Lenovo monitors comply with international standards:


• ENERGY STAR (power management)
• EPEAT, WEEE, RoHS (environment)
• MPR-II, MPR-3, TCO'95, TCO'99, TCO'03 (emissions)
• ISO 9241 Part 3 and 8 (ergonomics)
• Many other standards

Lenovo ThinkVision L193p


Lenovo Monitors Meet International Standards


There are many international standards for monitors. Most Lenovo monitors comply with all major
international standards, such as the following: ISO 9241 parts 3 and 8 (front of screen quality and
ergonomics requirements); SWEDAC MPR-II, MPR-3, TCO'92, TCO'95 (health, electromagnetic
radiation/emissions); ENERGY STAR (power management, environmental); EPEAT, WEEE,
RoHS (environmental); and ISO 9000/BS5750 (manufacturing quality).
Lenovo monitors use recyclable plastics and non-phenolic boards. Visit lenovo.com/accessories to
see information on Lenovo monitors.


MPR-II
MPR-II is a guideline developed by the Swedish Board for Technical Accreditation (SWEDAC) to
limit the electromagnetic emissions and electrostatic fields generated by workstations.
Electric and magnetic fields with frequencies between 5 Hz and 2 kHz are called extremely low
frequency (ELF) fields. Those with frequencies between 2 kHz and 400 kHz are called very low
frequency (VLF) fields. The Swedish MPR-II guidelines are intended to limit ELF fields. Sources of these
fields include the main power supply (50 or 60 Hz) from the wall socket and the circuits responsible
for sweeping the electron beam across the face of the CRT.
High voltages used within CRTs induce an electrostatic field (ESF) on the surface of the monitor
screen. The MPR-II guideline requires that this charge be minimized.

MPR-3
The draft for MPR-3 was ratified as a formal Swedish standard on November 30, 1995. MPR-3
deals with the emission levels (electrical and magnetic) of displays. The earlier MPR-I and II
documents were voluntary guidelines but were widely respected in the scientific community.
Key elements of MPR-3 include the following:
• MPR-3 was designed for all types of visual displays, not just standard CRT monitors. This is a
significant element, because not all VDTs today use CRT technology. Some VDTs use liquid
crystal display (LCD) technology. Others use electroluminescent or plasma technologies. MPR-II
was restricted to CRT technology.
• MPR-3 was expanded to contain three separate emissions categories, a slightly simplified
protocol for laboratory measurements, guidance on the assessment of measurement uncertainty,
and directions for workplace surveys.
• MPR-3 incorporates elements from guidelines developed in the US, Europe, and Japan and
represents broad participation by many groups, including Swedish government agencies, labor
unions, and large computer manufacturers.
MPR-3A is a colloquial name for Swedish Standard SS43614 90:1995.

Emissions
All electronically powered equipment emits electrical and/or magnetic fields. There is no evidence
that monitor emissions are a health risk. The following guidelines exist for monitors:
MPR-I
• Issued 1984
• MPR-I addresses:
– ESF (electrostatic fields)
– VLMF (very low magnetic fields)


MPR-II
• More restrictive = MPR-I +
• MPR-II addresses:
– ELMF (extremely low magnetic fields)
– ELEF (extremely low electric fields)

MPR-3A
• Equivalent to TCO'95

TCO'92
• Swedish Confederation of Professional Employees Guideline
• Uses a different measurement methodology than MPR-II and made compliance much stricter
with lower emissions than MPR-II

TCO'95
• Incorporated all the guidelines in TCO'92
• Ensured that the monitor incorporates ecological and environmental benefits
• Mandated that the monitor and its packaging be composed of recyclable materials
• A TCO'95-compliant monitor is Energy Star-compliant

TCO'99
• Was released in late 1998 and incorporated all the guidelines in TCO'95
• Tightened picture quality and controls, visual ergonomics and electromagnetic emissions, and
added alternative keyboard design guidelines
• Required higher image refresh rates to entirely eliminate perceived flickering
• Called for massive reductions in magnetic and electric fields, reduced heat emission to keep
humidity levels constant for the user, and reduced energy consumption
• Included various environmental improvements, such as reduced cadmium and bromide pollution


TCO'03
TCO'03 was released in late 2002 and incorporated all the guidelines in TCO'99. It tightened the
requirements in the area of visual ergonomics. See www.tcodevelopment.com for more
information.

ISO 9241 Part 3 and Part 8


ISO 9241 compliance ensures that ergonomic requirements are met and superior image quality is
provided when the monitor forms part of an ISO-compliant system. ISO 9241 parts 3 and 8 are
parts of a standard developed by the International Organization for Standardization (ISO) to address
the following aspects of the operation of an LCD monitor:
• Character size, spacing, and shape
• Character clarity (the contrast between lit and unlit pixels)
• Screen linearity and squareness
• Screen image stability (flicker and jitter)
ISO-capable elements do not necessarily make an ISO-compliant system when put together. To
satisfy the requirements of ISO 9241, a complete platform must be tested and must comply with all
applicable elements of ISO 9241, parts 3 and 8. The platform includes:
• System unit
• Operating system
• Fonts
• Video subsystem
• Monitor
• Keyboard
• Mouse


ENERGY STAR
In 1992 the US Environmental Protection Agency (EPA) introduced its voluntary ENERGY STAR
program covering computers. The ENERGY STAR program for computers has the goal of
generating awareness of energy saving capabilities, as well as differentiating the market for more
energy-efficient computers and accelerating the market penetration of more energy-efficient
technologies. On July 20, 2007, the EPA updated the ENERGY STAR computer specification to
Version 4.0.

VESA Display Power Management Signaling (DPMS)

State    Display                              Status Light              Power                Sync Lines
On       Normal                               On                        Full                 H=on, V=on
Standby  Blank screen (instant restart)       Fast blink, change color  Full                 H=off, V=on
Suspend  Video+scans off (instant restart)    Slow blink, change color  Energy Star, <30 W   H=on, V=off
Off      Only micro on (delayed restart)      Off                       <8 W                 H=off, V=off

Power management in most Lenovo monitors follows the VESA DPMS (Display Power
Management Signaling) standard, shutting off circuitry in stages after a defined period of system
inactivity. The stages are controlled by the horizontal and vertical sync lines on the incoming video
signal, as shown on the chart above.
Most Lenovo monitors utilize the VESA DPMS protocol software to execute the power saving
stages. The power management is activated by a specific software utility that must be resident in
the system that is driving the display. This software must support the VESA DPMS hardware
interface to the display and must be specifically written for each graphics controller. Most Lenovo
monitors have the necessary power-saving circuitry built into them as standard features.
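Because the DPMS state is signaled entirely by which sync lines are present, the mapping can be expressed as a small lookup. This sketch mirrors the table above; the power figures in the comments are the table's values.

```python
# DPMS power state as a function of horizontal/vertical sync presence,
# following the VESA DPMS table above.
DPMS_STATES = {
    (True,  True):  "On",       # full power, normal display
    (False, True):  "Standby",  # blank screen, instant restart
    (True,  False): "Suspend",  # video + scans off, < 30 W
    (False, False): "Off",      # only the microcontroller on, < 8 W
}

def dpms_state(h_sync_present, v_sync_present):
    """Return the DPMS state signaled by the sync lines."""
    return DPMS_STATES[(h_sync_present, v_sync_present)]
```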


CRT and TFT Flat Panel Monitor Features:


Lenovo Monitors

CRT
• Lenovo E75

TFT Flat Panel


• ThinkVision L151
• ThinkVision L171p
• ThinkVision L174
• ThinkVision L190x
• ThinkVision L193p
• ThinkVision L197 Wide
• ThinkVision L200p Wide
• ThinkVision L220x Wide

Figures: ThinkVision L174, ThinkVision L190x, ThinkVision L197 Wide, ThinkVision L220x Wide



Lenovo Monitors
Lenovo markets both CRT and flat panel TFT monitors.
Current Lenovo CRT monitors:
• Lenovo E75
Current Lenovo flat panel TFT monitors:
• ThinkVision L151
• ThinkVision L171p
• ThinkVision L174
• ThinkVision L190x
• ThinkVision L193p
• ThinkVision L197 Wide
• ThinkVision L200p Wide
• ThinkVision L220x Wide


Lenovo ThinkVision L151; Lenovo ThinkVision L171p with rotating screen (screen pivots 90 degrees from landscape to portrait)

Lenovo ThinkVision L193p

ThinkVision USB Soundbar (attaches to base of supported ThinkVision monitors)


Analog versus Digital Interfaces

• Analog CRT: the graphics controller's digital-to-analog converter (DAC) sends an analog signal over a 15-pin D-sub connector
• Analog LCD: the analog signal sent over a 15-pin D-sub connector must be re-digitized by an analog-to-digital converter (ADC) in the monitor, so some image quality is lost
• Digital LCD: a digital signal sent over a 24-pin DVI-D connector needs no conversion, giving the best image
• Some LCD monitors have two connectors, accepting either an analog or a digital signal

Analog versus Digital Interfaces


All signals are digital in a PC. However, CRT monitors accept only analog signals, so the signal
must be converted from digital to analog before it can be displayed. Some LCD monitors can
accept only digital signals, while other LCDs accept either digital or analog (in which case it must
convert the signal to digital). When a PC sends images to a monitor, the images are first calculated
by the processor and graphics accelerator and are stored in video buffer memory.

Cathode Ray Tube (CRT) Monitors


For legacy graphics controllers with a VGA/SVGA 15-pin D-sub connector, the image data is sent
through the DAC, which converts the digital signal into a series of analog waves. Five wires send
the waves to a CRT monitor (one wire each for red, green, and blue, plus horizontal and vertical
synchronization signals).
CRTs use the wave forms to modulate the monitor's electron gun beams, increasing and
decreasing the intensity of the beams to create brighter or dimmer pixels.
Some information is lost in the translation from digital to analog signals, but the difference is not
noticeable with the limitations of human visual perception and the ability of the CRT to reproduce
subtle signal differences.


Liquid Crystal Display (LCD) Monitors


LCDs are inherently digital devices: the pixel calculated by the display driver corresponds
exactly to a specific set of liquid crystal cells in the LCD monitor.
LCD monitors can accept incoming analog signals for compatibility with legacy graphics
controllers (an analog LCD monitor), digital signals (a digital LCD monitor), or both.
With an analog LCD monitor, the manufacturer has to use expensive analog-to-digital conversion
hardware in the monitor. This conversion degrades image quality, because an LCD monitor may
not be able to calculate the precise pixel location. Therefore, some screen images and patterns
might cause unnecessary visual artifacts such as banding or jittery pixels (called pixel jitter or pixel
swim). Most analog LCD monitors have phase and clock controls to help adjust the circuitry of the
monitor in order to process the signal as accurately as possible, but some artifacts can still exist.
With a digital LCD monitor, a digital signal can be sent from a graphics controller directly to the
monitor with no conversions to analog. There is no digital-to-analog conversion in the graphics
subsystem and no analog-to-digital conversion in the monitor. Therefore, an LCD monitor produces
a higher contrast image with sharper text and better image quality. There are no phase or clock
adjustments, and each pixel is displayed in precisely the correct location.
An LCD monitor can have both an analog and digital connector (so it can accept an analog signal
or a digital signal). This capability will add some cost to the LCD monitor, as it needs an Analog-
to-Digital Converter (ADC) and two connectors (a 15-pin D-Shell and a DVI-D/DVI-I). For most
of these monitors, the user can choose between analog and digital by attaching a cable to only
one of the connectors; alternatively, both connectors can have cables attached, and the active
connection is selected through an On Screen Display menu.

Lenovo ThinkVision L171p monitor


with DVI-I on left and 15-pin D-shell on right

Lenovo ThinkVision L171p monitor


with analog cables attached


Monitor Connectors:
DVI Overview

• System and monitor connectors based


on Digital Visual Interface (DVI)
- Single-link: default DVI standard
- Dual-link: doubles bandwidth for
large monitors
• DVI-D (digital only)
• DVI-I (digital and analog [integrated])
• DVI-A (analog only)

DVI-D on PC; DVI-I on PC; DVI-A on monitor

• Newer DVI connectors support HDCP to view protected HD media


• DVI could be replaced by HDMI and DisplayPort after 2008

DVI Connectors
In April 1999, the Digital Display Working Group (DDWG) released the Digital Visual Interface
(DVI) revision 1.0 specification, which defined new monitor connector types to incorporate digital
flat panel monitors. DVI defines connectors with implementations that allow backward
compatibility with analog CRTs and support for digital flat panel monitors.
The data format used by DVI is based on the PanelLink serial format devised by the semiconductor
manufacturer Silicon Image Inc. This uses Transition Minimized Differential Signaling (TMDS). A
single-link DVI consists of four twisted pairs of wire (red, green, blue, and clock) to transmit 24
bits per pixel. The timing of the signal almost exactly matches that of an analog video signal. The
picture is transmitted line by line with blanking intervals between each line and each frame, and
without packetization. No compression is used, and DVI has no provision for only transmitting
changed parts of the image so the whole frame is constantly retransmitted.
With a single-link DVI, the largest resolution possible at 60 Hz is 2.6 megapixels. The DVI
connector therefore has provision for a second link called dual-link DVI, containing another set of
red, green, and blue twisted pairs. When more bandwidth is required than is possible with a single-
link, the second link is enabled, and alternate pixels may be transmitted on each. Dual-link doubles
the bandwidth of the DVI interface such as needed to drive a 4-megapixel monitor (2560x1600).
The DVI specification mandates a fixed single-link cutoff point of 165 MHz, where all display
modes that require less than this must use single-link mode, and all those that require more must
switch to dual link mode. When both links are in use, the pixel rate on each may exceed 165 MHz.
The second link can also be used when more than 24 bits per pixel is required, in which case it
carries the least significant bits.
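The 165 MHz single-link cutoff can be checked against a target mode by estimating the required pixel clock. The 25% blanking overhead below is a rough assumption (the real figure comes from the exact mode timing), but it reproduces the spec's behavior for common modes:

```python
SINGLE_LINK_LIMIT_MHZ = 165  # fixed cutoff mandated by the DVI specification

def dvi_link_mode(width, height, refresh_hz, blanking=1.25):
    """Decide single- vs dual-link from an estimated pixel clock.

    The 25% blanking overhead is an assumption; exact timings differ.
    """
    pixel_clock_mhz = width * height * refresh_hz * blanking / 1e6
    return "single-link" if pixel_clock_mhz <= SINGLE_LINK_LIMIT_MHZ else "dual-link"

print(dvi_link_mode(1600, 1200, 60))  # UXGA fits on a single link
print(dvi_link_mode(2560, 1600, 60))  # the 4-megapixel case needs dual-link
```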


Like current analog VGA connectors, the DVI connector includes pins for the Display Data
Channel version 2 (DDC2B) that allows the graphics adapter to read the monitor's extended display
identification data (EDID).
DVI pins are not the standard cylindrical pins found on analog VGA connectors; they are flattened
and twisted to create a Low Force Helix (LFH) contact which provides a more reliable and stable
link.
• DVI-D only supports digital signaling
• DVI-A only supports analog signaling
• DVI-I supports both analog and digital signaling
• 15-pin D-shell only supports analog signaling

Two different DB-15 to DVI-I dongles

PCI Express x16 Adapter with three ports


(1) SVGA, (2) S-Video out, (3) DVI-I
X1600 Pro Adapter used in
select ThinkCentre desktops


DVI-D (single-link) on PC

DVI-D (dual-link) on PC

DVI-I (single-link) on PC

DVI-I (dual-link) on PC

DVI-A on monitor


Two Connectors on C220p: Analog 15-pin D-Shell (left) and DVI-A (right)
Analog 15-pin D-Shell (top) and DVI-A Connector (bottom)

DVI-A Connector on Signal Cable; DVI-A to 15-pin D-Shell Cable; DVI-D Connector


DMS-59 Connector
The DMS-59 connector allows connection of a dual DVI-I cable or a dual VGA cable. DMS-59 carries
two digital DVI and/or two analog VGA video signals. A “Y-cable” splitter is needed to convert the
DMS-59 interface to (digital) DVI-D or (analog) VGA. A DMS-59 video card solution is designed to
address the expansion card height restraints of today’s small form factor PCs.

DMS-59 Connector (plug end)

DMS-59 to Dual-VGA dongle allows two VGA monitors to connect to the graphics adapter;
DMS-59 to Dual-DVI dongle allows two DVI monitors to connect to the graphics adapter


Monitor Connectors:
DVI Connector Map

VGA (analog only); DVI-D (digital only); DVI-I (digital or analog)

Diagram: on the PC side, a legacy VGA output (analog only, 15-pin connector on the system end) requires a dongle to reach a DVI cable; a DVI-D output uses a cable with a DVI-D connector at each cable end; a DVI-I output accepts a cable with a DVI-D or DVI-A connector on each end. On the monitor side, the connector is a legacy VGA 15-pin D-sub, a DVI-I (digital or analog), or a DVI-A (analog).


Monitor Connector Map


The DVI specification defines two mechanically similar connectors: DVI-D (digital only) and
DVI-I (digital or analog; the I stands for integrated).
A legacy 15-pin D-shell (or D-sub) analog monitor connects to a DVI-I connector through a
dongle or adapter between the connectors. A digital-only device cannot be plugged into an
analog-only device, but both will fit into a connector that supports both types of interfaces.
Newer analog monitors may use a DVI-A connector, which mates only with the DVI-I connector.
The DVI-A interface supports higher bandwidth, allowing this connector to handle higher
resolutions more clearly than the 15-pin D-shell interface.

Dongle (Top View) Dongle (DVI-I end) Dongle (DB-15 end)


[analog DB-15 to analog DVI-I] [analog DB-15 to analog DVI-I] [analog DB-15 to analog DVI-I]


Dongle (analog DB-15 to analog DVI-I); typical monitor DB-15 (analog)

Dongles get around mechanical differences; the signaling remains the same.

PC:

DVI-A (analog, new in 1999)

PC:

DVI-I (digital or analog)

Monitor:

Digital DVI-D
(single-link)


Monitor Connectors:
High-bandwidth Digital Content Protection (HDCP)

• High-bandwidth Digital Content Protection (HDCP)


• HDCP permits viewing DRM-protected content
• Used by HD DVD and Blu-ray Disc movies
• Both graphics adapters and monitors require HDCP support


High-bandwidth Digital Content Protection (HDCP)


In 2006, some DVI ports and monitors became HDCP-compliant (High-bandwidth Digital Content
Protection). HDCP allows viewing of DRM-protected (Digital Rights Management) content that is
common on many movies, especially High Definition (HD) media such as HD DVD and Blu-ray
discs. To play various High Definition movies, a system requires HDCP in both the graphics
adapter and the monitor.


Monitor Connectors:
High Definition Multimedia Interface (HDMI)

• HDMI combines video and audio into a single


digital connection
• One cable for HD video and HD audio
• Requires HDCP Digital Rights Management
• Utilized widely in latest consumer devices (HDTVs)
• Starting to appear in PC graphics adapters and PC monitors
• Expected to replace the DVI connector
• Will compete with DisplayPort on higher-end PCs

HDMI Cable HDMI Connector


High Definition Multimedia Interface (HDMI)


High Definition Multimedia Interface (HDMI) is a digital audio and video interface for higher-end
consumer electronic devices and PCs. HDMI combines the digital video signal of DVI with up to
eight channels of high-resolution digital audio over a single cable utilizing a small connector.
HDMI allows each channel to carry bidirectional video, audio, multimedia, or device-controlling
signals.
HDMI encodes video into Transition Minimized Differential Signaling (TMDS) for digital
transmission over the cable. An optional Consumer Electronics Control (CEC) signal is also
supported. It supports 480i, 480p, 576i, 576p, 720p, 1080i, 1080p, and 1440p with a bandwidth of
10.2 Gb/s at 340 Mpixels/sec.
For audio, HDMI supports 8-channel uncompressed digital audio at 192 kHz sample rate with 24
bits/sample. It supports compressed audio such as Dolby Digital or DTS. HDMI supports very high
bitrate lossless compressed audio of Dolby TrueHD and DTS-HD Master Audio.
HDMI has gone through several different versions. With version 1.3, HDMI now supports the
potential to have larger, higher-resolution displays, more and higher-resolution colors (up to 48-bit,
compared with today's typical 24-bit color) and more.
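The bandwidth figures above can be sanity-checked. HDMI transmits 10 bits per TMDS channel per pixel clock over three channels, so the 340 MHz ceiling corresponds to 10.2 Gb/s; what matters for a given mode is whether its TMDS clock fits under that ceiling. The 25% blanking overhead in this sketch is an assumption, not an exact timing:

```python
HDMI13_MAX_TMDS_MHZ = 340  # HDMI 1.3 ceiling (10.2 Gb/s over three channels)

def tmds_clock_mhz(width, height, refresh_hz, bits_per_pixel=24, blanking=1.25):
    """Estimated TMDS clock for a video mode.

    Deep color scales the clock in proportion to bits beyond 24 bpp;
    the 25% blanking overhead is a rough assumption.
    """
    pixel_clock = width * height * refresh_hz * blanking / 1e6
    return pixel_clock * bits_per_pixel / 24

clock_24 = tmds_clock_mhz(1920, 1080, 60)      # ~156 MHz: fits easily
clock_48 = tmds_clock_mhz(1920, 1080, 60, 48)  # ~311 MHz: needs HDMI 1.3
print(clock_24, clock_48)
```

Even 48-bit deep color at 1080p60 stays under the 340 MHz limit, which is why deep color arrives together with the HDMI 1.3 bandwidth increase.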


HDMI 1.0 (released Dec 2002): initial specification.

HDMI 1.1 (released May 2004): added support for DVD-Audio.

HDMI 1.2 (released Aug 2005): added support for SACD audio; permitted PC applications to use only the RGB color space; supported low-voltage (AC-coupled) sources in PCs.

HDMI 1.3 (released June 2006): increases bandwidth to 10.2 Gb/s (340 MHz); offers support for 16-bit color, increased refresh rates (e.g., 120 Hz), and 1440p/WQXGA resolutions; supports xvYCC color space standards; adds features to automatically correct audio/video synchronization (lip sync); adds a mini connector; adds support for the Dolby TrueHD and DTS-HD Master Audio standards.

HDMI's initial success has come in the world of consumer electronics and digital TVs, where it is
now a de facto standard. HDMI is expected to become more widespread in higher-end PC graphics
adapters and PC monitors. It will primarily replace the DVI connector.
HDMI utilizes the High-bandwidth Digital Content Protection (HDCP) digital rights management
(DRM) specification. This HDCP specification is proprietary, requiring a license and royalty
payments. HDMI in conjunction with HDCP is also a required part of both the Blu-ray and HD-
DVD standards. As a result, any CE device that uses Blu-ray or HD-DVD standards must include
an HDMI connector.
The standard 19-pin Type A HDMI connector is backward-compatible with the single-link DVI to
carry digital video (DVI-I or DVI-D). This allows a DVI source to connect to an HDMI monitor, or
vice versa, with a compatible adapter or dongle, but the audio and remote control features of HDMI
would not operate. Also, without HDCP support in a monitor, the content will not display. A future
29-pin Type B connector will carry an expanded video channel for use with resolutions higher than
WQSXGA (3200x2048). Type B HDMI is backward compatible with dual-link DVI.

HDMI out connector on


hybrid Blu-ray/HD-DVD Player


Monitor Connectors:
DisplayPort

• DisplayPort combines video and audio into a


single digital connector
• One cable for HD video and HD audio
• Optional HDCP Digital Rights Management
• Primarily for PC-based products (not
consumer electronic products)
DisplayPort Connector
• Expected to replace the DVI connector
• Works with DVI via a dongle
• Will compete with HDMI on higher-end PCs
• Chipset support started in 2008
• Used in select Lenovo notebooks such as
ThinkPad T500

DisplayPort Connector


DisplayPort
DisplayPort is a digital audio and video interface for broad application in PCs, monitors,
televisions, and projectors. It also defines the internal connections between notebook PC graphics
chipsets and their associated LCD screens. DisplayPort is designed to replace DVI and eventually
VGA, making digital display connections easier, more readily available and more functional. The
DisplayPort digital interface was originally created as a cost-free alternative to HDMI.
Currently at version 1.1, DisplayPort is owned by the Video Electronics Standards Association
(VESA). VESA runs a compliance and interoperability program for DisplayPort connectors, cables
and devices. The VESA program ensures functional compatibility between products that carry the
DisplayPort logo. In January 2007, VESA announced that it is developing a DisplayPort
Interoperability Guideline that recommends how best to provide DisplayPort, DVI and HDMI
connectivity for consumer PCs via the DisplayPort connector and simple cable adapters. The
Interoperability Guideline will describe how DisplayPort products may be designed to enable full
compatibility with HDMI products, providing a clear blueprint for display connectivity
convergence within the home.
DisplayPort adds support for High-bandwidth Digital Content Protection (HDCP) for viewing protected content such as high-definition movies on optical media. HDCP version 1.3 for DisplayPort uses 128-bit AES encryption and is provided by Digital Content Protection (DCP) LLC. This version allows products supporting DisplayPort alongside DVI or HDMI to share a common encryption key set. This copy protection is licensed separately.


The DisplayPort connector supports a main link of one to four data pairs (lanes), each running at 1.62 or 2.7 Gb/s and carrying video, audio, and embedded clock signals. Video signals use an 8- or 10-bit pixel format per color channel. A bi-directional auxiliary channel running at a constant 1 Mb/s handles main-link management and device control using the VESA EDID and VESA MCCS standards. The video signal is not compatible with DVI or HDMI. Full bandwidth is supported over a 3-meter cable, with bandwidth reduced to 1080p over a 15-meter cable. The connector is hot-pluggable.
A single DisplayPort connector can support multiple monitors if the monitors support daisy
chaining. DisplayPort supports up to six 1080i displays with daisy chaining with its packet-based
signal.
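To put the link-rate figures above in perspective, the usable payload of a full main link can be sketched as lanes × lane rate × coding efficiency. The 8b/10b line coding (80% efficiency) and the uncompressed 1080i payload arithmetic below are assumptions added for illustration, not figures from this course:

```python
# Rough usable bandwidth of a full DisplayPort 1.1 main link:
# 4 lanes x 2.7 Gb/s each, less 8b/10b line-coding overhead (80% efficient).
def dp_payload_gbps(lanes=4, lane_rate_gbps=2.7, coding_efficiency=0.8):
    return lanes * lane_rate_gbps * coding_efficiency

# Uncompressed 1080i/60 video: 1920x1080, ~30 full frames/s, 24 bits/pixel.
def stream_gbps(h=1920, v=1080, frames_per_s=30, bits_per_pixel=24):
    return h * v * frames_per_s * bits_per_pixel / 1e9

link = dp_payload_gbps()    # ~8.64 Gb/s of payload
per_stream = stream_gbps()  # ~1.49 Gb/s per 1080i display
print(f"~{link:.2f} Gb/s link, ~{per_stream:.2f} Gb/s per 1080i stream")
```

By this rough count the link carries several such streams; the exact number of daisy-chained displays depends on audio, blanking, and protocol overheads.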

DVI, VGA, and DisplayPort connectors

Lenovo markets a DisplayPort to Single-link DVI-D Cable (part number 45J7915). This accessory attaches a DVI monitor connector to a system DisplayPort connector.


Summary:
Graphics Architecture

• The graphics subsystem consists of the graphics controller, graphics memory, and a CRT or TFT flat-panel monitor.
• The Intel Graphics Media Accelerator provides good performance at a lower cost than a discrete PCI Express x16 controller.
• PCI Express x16 graphics adapters provide the highest performance compared to integrated graphics.
• TFT flat-panel monitors have many advantages over analog CRT monitors.
• A digital interface to a monitor provides a better quality signal than an analog interface does.
• Emerging monitor connectors include DVI, HDMI, and DisplayPort.

PCI Express x16 Graphics Adapter: ATI Radeon HD 2400 XT 256MB

Lenovo ThinkVision L190x


Review Quiz

Objective 1

1. To reach a specific resolution and refresh rate other than basic VGA, all of the following graphics processing elements must support the desired mode except which of the following?
a. Monitor
b. System bus
c. Device driver
d. Graphics controller

2. A monitor has 1280x1024 resolution. What does the number 1024 represent?
a. 1024 vertical units
b. 1024 horizontal units
c. 1024 diagonal units
d. 1024 triads

3. What determines the number of simultaneous colors on a display?
a. The triads of the display
b. The types of phosphor in the display
c. The video BIOS
d. The amount of graphics or video memory

4. A vendor that advertises a 24-bit color graphics controller has a controller that can display how
many simultaneous colors?
a. 256
b. 65,536
c. 16.7 million
d. 33.4 million

5. What converts the digital signal in a PC to the analog signal understood by the monitor?
a. Video memory
b. Rambus DRAM
c. Video feature connector
d. Digital-to-analog converter (DAC)

6. What is the advantage of a discrete graphics solution over an integrated graphics controller?
a. Less heat
b. Better performance
c. Lower cost
d. Use of an adapter slot


7. What term refers to special chips that allow PCs to display images from all sides with an illusion
of depth?
a. Digital-to-analog converter (DAC)
b. Bitblt
c. Direct3D
d. 3D graphics

8. What is a way to see more applications by displaying a unique image on each monitor attached to a system?
a. Cloning
b. Multi-monitoring
c. Direct 3D
d. 16.7 million color depth

Objective 2

9. What technology, a combination of video processing hardware and software technologies for a wide range of digital displays, is integrated into the Intel integrated graphics chipsets?
a. Intel Clear Video Technology
b. Intel Flex Memory Technology
c. Intel Quiet System Technology
d. Intel Fast Memory Access

Objective 3

10. What type of interface is used for high-end graphics support?
a. PCI
b. PCI Express x16
c. Turbo Memory
d. SurroundView

Objective 4

11. What is the main reason for flicker?
a. A high horizontal scan rate
b. Poor convergence
c. An interlaced image
d. A low vertical refresh rate

12. Which monitor tube technology is best for reducing glare?
a. Conventional
b. Flatter Squarer Tube (FST)
c. Aperture grill
d. Flat Shadow Mask


Objective 5

13. A graphics controller with a Digital Video Interface (DVI-D) connector will work best with
which monitor?
a. Analog CRT
b. Digital LCD with an analog-to-digital converter
c. Digital LCD that accepts a digital signal
d. Analog CRT with a DVI connector

Objective 6

14. What connector provides a completely digital signal from a PC to supported monitors?
a. Voltage Regulator Module (VRM)
b. Digital Visual Interface (DVI)
c. Very Large Memory (VLM)
d. Flat Display (FD)

15. How would a graphics controller with a DVI-I connector interface with a traditional analog DB-15 monitor connector?
a. A dongle is required to connect the DVI-I connector to the DB-15 connector on the
monitor.
b. The monitor DB-15 connector will attach directly to a DVI-I connector.
c. The graphics connector must be converted to a DVI-A connector.
d. The graphics controller will connect to the analog monitor but no signal is possible
because it is an analog monitor.


Answer Key
1. B
2. A
3. D
4. C
5. D
6. B
7. D
8. B
9. A
10. B
11. D
12. D
13. C
14. B
15. A
