Graphics Processing Unit: NUST College of Electrical and Mechanical Engineering

NUST COLLEGE
OF ELECTRICAL
AND MECHANICAL ENGINEERING
Computer Organization
Assignment 2
GRAPHICS PROCESSING
UNIT
Submitted by:
Warda Ahmed
NS 7800
D-CE-37
Syndicate B
Assignment - 2
Contents
Table of Figures
List of Acronyms
Abstract
Processors
Graphics Processing Unit (GPU)
I. What is a GPU?
II. Uses
III. GPU Manufacturers
IV. Evolution of GPUs
1. Video Shifters (1970-1998)
2. 1980s
3. 1990s
4. The First GPU (1999)
V. Features
1. Memory Features
VI. GPU Architecture
1. Graphics pipeline
2. Evolution of the Graphics Pipeline
3. CPU VERSUS GPU
VII. Types
1. Dedicated graphics cards
2. Integrated graphics solutions
3. Hybrid solutions
4. Stream Processing and General Purpose GPUs (GPGPU)
5. External GPU (eGPU)
VIII. GPU Accelerated Computing
IX. Advantages
Conclusion
References
NS-7800 Warda Ahmed
Page | 1
D-CE-37 (B)
Table of Figures
Figure 1: GeForce 6600GT (NV43) GPU
Figure 2: Transistor count trends for GPU cores
Figure 3: Atari 2600 released in September 1977
Figure 4: Contrast between VGA and 3Dfx Graphics
Figure 4.5: GPU graphics pipeline 1
Figure 4.6: Fixed-Function rendering pipelines (FFP)
Figure 4.7: Programmable Shader 1
Figure 4.8: Non-unified vs. Unified Shader Architecture
Figure 5(a): GPU core vs. CPU core
Figure 5(b): CPU core vs. GPU core
Figure 6: GPU acceleration
Figure 7: Multi-GPU performance
Figure 8: Programmable shading: Soap bubble effect
Figure 9: Demonstration of realistic graphics with NVIDIA Geforce 8800 GTX
List of Acronyms
1. CPU Central Processing Unit
2. GPU Graphics Processing Unit
3. VPU Visual Processing Unit
4. RAM Random Access Memory
5. GIS Geographic Information System
6. CAD Computer Aided Design
7. AGP Accelerated Graphics Port
8. PCI Peripheral Component Interconnect
9. TIGA Texas Instruments Graphics Architecture
10. PGA Professional Graphics Controller
11. SGI Silicon Graphics Inc.
12. API Application Programming Interface
13. MDA/CDA Monochrome and Color Display Adapter
14. iDCT Inverse discrete cosine transform
15. iMDCT Inverse modified discrete cosine transform
16. IQ Inverse quantization
17. VLD Variable-length decoding
18. IGP Integrated graphics processors
Abstract
Processors are the central component of computers and carry out all of the
computer's work. One essential type of processor in the majority of modern computers
and electronic devices is the graphics processing unit (GPU). It is mainly used for graphics
processing and 3D design. The leading GPU manufacturers are NVIDIA and AMD.
Modern GPUs are parallel and fully programmable. NVIDIA developed
the first GPU. GPU architecture shifted from a fixed graphics pipeline to programmable
shaders and then to a unified shader model. A GPU has many more cores than a CPU. The
different types of GPU are dedicated, integrated, hybrid, general purpose and external. A
GPU accelerates computing and is a necessity for optimum computer performance.
Processors
The word processor usually refers to a microprocessor or central processing unit (CPU). The
processor in a personal computer or embedded in small devices is often called a
microprocessor.
A processor is a small chip made of silicon and is a central component of computers and
other electronic devices. It is the logic circuitry that responds to and processes the basic
instructions that drive a computer. The main job of a processor is to receive input through
input devices, analyze and process the input commands, and produce an appropriate
output. Modern processors can perform trillions of calculations per second.
Graphics Processing Unit (GPU)

I. What is a GPU?
Aside from the CPU, many laptop and desktop computers also include a GPU, which
stands for Graphics Processing Unit. It is a single-chip processor similar to the CPU. Also
called a visual processing unit (VPU), it is a specialized electronic circuit designed to rapidly
manipulate and alter memory to accelerate the creation of images in a frame buffer
intended for output to a display. A GPU is used for 3D applications and functions such as 3D
motion. It creates lighting effects and transforms objects every time a 3D scene is
redrawn. It is specifically designed for rendering graphics that are displayed as output on
monitor screens.
Modern GPUs are efficient at image processing and computer graphics. Their highly
parallel structure is more effective than general-purpose CPUs for algorithms where the
processing of big blocks of visual data is done in parallel. These mathematically intensive
tasks would otherwise overburden the CPU. By using a CPU for system processing and a
separate GPU for graphics processing, the CPU is not overloaded and the computer can run
graphics-intensive applications more efficiently. Lifting this load from the CPU frees up
cycles that can then be used for other jobs.
Figure 1: GeForce 6600GT (NV43) GPU
Some terms that need to be explained to understand GPUs:
Rendering: the process of generating an image from a model
Vertex: the corner of a polygon (usually that polygon is a triangle)
Pixel: the smallest addressable screen element
II. Uses
GPUs are found in a wide range of systems, including embedded systems, cell
phones, personal computers, workstations, game consoles and supercomputers. In desktop
computers the GPU is usually placed on a video card, while in mobile devices it is
integrated into the motherboard.
Most GPUs use their transistors for 3D computer graphics. However, some have
accelerated memory for mapping vertices, which is useful in applications such as
geographic information systems (GIS). Some more modern GPUs support programmable
shaders implementing textures, mathematical vertices and accurate color formats. GPUs
aimed at applications such as computer-aided design (CAD) can process over 200 billion
operations per second and deliver up to 17 million polygons per second. Many scientists
and engineers use GPUs for more in-depth computational studies that rely on vector and
matrix operations. [1]
III. GPU Manufacturers

Many companies have produced GPUs under a number of brand names. The most
commonly used GPUs are developed by AMD, NVIDIA, Intel, VIA, Matrox Graphics,
PowerVR and SiS. In 2009, Intel, Nvidia and AMD/ATI were the market share
leaders, with 49.4%, 27.8% and 20.6% market share respectively. However, those
numbers count Intel's integrated graphics solutions as GPUs. Excluding integrated
graphics, Nvidia and ATI controlled nearly 100% of the market as of 2008. [2]
IV. Evolution of GPUs

The evolution of GPU hardware architecture has gone from a single-core, fixed-function
hardware pipeline made solely for graphics to a set of highly parallel and programmable
cores for more general-purpose computation. The trend in GPU technology has been to add
more programmability and parallelism to a core architecture that is evolving towards a
general-purpose, more CPU-like core. Modern GPUs are massively parallel and fully
programmable. The parallel floating-point computing power of a modern GPU is much
greater than that of a CPU. Future GPU generations are expected to look more and more
like wide-vector general-purpose CPUs, and eventually the two may be seamlessly
combined into one. [3]
The transistor count trends for some GPUs are shown in the following figure:
Figure 2: Transistor count trends for GPU cores
The history of GPUs is as follows:
1. Video Shifters (1970-1998)
3D graphics started with early display controllers, known as video shifters and video
address generators. They acted as a pass-through between the main processor and the
display. The incoming data stream was converted into serial bitmapped video output such
as luminance, color, as well as vertical and horizontal composite sync, which kept the line
of pixels in a display generation and synchronized each successive line along with the
blanking interval (the time between ending one scan line and starting the next). Arcade
system boards have been using specialized graphics chips since the 1970s. In early video
game hardware the RAM for frame buffers was too expensive, so video chips composited
data together as the display was being scanned out on the monitor.
A flurry of designs arrived in the latter half of the 1970s, laying the foundation for 3D
graphics as we know them. [4]
Fujitsu's MB14241 video shifter was used to accelerate the drawing of sprite graphics for
various 1970s arcade games from Taito and Midway, such as Gun Fight (1975), Sea
Wolf (1976) and Space Invaders (1978). [5] The Namco Galaxian arcade system in 1979
used specialized graphics hardware supporting RGB color, multi-colored sprites
and tilemap backgrounds. The Galaxian hardware was widely used during the golden age
of arcade video games by game companies such as Namco, Centuri, Gremlin, Irem,
Konami, Midway, Nichibutsu, Sega and Taito.
RCA's "Pixie" video chip (CDP1861) in 1976 could output an NTSC-compatible video signal
at 62x128 resolution, or 64x32 for the RCA Studio II console.
In the home market, the Atari 2600 in 1977 used a video shifter called the Television
Interface Adaptor. [6] The console can be seen in Figure 3.
Figure 3: Atari 2600 released in September 1977
In 1978, Motorola unveiled the MC6845 video address generator. This became the basis for
the IBM PC's Monochrome and Color Display Adapter (MDA/CDA) cards of 1981, and
provided the same functionality for the Apple II. Motorola added the MC6847 video display
generator later the same year, which made its way into a number of first-generation
personal computers, including the Tandy TRS-80.
The Atari 8-bit computers (1979) had ANTIC, a video processor which interpreted
instructions describing a "display list": the way the scan lines map to specific bitmapped
or character modes and where the memory is stored (so there did not need to be a
contiguous frame buffer). [7] 6502 machine code subroutines could be triggered on scan
lines by setting a bit on a display list instruction. [8] ANTIC also supported smooth vertical
and horizontal scrolling independent of the CPU. [9]
2. 1980s
In the early 1980s, "GPUs" were integrated frame buffers. They were boards of TTL logic
chips that relied on the CPU and could only draw wire-frame shapes to raster displays [10].
The Williams Electronics arcade games Robotron: 2084, Joust, Sinistar and Bubbles, all
released in 1982, contain custom blitter chips for operating on 16-color bitmaps. [11]
In 1985, the Commodore Amiga featured a custom graphics chip, supporting line draw,
area fill and a blitter unit which accelerated manipulation of bitmaps.
In 1986, Texas Instruments released the TMS34010, the first microprocessor with on-chip
graphics capabilities. It could run general-purpose code, but it had a very
graphics-oriented instruction set. In 1990-1991, this chip became the basis of the Texas
Instruments Graphics Architecture ("TIGA") Windows accelerator cards.
One of the very first 2D/3D video cards for the PC was the IBM Professional Graphics
Controller (PGA). The PGA used an on-board Intel 8088 microprocessor to take over all
video-related tasks (such as drawing and coloring filled polygons), freeing up the CPU for
other work. Although it was released in 1984, 10 years before hardware 2D/3D
acceleration was standardized, its high cost and incompatibility with many programs and
non-IBM systems kept it from achieving mass-market success. The PGA's separate
on-board processor marked an important step in GPU evolution by furthering the paradigm
of using a separate processor for graphics computations [12].
By 1987, more features were being added to early GPUs, such as shaded solids, vertex
lighting, rasterization of filled polygons, a pixel depth buffer and color blending. There
was still much reliance on sharing computation with the CPU [10]. In the late 1980s,
Silicon Graphics Inc. (SGI) emerged as a high-performance computer graphics hardware
and software company. With the introduction of OpenGL in 1992, SGI created and released
the graphics industry's most widely used and supported platform-independent 2D/3D
application programming interface (API). OpenGL support has also become an integral
part of the design of modern graphics hardware. SGI also pioneered the concept of the
graphics pipeline early on [12].
3. 1990s
Launched in November 1996, 3dfx's Voodoo Graphics was a 3D-only card that
required a VGA cable pass-through from a separate 2D card to the Voodoo, which then
connected to the display.
In March of 1996, 15 titles with Voodoo support debuted at E3 with wholly new levels of
visual quality. The difference in video quality can be seen in a demo of the popular game
Quake, as shown in the following figure. The left side shows ordinary low-resolution
graphics, while the right side shows the improved resolution of the 3dfx card running OpenGL.
Figure 4: Contrast between VGA and 3Dfx Graphics
3dfx set out to build a high-end 3D gaming board capable of delivering smooth gameplay at
640x480 resolution with bilinearly filtered textures.
Voodoo was used in arcade machines and, through Quantum's multichip boards, had
professional promise as well. It focused on raw power in fundamental 3D operations. 3dfx
cut the right corners of the pipeline, reducing gate count without much impact on image
quality. Voodoo was easy to program and hard to slow down.
3dfx's technology became the forerunner of many image quality enhancements seen
today, such as soft shadows and reflections, motion blur and depth-of-field blurring.
4. The First GPU (1999)
NVIDIA developed the world's first commercial GPU, the GeForce 256, in 1999. The
GeForce 256 was capable of billions of calculations per second, could process a
minimum of 10 million polygons per second, and had over 22 million transistors,
compared to the 9 million found on the Pentium III. Its workstation version, the
Quadro, designed for CAD applications, could process over 200 billion operations per second
and deliver up to 17 million triangles per second. [13] It was a single-chip processor with
integrated transform, drawing and BitBLT support, lighting effects, triangle setup/clipping
and rendering engines. NVIDIA's rival ATI Technologies came up with the name
VPU, or visual processing unit, when it released the Radeon 9700 in 2002.
Fairly early on in the GPU market, there was a severe narrowing of competition. Early
leading companies were Silicon Graphics International, 3dfx, NVIDIA, ATI and Matrox,
when GPUs were a new concept. Now only AMD and NVIDIA are GPU manufacturing
giants.
Since their inception, GPUs have gradually become more powerful, programmable and
general purpose, gaining programmable geometry, vertex and pixel processors, the unified
shader model, an expanding instruction set, and support for CUDA and OpenCL. [14] OpenCL
is an open standard defined by the Khronos Group which allows for the development of code
for both GPUs and CPUs with an emphasis on portability. OpenCL solutions are supported by
Intel, AMD, Nvidia and ARM, and according to a report by Evans Data, OpenCL is the
GPGPU development platform most widely used by developers in both the US and Asia
Pacific.
Nvidia Kepler:
Kepler is a graphics processing unit architecture described as the first GPU designed for
the cloud. Graphics cards powered by Nvidia Kepler processors are tuned to efficiently
serve virtualized desktops, providing auto-scaling to the necessary performance level.
V. Features
GPU features include
2-D or 3-D graphics
Digital output to flat panel display monitors
Texture mapping
Application support for high-intensity graphics software such as AutoCAD
Rendering polygons
Support for YUV color space
Hardware overlays
MPEG decoding
GPU accelerated video decoding:
More recent graphics cards decode high-definition video on the card, offloading the
central processing unit. The video decoding processes that can be accelerated by today's
modern GPU hardware are:
Motion compensation (mocomp)
Inverse discrete cosine transform (iDCT)
Inverse telecine 3:2 and 2:2 pull-down correction
Inverse modified discrete cosine transform (iMDCT)
In-loop deblocking filter
Intra-frame prediction
Inverse quantization (IQ)
Variable-length decoding (VLD), more commonly known as slice-level acceleration
Spatial-temporal deinterlacing and automatic interlace/progressive source detection
Bitstream processing (Context-adaptive variable-length coding/Context-adaptive
binary arithmetic coding) and perfect pixel positioning.
1. Memory Features
The only two types of memory that actually reside on the GPU chip are register and
shared memory. Local, Global, Constant, and Texture memory all reside off chip.
Local, Constant, and Texture are all cached.
While it would seem that the fastest memory is the best, the other two
characteristics of the memory that dictate how that type of memory should be
utilized are the scope and lifetime of the memory:
Data stored in register memory is visible only to the thread that wrote it and lasts
only for the lifetime of that thread.
Local memory has the same scope rules as register memory, but performs slower.
Data stored in shared memory is visible to all threads within that block and lasts
for the duration of the block. This is invaluable because this type of memory allows
for threads to communicate and share data between one another.
Data stored in global memory is visible to all threads within the application
(including the host), and lasts for the duration of the host allocation.
Constant and texture memory are beneficial for only very specific types of
applications. Constant memory is used for data that will not change over the
course of a kernel execution and is read-only. Using constant rather than global
memory can reduce the required memory bandwidth; however, this performance
gain can only be realized when a warp of threads reads the same location. Similar
to constant memory, texture memory is another variety of read-only memory on
the device. When all reads in a warp are physically adjacent, using texture
memory can reduce memory traffic and increase performance compared to global
memory.
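The scope and lifetime rules above can be illustrated with a small Python model. This is only a sketch and not GPU code: the names `block_sum`, `launch` and `BLOCK_SIZE` are hypothetical, threads are emulated with ordinary loops, and plain Python variables and lists stand in for register, shared and global memory.

```python
BLOCK_SIZE = 4  # threads per block (illustrative value, must be a power of two)

def block_sum(block_id, global_in, global_out):
    """Emulate one thread block summing its slice of global memory."""
    # "Shared memory": one buffer per block, visible to every thread in
    # this block and discarded when the block finishes.
    shared = [0] * BLOCK_SIZE

    for thread_id in range(BLOCK_SIZE):
        # "Registers": local variables private to this emulated thread.
        index = block_id * BLOCK_SIZE + thread_id
        value = global_in[index] if index < len(global_in) else 0
        shared[thread_id] = value

    # Tree reduction: threads combine each other's values through shared memory.
    stride = BLOCK_SIZE // 2
    while stride > 0:
        for thread_id in range(stride):
            shared[thread_id] += shared[thread_id + stride]
        stride //= 2

    # "Global memory": the result outlives the block and is visible to the host.
    global_out[block_id] = shared[0]

def launch(data):
    """Host side: allocate the output and run every block in turn."""
    num_blocks = (len(data) + BLOCK_SIZE - 1) // BLOCK_SIZE
    partial = [0] * num_blocks
    for block_id in range(num_blocks):
        block_sum(block_id, data, partial)
    return partial

partial_sums = launch([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(partial_sums, sum(partial_sums))
```

Each block only ever touches its own `shared` buffer, which is why threads in different blocks have to exchange results through global memory in this model.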
GPU clock or Engine clock is the graphics processor unit's clock speed, measured in
megahertz (MHz).
VI. GPU Architecture

A GPU is a heterogeneous chip multi-processor (highly tuned for graphics).
New applications demand parallel processing and new computing devices are power
constrained. GPUs are therefore designed for high parallelism and lower power
consumption.
1. Graphics pipeline
In 3D computer graphics, the graphics pipeline or rendering pipeline refers to the
sequence of steps used to create a 2D raster representation of a 3D scene.
Figure 4.5: GPU graphics pipeline 1
The various stages in the typical pipeline of a modern GPU (also seen in Figure 4.5) are:
Bus interface/Front End
Interface to the system to send and receive data and commands.
Vertex Processing
Converts each vertex into a 2D screen position, and lighting may be applied to determine
its color. A programmable vertex shader enables the application to perform custom
transformations for effects such as warping or deformations of a shape.
Clipping
This removes the parts of the image that are not visible in the 2D screen view such as the
backsides of objects or areas that the application or window system covers.
Primitive Assembly, Triangle Setup
Vertices are collected and converted into triangles. Information is generated that will
allow later stages to accurately generate the attributes of every pixel associated with the
triangle.
Rasterization
The triangles are filled with pixels known as "fragments," which may not end up in the
frame buffer if there is no change to that pixel or if the fragment turns out to be hidden.
Occlusion Culling
Removes pixels that are hidden (occluded) by other objects in the scene.
Parameter Interpolation
The values for each pixel that were rasterized are computed, based on color, fog, texture,
etc.
Pixel Shader
This stage adds textures and final colors to the fragments. Also called a "fragment
shader," a programmable pixel shader enables the application to combine a pixel's
attributes, such as color, depth and position on screen, with textures in a user-defined
way to generate custom shading effects.
Pixel Engines
Mathematically combine the final fragment color, its coverage and degree of transparency
with the existing data stored at the associated 2D location in the frame buffer to produce
the final color for the pixel to be stored at that location. Output is a depth (Z) value for the
pixel.
Frame Buffer Controller
The frame buffer controller interfaces to the physical memory used to hold the actual
pixel values displayed on screen. The frame buffer memory is also often used to store
graphics commands, textures as well as other attributes associated with each pixel.
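A few of the stages above (vertex processing, rasterization and the depth test that hides occluded fragments) can be sketched in software. The following Python is a minimal illustration under stated assumptions, not how any particular GPU implements them: the function names, the 8x8 frame buffer and the simple perspective divide are all hypothetical choices, and depth is interpolated linearly in screen space without perspective correction.

```python
WIDTH, HEIGHT = 8, 8  # tiny frame buffer (illustrative size)

def project_vertex(x, y, z):
    """Vertex processing: perspective-project a camera-space point to pixels."""
    ndc_x, ndc_y = x / z, y / z              # perspective divide, range [-1, 1]
    screen_x = (ndc_x + 1) * 0.5 * WIDTH
    screen_y = (1 - ndc_y) * 0.5 * HEIGHT    # flip y: screen origin is top-left
    return screen_x, screen_y, z

def edge(a, b, p):
    """Signed area test: which side of the edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(triangle, color, frame, depth):
    """Rasterization plus depth test: fill the pixels covered by one triangle."""
    v0, v1, v2 = (project_vertex(*v) for v in triangle)
    area = edge(v0, v1, v2)
    if area == 0:
        return                               # degenerate triangle covers nothing
    for py in range(HEIGHT):
        for px in range(WIDTH):
            p = (px + 0.5, py + 0.5)         # sample at the pixel centre
            w0 = edge(v1, v2, p) / area      # barycentric weights
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue                     # fragment lies outside the triangle
            z = w0 * v0[2] + w1 * v1[2] + w2 * v2[2]  # interpolated depth
            if z < depth[py][px]:            # keep only the nearest fragment
                depth[py][px] = z
                frame[py][px] = color

frame = [["." for _ in range(WIDTH)] for _ in range(HEIGHT)]
depth = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
near = [(-2.0, -2.0, 2.0), (2.0, -2.0, 2.0), (2.0, 2.0, 2.0)]
far = [(-4.0, -4.0, 4.0), (4.0, -4.0, 4.0), (-4.0, 4.0, 4.0)]
rasterize(near, "B", frame, depth)   # drawn first, but closer to the camera
rasterize(far, "A", frame, depth)    # drawn second, hidden where they overlap
print("\n".join("".join(row) for row in frame))
```

Because the depth buffer decides visibility per pixel, the nearer triangle stays on top even though it was drawn first.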
2. Evolution of the GPU Architecture
Until recently, the process of generating computer graphics was referred to as the
graphics pipeline. But a fixed pipeline was not flexible enough for sophisticated effects
like water and smoke. Over time, the process has been taken over by more flexible
shaders, and now uses universal shaders able to perform any of these tasks. The graphics
rendering mechanisms shifted from fixed graphics pipelines to programmable graphics
pipelines, and then to unified shader models.
1. FIXED GRAPHICS PIPELINE (Fixed-Function rendering pipelines (FFPs))
Fixed-function meant that the developer could not configure the functions the FFPs
performed. Parameters such as the colours of objects could be changed, but the functions
themselves remained the same. The game logic, textures and triangles were sent to the
GPU, which would take care of all the heavy processing. The processing is visualized step
by step in the following figure. Each step is explained in the previous topic (Graphics
pipeline).
Figure 4.6: Fixed-Function rendering pipelines (FFP)
PROS:
The hardware was wired and narrowly specialized to perform standard operations
on data. This made it much faster than the processor performing the same tasks.
It had new features like multiple blending modes, per-vertex Gouraud shading, fog
effects, stencil buffers (for shadow volumes), etc.
CONS:
FFP was limited by the number of functions it could perform. There was no variation
or flexibility, and thus realistic graphics could not be produced.
It was impossible to go back through the stages of the pipeline to make changes as
required. For example, transparent objects like water or smoke tended to look solid,
or flicker in and out. To counter this opacity, the opaque surface was animated to
flicker.
If the graphics pipeline's hardware wasn't matched perfectly to the processing
needs of the task, some of it sat idle. And since the images that need to be
displayed are very different, the match was never perfect.
2. PROGRAMMABLE SHADERS (Separated Shader Architecture)
In order to provide more sophisticated graphics to users, manufacturers started making
the fixed-function hardware at each stage of the pipeline more flexible. Some of these
stages became known as shaders, and they eventually became flexible enough to overcome
most of the difficulties caused by a linear pipeline. The process flow of programmable
shaders can be seen in the following diagram:
Figure 4.7: Programmable Shader 1
PROBLEMS:
While one problem of the FFP was fixed, the other remained (i.e. part of the
pipeline doing nothing and sitting idle). The shaders were of three types. Vertex
shaders would construct the 3D model and light the vertices making it up.
Geometry shaders would make the lines into surfaces. Pixel shaders would
apply the textures and other effects. But each shader could only do one type of
task, so while one type was busy the other two could sit idle.
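The difference between a fixed-function stage and a programmable one can be sketched in a few lines of Python. This is an illustration only: the function names and the choice of a fog blend and a grayscale effect are hypothetical examples, not taken from any real graphics API.

```python
def fixed_function_fragment(base_color, fog_color, fog_amount):
    """Fixed function: the blend itself is hard-wired; only parameters vary."""
    return tuple((1 - fog_amount) * b + fog_amount * f
                 for b, f in zip(base_color, fog_color))

def run_programmable_stage(fragments, shader):
    """Programmable stage: the application supplies the per-fragment function."""
    return [shader(fragment) for fragment in fragments]

def grayscale_shader(color):
    """A user-defined effect that the fixed fog blend above cannot express."""
    luminance = 0.299 * color[0] + 0.587 * color[1] + 0.114 * color[2]
    return (luminance, luminance, luminance)

fragments = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # RGB values in [0, 1]
fogged = [fixed_function_fragment(c, (0.5, 0.5, 0.5), 0.5) for c in fragments]
gray = run_programmable_stage(fragments, grayscale_shader)
print(fogged)
print(gray)
```

With the fixed-function version, the developer can change the fog colour or amount but not the operation; with the programmable version, any function of the fragment can be plugged in.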
3. UNIFIED SHADERS
The specialized logic, such as vertex shaders, pixel shaders and hardwired algorithms, was
replaced with many copies of one unified processor design. Shaders are now made so that
they are no longer confined to a certain task. There are no more vertex, geometry and pixel
shaders: just shaders. A unified shader can do any of the three kinds of work, so it can do
whatever needs doing instead of waiting for work it can do to come in.
Figure 4.8 shows a non-unified architecture versus a unified shader architecture. The
advantage of the unified approach is that one can have several shader cores and use them
for any type of shader (I-IV in this example). This gives better load balancing. IB and OB
are input and output buffers.
Figure 4.8 Non-unified vs. Unified Shader Architecture
Unified Shader Architecture allows more flexible use of the graphics rendering hardware
[15]. For example, in a situation with a heavy geometry workload the system could
allocate most computing units to run vertex and geometry shaders. In cases with less
vertex workload and heavy pixel load, more computing units could be allocated to run
pixel shaders.
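The load-balancing benefit described above can be estimated with a rough model. The sketch below is an assumption-laden simplification: work is counted in abstract items of equal cost, each unit finishes one item per time step, and the dependency between vertex and pixel stages is ignored. The function names and the workload numbers are hypothetical.

```python
import math

def frame_time_separate(vertex_work, pixel_work, vertex_units, pixel_units):
    """Dedicated units: each kind of work can only run on its own kind of unit."""
    return max(math.ceil(vertex_work / vertex_units),
               math.ceil(pixel_work / pixel_units))

def frame_time_unified(vertex_work, pixel_work, total_units):
    """Unified units: any unit can take any work item."""
    return math.ceil((vertex_work + pixel_work) / total_units)

# Eight units in total: split 4 + 4 in the separated design.
workloads = {
    "geometry-heavy": (700, 100),
    "pixel-heavy": (100, 700),
    "balanced": (400, 400),
}
for name, (vertex_work, pixel_work) in workloads.items():
    separate = frame_time_separate(vertex_work, pixel_work, 4, 4)
    unified = frame_time_unified(vertex_work, pixel_work, 8)
    print(f"{name}: separate={separate} steps, unified={unified} steps")
```

Under these assumptions the two designs tie only when the workload happens to match the fixed split; in the lopsided cases the separated design leaves half of its units mostly idle.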
Most graphics hardware currently uses DirectX to communicate with the applications
being run. It is an Application Programming Interface, or API, that programmers use to get
their software to use hardware effectively. Microsoft refined it over time, and DirectX 10
implemented a unified shader instruction set.
That means that software for different kinds of shaders could be written in a more similar
manner, making the programmer's job easier. In an uncommon case of hardware and
software changing at the same time to benefit from each other's changes, ATI and
Nvidia both started making GPUs with unified shaders. [16]
The unified shading architecture was introduced with the Nvidia GeForce 8
series, ATI Radeon HD 2000, S3 Chrome 400, Intel GMA X3000 series, Xbox 360's
GPU, Qualcomm Adreno 200 series and PowerVR SGX GPUs, and is used in all subsequent
series. OpenGL 3.3 (which offers a unified shader model) can still be implemented on
hardware that does not have a unified shading architecture.
3. CPU VERSUS GPU
A simple way to understand the difference between a CPU and GPU is to compare how
they process tasks. A CPU consists of a few cores optimized for sequential serial
processing while a GPU has a massively parallel architecture consisting of thousands of
smaller, more efficient cores designed for handling multiple tasks simultaneously. Figure 5
shows the difference between their cores.
Figure 5(a): GPU core vs. CPU core
Figure 5(b): CPU core vs. GPU core
The number of cores that a GPU has depends on the manufacturer. Nvidia graphics
solutions tend to pack more power into fewer chips, while AMD solutions pack in more
cores to increase processing power. Typical high-end graphics cards have 68 cores if they
are from Nvidia, and ~1500 cores if they are from AMD.
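The serial-versus-parallel contrast in this section can be sketched by writing the same per-pixel operation two ways. This is only a structural illustration: the names `brighten_kernel`, `run_serial` and `run_parallel` are hypothetical, and a Python thread pool stands in for the many GPU cores without delivering a comparable speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten_kernel(pixel):
    """Per-pixel work that does not depend on any other pixel."""
    return min(pixel + 40, 255)

def run_serial(pixels):
    """CPU style: one worker walks through the pixels one after another."""
    return [brighten_kernel(p) for p in pixels]

def run_parallel(pixels, workers=4):
    """GPU style: the same kernel is handed to many workers at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten_kernel, pixels))  # map keeps input order

image = [0, 100, 200, 230, 255, 17]
print(run_serial(image))
print(run_parallel(image))
```

Because each pixel is independent, the work can be split across any number of workers without changing the result, which is the property that lets a GPU spread it over its many cores.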
VII. Types
GPUs come in different shapes and forms, from dedicated cards which you can plug
into your desktop's PCI-Express slot to integrated graphics chips, which are built
directly into the motherboard, the backbone component of your system.
1. Dedicated graphics cards
The GPUs of the most powerful class typically interface with the motherboard by means of
an expansion slot such as PCI Express (PCIe) or Accelerated Graphics Port (AGP) and can
usually be replaced or upgraded with relative ease, assuming the motherboard is capable
of supporting the upgrade. A few graphics cards still use Peripheral Component
Interconnect (PCI) slots, but their bandwidth is so limited that they are generally used only
when a PCIe or AGP slot is not available.
A dedicated GPU is not necessarily removable, nor does it necessarily interface with the
motherboard in a standard fashion. The term "dedicated" refers to the fact that dedicated
graphics cards have RAM that is dedicated to the card's use, not to the fact
that most dedicated GPUs are removable. Dedicated GPUs for portable computers are
most commonly interfaced through a non-standard and often proprietary slot due to size
and weight constraints. Such ports may still be considered PCIe or AGP in terms of their
logical host interface, even if they are not physically interchangeable with their
counterparts.
Technologies such as SLI by Nvidia and CrossFire by AMD allow multiple GPUs to draw
images simultaneously for a single screen, increasing the processing power available for
graphics.
2. Integrated graphics solutions
Integrated graphics solutions, shared graphics solutions, or integrated graphics
processors (IGPs) utilize a portion of a computer's system RAM rather than dedicated
graphics memory. IGPs can be integrated onto the motherboard as part of the chipset, or
on the same die as the CPU (like the AMD APU or Intel HD Graphics). On certain
motherboards, AMD's IGPs can use dedicated sideport memory. This is a separate fixed
block of high-performance memory that is dedicated for use by the GPU. In early 2007,
computers with integrated graphics accounted for about 90% of all PC shipments. These
solutions are less costly to implement than dedicated graphics solutions, but tend to be
less capable. Historically, integrated solutions were often considered unfit to play 3D
games or run graphically intensive programs but could run less intensive programs such
as Adobe Flash. Examples of such IGPs would be offerings from SiS and VIA circa 2004
[17]. However, modern integrated graphics processors such as the AMD Accelerated
Processing Unit and Intel HD Graphics are more than capable of handling 2D graphics or
low-stress 3D graphics.
As a GPU is extremely memory intensive, an integrated solution may find itself competing
with the CPU for the already relatively slow system RAM, as it has minimal or no dedicated
video memory. IGPs can have up to 29.856 GB/s of memory bandwidth from system RAM,
whereas graphics cards can have up to 264 GB/s of bandwidth between their RAM and GPU
core. This bandwidth is provided by the memory bus and can be performance
limiting. Older integrated graphics chipsets lacked hardware transform and lighting, but
newer ones include it [18].
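The two bandwidth figures quoted above can be compared with a little arithmetic. The sketch below estimates the ideal time to move one frame of pixel data at each bandwidth. The frame size (1920x1080 at 4 bytes per pixel), the reading of 1 GB as 10**9 bytes, and the function name `transfer_time_ms` are assumptions made for illustration; real transfers also pay latency and contention costs that this ignores.

```python
# Bandwidth figures quoted in the text, in GB/s (assuming 1 GB = 10**9 bytes).
IGP_BANDWIDTH_GBPS = 29.856
CARD_BANDWIDTH_GBPS = 264.0

def transfer_time_ms(num_bytes: int, bandwidth_gbps: float) -> float:
    """Ideal time in milliseconds to move num_bytes at the given bandwidth."""
    return num_bytes / (bandwidth_gbps * 1e9) * 1e3

# Hypothetical workload: one 1920x1080 frame at 4 bytes per pixel.
frame_bytes = 1920 * 1080 * 4

igp_ms = transfer_time_ms(frame_bytes, IGP_BANDWIDTH_GBPS)
card_ms = transfer_time_ms(frame_bytes, CARD_BANDWIDTH_GBPS)
ratio = igp_ms / card_ms  # equals the ratio of the two bandwidths

print(f"IGP: {igp_ms:.3f} ms, card: {card_ms:.3f} ms, ratio: {ratio:.2f}x")
```

With these numbers the dedicated card moves the same data roughly nine times faster, and an IGP must additionally share its bandwidth with the CPU.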
3. Hybrid solutions
This newer class of GPUs competes with integrated graphics in the low-end desktop and
notebook markets. The most common implementations of this are ATI's HyperMemory and
Nvidia's TurboCache.
Hybrid graphics cards are somewhat more expensive than integrated graphics, but much
less expensive than dedicated graphics cards. They share memory with the system and
have a small dedicated memory cache to make up for the high latency of the system
RAM. Technologies within PCI Express make this possible. While these solutions are
sometimes advertised as having as much as 768 MB of RAM, this figure refers to how much
memory can be shared with the system.
4. Stream Processing and General Purpose GPUs (GPGPU)
It is becoming increasingly common to use a general purpose graphics processing unit
(GPGPU) as a modified form of stream processor. This concept turns the massive
computational power of a modern graphics accelerator's shader pipeline into
general-purpose computing power, as opposed to being hard-wired solely to do graphical
operations. In certain applications requiring massive vector operations, this can yield
several orders of magnitude higher performance than a conventional CPU. The two largest
discrete GPU designers, ATI and Nvidia, are beginning to pursue this approach with an
array of applications.
GPGPUs can be used for many types of parallel tasks, including ray tracing. They are
generally suited to high-throughput computations that exhibit data parallelism, which
exploits the wide vector width SIMD architecture of the GPU.
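Data parallelism means that the same operation is applied to every element, and no element's result depends on another's. The sketch below imitates this in plain Python with a SAXPY-style kernel (out[i] = a * x[i] + y[i]). The names `saxpy_kernel` and `launch` are hypothetical and do not belong to any real GPU API. The loop runs serially here; on a GPU each index would be handled by a separate hardware lane.

```python
import random

def saxpy_kernel(i, a, x, y, out):
    """Work for one element: reads and writes only index i."""
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args, shuffle=False):
    """Run the kernel once per index; order is irrelevant for data-parallel work."""
    indices = list(range(n))
    if shuffle:
        random.shuffle(indices)  # mimic an arbitrary hardware scheduling order
    for i in indices:
        kernel(i, *args)

a = 2.0
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

in_order = [0.0] * len(x)
launch(saxpy_kernel, len(x), a, x, y, in_order)

any_order = [0.0] * len(x)
launch(saxpy_kernel, len(x), a, x, y, any_order, shuffle=True)

print(in_order, any_order)
```

Because each index is independent, both runs produce the same result regardless of order, which is the property that lets a GPU process many elements at once.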
5. External GPU (eGPU)
An external GPU is a graphics processor located outside of the housing of the computer.
External graphics processors are often used with laptop computers. Laptops might have a
substantial amount of RAM and a sufficiently powerful central processing unit (CPU), but
often lack a powerful graphics processor (and instead have a less powerful but more
energy-efficient on-board graphics chip). On-board graphics chips are often not powerful
enough for playing the latest games, or for other tasks.
VIII. GPU Accelerated Computing
GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a
CPU to accelerate scientific, analytics, engineering, consumer, and enterprise
applications. Pioneered in 2007 by NVIDIA, GPU accelerators now power energy-efficient
datacenters in government labs, universities, enterprises, and small-and-medium
businesses around the world. GPUs are accelerating applications in platforms ranging
from cars, to mobile phones and tablets, to drones and robots.
GPU-accelerated computing offers unprecedented application performance by offloading compute-intensive
portions of the application to the GPU, while the remainder of the code still runs on the CPU. From a user's
perspective, applications simply run significantly faster. This basic process can be seen in figure 6:
Figure 6: GPU acceleration
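How much faster an application runs depends on how much of it is offloaded. One common way to estimate this is Amdahl's law, a standard model that the source itself does not cite: if a fraction p of the run time is offloaded and runs s times faster on the GPU, the overall speedup is 1 / ((1 - p) + p / s). The sketch below applies that formula. The function name and the example values (90% offloaded, 10 times faster) are illustrative assumptions, and data transfer costs between CPU and GPU are ignored.

```python
def overall_speedup(offloaded_fraction: float, gpu_speedup: float) -> float:
    """Amdahl's-law estimate of whole-application speedup from offloading."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if gpu_speedup <= 0.0:
        raise ValueError("gpu_speedup must be positive")
    cpu_part = 1.0 - offloaded_fraction              # still runs on the CPU
    gpu_part = offloaded_fraction / gpu_speedup      # offloaded, now faster
    return 1.0 / (cpu_part + gpu_part)

# Hypothetical application: 90% of run time offloaded, running 10x faster on the GPU.
example = overall_speedup(0.9, 10.0)
print(f"Estimated overall speedup: {example:.2f}x")
```

The estimate stays well below the raw GPU speedup because the part that remains on the CPU is not accelerated at all.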
IX. Advantages
A multi-GPU system provides more than just performance gains. It also gives you
the freedom to run your applications with full features and effects enabled. Figure 7
shows the increased performance of applications with the increased GPU usage.
Figure 7: Multi-GPU performance
2-D to 3-D graphics revolution. The introduction of programmable shading in 2001
led to several visual effects not previously possible, such as the simulation of
refractive chromatic dispersion for a soap bubble effect in Figure 8:
Figure 8: Programmable shading: Soap bubble effect
Modern GPUs can use programmable shading to achieve near-cinematic realism, as Figure
9 shows with a rendering of actress Adrianne Curry on an NVIDIA GeForce 8800 GTX.
Figure 9: Demonstration of realistic graphics with NVIDIA GeForce 8800 GTX
Furthermore, GPU-based high performance computers are starting to play a
significant role in large-scale modelling. Three of the 10 most powerful
supercomputers in the world take advantage of GPU acceleration.
A GPU decreases the load on the CPU. It also consumes less power. It allows large,
work-intensive computations, generally those relevant to computer graphics
processing, to be offloaded to another processor. This frees the main CPU to focus on other
non-offloadable transactions.
GPUs also deliver realistic, life-like graphics that improve with each new GPU architecture.
Conclusion
GPUs became more popular as the demand for graphics applications increased. Eventually,
they became not just an enhancement but a necessity for optimum performance of a PC.
Specialized logic chips now allow fast graphics and video implementations. Generally the
GPU is connected to the CPU and is completely separate from the motherboard. The
random access memory (RAM) is connected through the accelerated graphics port (AGP)
or the peripheral component interconnect express (PCI-Express) bus. Some GPUs are
integrated into the northbridge on the motherboard and use the main memory as a digital
storage area, but these GPUs are slower and have poorer performance.
GPUs are a central component of today's devices; without them it would be impractical
to perform graphically intensive tasks such as video encoding and decoding, graphics
editing, and gaming.
References
[1] techopedia.
[2] "GPU sales strong as AMD gains market share," techreport.com.
[3] C. McClanahan, "History and Evolution of GPU Architecture," Georgia Tech.
[4] G. Singer, "The History of the modern graphics processor."
[5] "Arcade/SpaceInvaders," Computer Archeology.
[6] A. Springmann, "Atari 2600 Teardown: What's Inside Your Old Console?," The Washington Post.
[7] "What are the 6502, ANTIC, CTIA/GTIA, POKEY, and FREDDIE chips?," Atari8.com.
[8] K. E. Wiegers, "Atari Display List Interrupts," COMPUTE! (47): 161, April 1984.
[9] K. E. Wiegers, "Atari Fine Scrolling," COMPUTE! (67): 110, December 1985.
[10] I. Buck, "The Evolution of GPUs for General Purpose Computing," GTC 2010.
[11] S. Riddle, "Blitter Information."
[12] T. Crow, "Evolution of the Graphical Processing Unit," 2004.
[13] V. Beal, "GPU - Graphics Processing Unit," webopedia.
[14] "Computer Systems Architecture," Lecture 23, Graphics Processing Unit.
[15] "GeForce 8800 GTX: 3D Architecture Overview," ExtremeTech.
[16] J. F. Amprimoz, "Graphics Processor Evolution: Pipeline to Unified Shader Architecture," CPU, Graphics and Memory, 2009.
[17] T. Tscheblockov, "Xbit Labs: Roundup of 7 Contemporary Integrated Graphics Chipsets for Socket 478 and Socket A Platforms."
[18] B. Sanford, "Integrated Graphics Solutions for Graphics-Intensive Applications."