
A Brief Overview of the Graphics Pipeline

Cedric Lee

What is a graphics pipeline?

3D Scene

Stage

Stage

Stage

Raster Image

Hardware, real-time / interactive rendering
Popular APIs: OpenGL and DirectX

Overview

Basic Graphics Pipeline
Modern Graphics Pipeline
Beyond Pipelining
The New Wave

Basic Graphics Pipeline

Use case:

Render a textured mesh with per-pixel lighting
Ambient light, 1 directional light, 1 point light, no shadows
Assume a z-buffer based architecture

3D Scene

Surface
  Triangle mesh
    Vertices and indices
    Per-vertex position, normal
  Position + orientation (world matrix)

Material
  Per-vertex uv, tangent, binormal
  Diffuse + normal maps

Diffuse lighting (direction, colour)

Camera (view + projection matrices)

Vertex Fetching

Inputs: Vertex Stream, Index Stream
Stage: Input Assembler
Per-vertex outputs (OS = object space): Position-OS, Normal-OS, Tangent-OS, Binormal-OS, Texture UV
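
As a rough illustration of what the Input Assembler does with the two streams, the sketch below groups an index stream into triples and fetches the vertex each index names. The quad data, the dictionary layout and the function name `assemble_triangles` are all hypothetical, and a triangle-list topology is assumed.

```python
# Hypothetical data: a quad stored as 4 unique vertices and 6 indices.
vertex_stream = [
    {"position_os": (0.0, 0.0, 0.0), "uv": (0.0, 0.0)},
    {"position_os": (1.0, 0.0, 0.0), "uv": (1.0, 0.0)},
    {"position_os": (1.0, 1.0, 0.0), "uv": (1.0, 1.0)},
    {"position_os": (0.0, 1.0, 0.0), "uv": (0.0, 1.0)},
]
index_stream = [0, 1, 2, 0, 2, 3]


def assemble_triangles(vertices, indices):
    """Group the index stream into triples and fetch the vertex each index names.

    Assumes a triangle-list topology (3 indices per triangle).
    """
    if len(indices) % 3 != 0:
        raise ValueError("a triangle list needs a multiple of 3 indices")
    triangles = []
    for i in range(0, len(indices), 3):
        triangles.append(tuple(vertices[j] for j in indices[i:i + 3]))
    return triangles


triangles = assemble_triangles(vertex_stream, index_stream)
```

The two triangles share vertices 0 and 2 through the index stream, which is why an indexed mesh stores four vertices here rather than six.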

Vertex Processing
Per-vertex inputs: Position-OS, Normal-OS, Tangent-OS, Binormal-OS, Texture UV
Uniform constants: World Matrix, View Matrix, Projection Matrix
Stage: Vertex Shader
Per-vertex outputs (WS = world space, SS = screen space): Position-WS, Position-SS, Normal-WS, Tangent-WS, Binormal-WS, Texture UV
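
The Vertex Shader stage can be sketched as a chain of matrix multiplies. The version below is a minimal illustration, not any particular API: it assumes column vectors, an OpenGL-style perspective matrix, and a world matrix without non-uniform scale (otherwise normals would need the inverse transpose). All function names are hypothetical. Tangent and binormal are omitted because they would be transformed the same way as the normal.

```python
import math


def transform(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    """A world or view matrix that only translates (illustrative helper)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def perspective(fov_y, aspect, near, far):
    """OpenGL-style perspective projection (an assumed convention)."""
    f = 1.0 / math.tan(fov_y / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]


def vertex_shader(vertex, world, view, projection):
    # Positions use w = 1 so that translation applies.
    pos_ws = transform(world, list(vertex["position_os"]) + [1.0])
    pos_cs = transform(projection, transform(view, pos_ws))
    # Directions use w = 0 so that translation is ignored.
    normal_ws = transform(world, list(vertex["normal_os"]) + [0.0])[:3]
    return {"position_ws": pos_ws[:3], "position_cs": pos_cs,
            "normal_ws": normal_ws, "uv": vertex["uv"]}


def clip_to_screen(pos_cs, width, height):
    """Perspective divide plus viewport mapping, with y pointing down."""
    x, y, _, w = pos_cs
    ndc_x, ndc_y = x / w, y / w
    return ((ndc_x * 0.5 + 0.5) * width, (1.0 - (ndc_y * 0.5 + 0.5)) * height)
```

In this sketch the shader emits a clip-space position, and `clip_to_screen` performs the divide and viewport mapping that produce the screen-space position listed above.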

Scan Conversion

Per-vertex inputs: Position-WS, Position-SS, Normal-WS, Tangent-WS, Binormal-WS, Texture UV
Stage: Rasterizer
  Trivial reject: viewport clipping, back-face culling
  Early Z rejection
  Interpolate
Per-pixel outputs: Position-WS, Position-SS, Normal-WS, Tangent-WS, Binormal-WS, Texture UV
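
One common way to express these rasterizer steps in software is with edge functions. The sketch below is an assumption-laden illustration, not how any specific GPU does it: it treats positive signed area as front-facing, samples at pixel centres, clamps the triangle's bounding box to the viewport, and interpolates a single scalar attribute with barycentric weights. Early Z rejection and perspective-correct interpolation are left out. The names `edge` and `rasterize` are hypothetical.

```python
import math


def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(tri, attrs, width, height):
    """Return (x, y, interpolated_attribute) for every covered pixel."""
    a, b, c = tri
    area = edge(a, b, c)
    if area <= 0:
        return []  # back-facing or degenerate: culled (assumed winding convention)

    # Clamp the bounding box to the viewport; an empty box is a trivial reject.
    min_x = max(0, math.floor(min(a[0], b[0], c[0])))
    max_x = min(width - 1, math.ceil(max(a[0], b[0], c[0])))
    min_y = max(0, math.floor(min(a[1], b[1], c[1])))
    max_y = min(height - 1, math.ceil(max(a[1], b[1], c[1])))

    fragments = []
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            p = (x + 0.5, y + 0.5)  # sample at the pixel centre
            w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # Barycentric weights sum to 1 and blend the vertex attributes.
                value = (w0 * attrs[0] + w1 * attrs[1] + w2 * attrs[2]) / area
                fragments.append((x, y, value))
    return fragments
```

Reversing the vertex order flips the sign of the area, so the same triangle seen from behind produces no fragments.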

Pixel Processing
Per-pixel inputs: Position-WS, Position-SS, Normal-WS, Tangent-WS, Binormal-WS, Texture UV
Textures: Diffuse, Normal
Uniform constants: ambient light colour, directional light colour, directional light direction, point light colour, point light position
Stage: Pixel Shader (texturing, lighting)
Per-pixel outputs: Depth, Colour, Alpha
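
A minimal sketch of the lighting half of the Pixel Shader for this use case follows. It assumes the diffuse and normal map samples have already been fetched (the normal sample decoded to the range -1 to 1), uses simple Lambert diffuse, and applies no distance attenuation to the point light. The function and dictionary key names are hypothetical.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return v if length == 0.0 else tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pixel_shader(position_ws, tangent_ws, binormal_ws, normal_ws,
                 diffuse_sample, normal_sample, lights):
    # Rotate the tangent-space normal map sample into world space.
    n = normalize(tuple(
        tangent_ws[i] * normal_sample[0]
        + binormal_ws[i] * normal_sample[1]
        + normal_ws[i] * normal_sample[2]
        for i in range(3)))

    total = list(lights["ambient_colour"])

    # Directional light: the stored direction is the way the light travels,
    # so the vector towards the light is its negation.
    to_dir = normalize(tuple(-c for c in lights["dir_direction"]))
    n_dot_l = max(0.0, dot(n, to_dir))
    for i in range(3):
        total[i] += lights["dir_colour"][i] * n_dot_l

    # Point light: direction depends on the pixel's world-space position.
    to_point = normalize(tuple(l - p for l, p in zip(lights["point_position"], position_ws)))
    n_dot_l = max(0.0, dot(n, to_point))
    for i in range(3):
        total[i] += lights["point_colour"][i] * n_dot_l

    # Modulate the accumulated light by the diffuse texture colour.
    return tuple(d * t for d, t in zip(diffuse_sample, total))
```

This is why the earlier stages carry tangent, binormal and world-space position down to the pixel: the normal map needs the tangent frame, and the point light needs the position.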

Raster Operators (ROPs)

Per-pixel inputs: Depth, Colour, Alpha
Stage: Depth Test, Alpha Test, Blend
Outputs: Depth Buffer, Colour Buffer (frame buffer / render targets)
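
The ROP steps can be sketched as a single function applied to each shaded pixel. This is only an illustration under stated assumptions: a "less than" depth comparison, an alpha test that discards pixels at or below a reference value, and standard source-over alpha blending. Real hardware exposes all three as configurable state. The name `rop_write` and the buffer layout (lists of rows) are hypothetical.

```python
def rop_write(depth_buffer, colour_buffer, x, y, depth, colour, alpha, alpha_ref=0.0):
    """Apply depth test, alpha test and blend; return True if the pixel was written."""
    # Depth test: keep the pixel only if it is nearer than what is stored.
    if depth >= depth_buffer[y][x]:
        return False
    # Alpha test: discard pixels that are too transparent.
    if alpha <= alpha_ref:
        return False
    # Blend: source-over, weighting the new colour by its alpha.
    dst = colour_buffer[y][x]
    colour_buffer[y][x] = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(colour, dst))
    depth_buffer[y][x] = depth
    return True


# A 1x1 frame buffer cleared to the far plane and black.
depth_buffer = [[1.0]]
colour_buffer = [[(0.0, 0.0, 0.0)]]
first = rop_write(depth_buffer, colour_buffer, 0, 0, 0.5, (1.0, 0.0, 0.0), 0.5)
second = rop_write(depth_buffer, colour_buffer, 0, 0, 0.75, (0.0, 1.0, 0.0), 1.0)
```

The second write is behind the first, so it fails the depth test and leaves both buffers untouched.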

Modern GPU Pipeline


Programmable units
  Vertex shaders, pixel shaders
  DX10: Geometry Shader
    Kill/emit vertices, primitives
    Ex. displacement mapping, fur, 1-pass render to cube map

Modern GPU Pipeline

Unified shader architecture

Common shading cores shared between Vertex, Geometry and Pixel shading units
Scheduler distributes work
Load balancing

http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf


Modern GPU Pipeline

Bandwidth:

Hierarchical Z
PS3: compressed Z and colour to reduce bandwidth for MSAA reads
X360: in-GPU EDRAM provides lots of bandwidth

Modern GPU Pipeline

CUDA / DX11 Compute Shader


Stream processing (GPGPU)
Exposes shading functionality
Arbitrary memory reads

Modern GPU

More memory, processing units
More floating point formats, fewer usage restrictions
More render targets (8)
Longer shaders
New data structures (e.g. texture arrays)
Better MSAA and anisotropic filtering support

Beyond Pipelining

Multi-processor

Solution to memory and power walls
Pipelining: multiple stages happening at once
Parallelism: many things happening in the same stage

Limits of pipelining
  Small number of pipeline steps
  Some steps are much more compute intensive

Parallelism

Parallelism examples:

All components of a float4 at the same time
Multiple vertices at the same time
Multiple triangles at the same time

SIMD

e.g. GPU ALU
Shared instruction store and control
Compact and less expensive
Efficient with no loops or branches
Problem with unused processing cycles
  Unfilled quads are inefficient
  Solution: avoid small or skinny triangles (PS3)

Not good for more complicated data structures or algorithms
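
To make the unfilled-quad problem concrete, the sketch below estimates how much of the shading work is useful for a given set of covered pixels. It assumes pixels are shaded in aligned 2x2 quads, so every quad that is touched costs four lanes even if only one pixel is covered. The function name `quad_utilisation` and the sample pixel lists are hypothetical.

```python
def quad_utilisation(covered_pixels):
    """Fraction of shaded lanes that correspond to actually covered pixels.

    Assumes shading happens in aligned 2x2 pixel quads.
    """
    pixels = set(covered_pixels)
    quads = {(x // 2, y // 2) for x, y in pixels}
    if not quads:
        return 0.0
    return len(pixels) / (4 * len(quads))


# A fully covered quad versus a thin diagonal sliver touching four quads.
full_block = [(0, 0), (1, 0), (0, 1), (1, 1)]
skinny = [(0, 0), (2, 2), (4, 4), (6, 6)]
full_ratio = quad_utilisation(full_block)
skinny_ratio = quad_utilisation(skinny)
```

Under these assumptions the sliver wastes three of every four lanes, which is the motivation for avoiding small or skinny triangles.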

SIMT

Still SIMD. Shared code between threads.
Process groups of primitives (e.g. 48 quads) in each thread
Latency hiding:
  One thread stalls on a texture fetch
  Other threads continue execution
  Especially important due to the memory wall

SIMT

When branching:

Only evaluate one branch if all primitives take that branch
Must evaluate both branches and mask the results if not all primitives take the same branch

Reduces unused processor cycles
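
The branching rule can be mimicked in scalar code. The sketch below is only a model of the behaviour, not of real hardware: a list of values stands in for the primitives in one thread, and the second return value counts how many branch bodies had to be executed. The name `simt_branch` is hypothetical.

```python
def simt_branch(values, condition, then_fn, else_fn):
    """Run a branch over a group of values the way a SIMT thread would.

    Returns (results, passes), where passes is the number of branch
    bodies that were executed for the whole group.
    """
    mask = [condition(v) for v in values]
    if all(mask):
        # Every element takes the same branch: evaluate it once.
        return [then_fn(v) for v in values], 1
    if not any(mask):
        return [else_fn(v) for v in values], 1
    # Divergent: evaluate both branches for everyone, then select by mask.
    then_results = [then_fn(v) for v in values]
    else_results = [else_fn(v) for v in values]
    return [t if m else e for m, t, e in zip(mask, then_results, else_results)], 2


def double(v):
    return v * 2


def negate(v):
    return -v


def is_positive(v):
    return v > 0


coherent = simt_branch([1, 2, 3], is_positive, double, negate)
divergent = simt_branch([1, -2, 3], is_positive, double, negate)
```

The coherent group pays for one branch body, while the divergent group pays for both even though each element only uses one result.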

MIMD

e.g. multi-core CPUs, Cell SPEs, Larrabee
Different code stores and controls for different processors
More complex hardware
More expensive
Synchronization issues
Can handle more complex data structures and algorithms

The New Wave

MIMD

Cell SPEs
Larrabee

Cell SPEs

SPEs

Local memory store
Shared memory accessed via DMA
Ring bus

PS3

RSX
  Traditional GPU (z-buffer, ROP)
  SIMD data structures and processing (arrays)

Offload GPU work to SPUs
  Micro triangle removal
  Skinning
  Post-FX
  Lighting
  Mostly rely on SIMD-friendly data structures


Larrabee

Many general purpose CPU cores
Coherent memory access from cores
Very few fixed-function units (e.g. texture)
Most graphics pipeline components are programmable
  Depth buffer
  Blending
Invites more complex data structures and algorithms

What does this mean?

Programming

GPU programming may become more like SPU programming


More MIMD
More synchronization and data buffering issues
More attention to latency hiding

Surfaces and Volumes


Curved surfaces
Displacement mapping
Multi-resolution meshes
Volumes

Lighting

Non-uniform representations

Irregular Shadow Mapping
Deep Shadow Maps

Rasterization

Object-parallel rasterization

Ray-casting

Implicit surfaces (e.g. metaballs, level sets, CSG)
Direct volume rendering

Order independent transparency

Questions?
