
3D Graphics Content Over OCP

Martti Venell, Sr. Verification Engineer, Bitboys

Mobile Graphics Trends

Multimedia phone shipments with multimedia accelerator

Graphics Architectures Traditionally

The CPU, memory and graphics bus form a unit of their own, connected by the north bridge.
The graphics processor has dedicated memory.
Memory bandwidth hungry.

[Diagram labels: CPU, Graphics Processor & Memory, North Bridge, RAM, PCI, South Bridge, LAN, SCSI, Peripherals]

Block Diagram of the Bus Interface

[Diagram labels: Master, OCP Bus, System, Slave, Arbiter, Geometry Engine, Input, Graphics Core]

Key Features: Pipelining

Pipelining in the OCP specification: the return of read data and the provision of write data may be delayed after the presentation of the associated request.
Bus latencies can be compensated for by sending multiple outstanding requests.
Latency compensation can also be achieved with:
- Intelligent pre-fetching mechanisms
- Smart caching
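The effect of outstanding requests can be illustrated with a rough timing model. The sketch below is not taken from the OCP specification: the function name, the fixed latency, the one-request-per-cycle issue rate and the in-order return of data are all simplifying assumptions made for illustration.

```python
def total_cycles(num_requests, latency, beats, max_outstanding):
    """Cycles needed to complete num_requests read bursts.

    Assumed model (hypothetical, for illustration only):
    - at most one request is issued per cycle,
    - at most max_outstanding requests are in flight at once,
    - data for a request starts `latency` cycles after it is issued,
    - each burst occupies the data path for `beats` cycles,
    - bursts return in issue order, one at a time.
    """
    completion = []   # cycle at which each request's last beat arrives
    prev_issue = -1
    for i in range(num_requests):
        issue = prev_issue + 1
        if i >= max_outstanding:
            # Wait until an in-flight slot has been freed.
            issue = max(issue, completion[i - max_outstanding])
        # Data cannot start before the previous burst has finished.
        prev_done = completion[-1] if completion else 0
        data_start = max(issue + latency, prev_done)
        completion.append(data_start + beats)
        prev_issue = issue
    return completion[-1]


# Eight 4-beat bursts over a bus with 20 cycles of latency.
for depth in (1, 4, 8):
    print(depth, total_cycles(8, latency=20, beats=4, max_outstanding=depth))
```

With a single outstanding request, every burst pays the full latency. With deeper pipelining the latency is paid roughly once and the data path stays busy afterwards.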

Key Features: Byte Enables

Transfers of less than a full word of data are supported by providing byte enable information that specifies which octets are to be transferred.
When writing a 256-bit burst to the color buffer, some pixels must keep their existing values.
Not all bus protocols support byte enables.
When writing a 256-bit burst of data, not all pixels may be valid:
- With BE support, the full burst can be used
- Without BE support, many separate short writes are needed

[Figure: a 256-bit burst holding Pixel1 to Pixel8]
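The trade-off can be sketched in a few lines. The example assumes 8 pixels of 4 bytes each per 256-bit burst (256 / 8 = 32 bits per pixel) and flattens the byte enables of the whole burst into one 32-bit mask; both choices, and the function names, are illustrative assumptions rather than the actual bus interface.

```python
PIXELS_PER_BURST = 8
BYTES_PER_PIXEL = 4  # assumed: 256 bits / 8 pixels = 32 bits per pixel


def byte_enable_mask(valid):
    """Return one enable bit per byte of the burst (bit 0 = first byte)."""
    pixel_bits = (1 << BYTES_PER_PIXEL) - 1  # 0b1111 for a 4-byte pixel
    mask = 0
    for index, is_valid in enumerate(valid):
        if is_valid:
            mask |= pixel_bits << (index * BYTES_PER_PIXEL)
    return mask


def writes_without_byte_enables(valid):
    """Count the separate writes needed when only contiguous runs of
    valid pixels can be written (no byte enables available)."""
    runs = 0
    previous = False
    for is_valid in valid:
        if is_valid and not previous:
            runs += 1
        previous = is_valid
    return runs


# Pixels 3, 5 and 6 must keep their old color buffer contents.
valid = [True, True, False, True, False, False, True, True]
print(hex(byte_enable_mask(valid)))        # one burst, masked
print(writes_without_byte_enables(valid))  # several short writes instead
```

In this example a single masked burst replaces three shorter writes.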

Key Features: Threading

Supports concurrency and out-of-order processing of transfers.
Multiple sources of memory accesses.
Allows memory fetches of different speeds.
Simplifies the design, because return data can be identified by its ID.
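A minimal sketch of how an ID simplifies matching return data to requests is shown below. It assumes that responses within one thread come back in order while different threads may complete in any order; the class and method names are hypothetical and do not come from the presentation.

```python
from collections import defaultdict, deque


class ThreadedMaster:
    """Tracks outstanding reads per thread ID (illustrative only)."""

    def __init__(self):
        # One FIFO of pending addresses per thread ID.
        self._pending = defaultdict(deque)

    def issue(self, thread_id, address):
        self._pending[thread_id].append(address)

    def complete(self, thread_id, data):
        # The response's thread ID selects the queue; the oldest pending
        # request on that thread is the one being answered.
        address = self._pending[thread_id].popleft()
        return address, data


master = ThreadedMaster()
master.issue(0, 0x100)   # e.g. a slow fetch on thread 0
master.issue(1, 0x200)   # e.g. a fast fetch on thread 1
master.issue(0, 0x104)

# Thread 1 answers first even though it was issued second.
print(master.complete(1, "fast data"))
print(master.complete(0, "slow data A"))
print(master.complete(0, "slow data B"))
```

No search or reordering logic is needed on the master side: the ID on the response is enough to find the matching request.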

The SoC System From a 3D Graphics Point of View

Three aspects: memory bandwidth, the system bus and the graphics processor.
- Memory bandwidth sets the upper performance limit for memory-intensive scenes (e.g. blending).
- Latency compensation drives the configuration of the graphics core within the limits allowed by the system bus.
- Computationally intensive scenes (e.g. complicated shading) are limited by the graphics processor's pipeline efficiency and clock frequency.
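The memory and compute limits above can be compared with a simple back-of-the-envelope model. The formula, the parameter names and all numbers below are illustrative assumptions, not measurements from any Bitboys product, and the model deliberately ignores bus latency.

```python
def pixel_rate_limit(bandwidth_bytes_per_s, bytes_per_pixel,
                     clock_hz, cycles_per_pixel, pipeline_efficiency):
    """Return (pixels per second, limiting factor) for a scene.

    memory_rate:  pixels/s the memory system can feed
    compute_rate: pixels/s the pipeline can shade
    """
    memory_rate = bandwidth_bytes_per_s / bytes_per_pixel
    compute_rate = clock_hz * pipeline_efficiency / cycles_per_pixel
    if memory_rate <= compute_rate:
        return memory_rate, "memory"
    return compute_rate, "compute"


# Blending: read + write of a 4-byte pixel, one cycle of shading (assumed).
blend = pixel_rate_limit(400e6, 8, 100e6, 1, 0.8)
# Complicated shading: write only, eight cycles per pixel (assumed).
shade = pixel_rate_limit(400e6, 4, 100e6, 8, 0.8)

print(blend)
print(shade)
```

Under these assumed numbers the blending scene is bound by memory bandwidth and the shading scene by the pipeline.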

Systems Bus Architecture 1/2

Limitations: the maximum burst size and the availability of request pipelining.
Since main memory accesses target the external DRAM, it makes sense to make long burst accesses over the system bus.
Various smart caching schemes are applied to make sure that none of the system bandwidth goes to waste.
The availability of byte enables in the graphics processor's bus interface helps maximize the burst size, since there is no need to split potentially long bursts into smaller ones.

Systems Bus Architecture 2/2

The second issue is the memory latency of the system, which includes the total latency from the graphics processor to the external memory and back.
One efficient way to compensate for this latency is to send multiple outstanding requests to the bus interface.
By pipelining the requests in this way, a continuous data stream can be fed to the memory bandwidth hungry graphics processor.
Older bus interfaces, such as AMBA AHB 2.0, do not support pipelining of requests or byte enables.
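How many outstanding requests are needed for a continuous data stream can be estimated with a simple occupancy argument. The sketch assumes that each request holds an in-flight slot from issue until its last data beat returns, that is for latency + beats cycles, with a fixed latency; the function names and numbers are hypothetical.

```python
def min_outstanding_for_full_rate(latency, beats):
    """Smallest number of in-flight requests that keeps the data path busy.

    Each request occupies a slot for (latency + beats) cycles but delivers
    only `beats` cycles of data, so the slots must cover the whole window.
    """
    return -(-(latency + beats) // beats)  # ceiling division


def bus_utilization(latency, beats, outstanding):
    """Fraction of cycles that carry data, capped at 1.0."""
    return min(1.0, outstanding * beats / (latency + beats))


latency, beats = 20, 4
print(min_outstanding_for_full_rate(latency, beats))
for depth in (1, 2, 4, 6):
    print(depth, round(bus_utilization(latency, beats, depth), 2))
```

Under these assumptions a bus that allows only one outstanding request leaves most cycles idle, while six outstanding 4-beat bursts are enough to hide a 20-cycle latency.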

Bus Interface Verification Environment

Two verification environments (Arbiter/Master, Input/Slave)

[Diagram, Arbiter/Master environment: Arbiter eVC, Memory eVC, OCP eVC, Slave, Master, Arbiter]

[Diagram, Input/Slave environment: Input eVC, OCP eVC, Master, Slave, Input]

OCP Design and Verification Project Conclusions

Good features
- Pipelining
- Byte enables
Good parameterised approach
- Customer-dependent configurations
- Only necessary features are implemented
Simple to design and verify
- The cycle from customer requirements to design and verification is short
- Fewer bugs due to simplicity

Future Trends in Mobile Graphics Acceleration

Display resolutions will be larger and of higher quality.
This is why graphics processor technology must be scalable, so that memory bandwidth does not increase too much as display resolutions increase.
So-called immediate mode rendering is used in Bitboys' products, and this will be the key issue in terms of memory bandwidth.
It is also very easy to scale the caches of the graphics processor or the bus architecture.
OCP 2.0 already shows good features that are needed in cutting-edge multimedia technology.
