
Speed-up of Algorithms With Graphics Processing Units (GPU): Part IV of IV

Robert H. Luke*# and Derek Anderson*#

* Electrical and Computer Engineering Department
# Predoctoral Fellows, NLM Training Grant

IEEE Computational Intelligence Society MU Chapter And National Library of Medicine Medical Informatics Training Grant Special Seminar Series

Organization of Lectures
Part I
Introduction to GPUs and shader languages

Part II
Image processing (Morphology, Sobel, and Gaussian)

Part III
Performance, multi-pass rendering, optimizations, and debugging

Part IV
Using GPUs for non-image based processing (SOFM & CA)

General Purpose Programming


You already have all the tools required
  16/32 bit floating point values
  2D Textures of any size [1,8192]
  Frame Buffer Objects

But a few extra tricks are helpful
  Ping Pong
  Reduction

Ping Pong
The name says it all
  Have two textures of the same dimensionality
  One fragment program
  Use texture A as input, B as output
  Then swap; B as input, A as output

[Figure: Texture A and Texture B alternate as input and output of the fragment program.]
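
The swap described above can be illustrated on the CPU. This is only a sketch: Python lists stand in for the two textures, and `ping_pong`, `fragment_program`, and `add_one` are hypothetical names that do not come from the slides.

```python
def ping_pong(initial, fragment_program, passes):
    """Run fragment_program repeatedly, alternating two same-sized buffers."""
    # Two buffers of the same size stand in for textures A and B.
    tex_a = list(initial)
    tex_b = [0.0] * len(initial)
    src, dst = tex_a, tex_b
    for _ in range(passes):
        # One "render pass": each output element reads only the input buffer.
        for i in range(len(src)):
            dst[i] = fragment_program(src, i)
        # Swap roles: this pass's output becomes the next pass's input.
        src, dst = dst, src
    # After the final swap, src holds the most recent result.
    return src


def add_one(src, i):
    """Hypothetical per-element kernel used only for illustration."""
    return src[i] + 1.0
```

The kernel never writes to the buffer it reads from, which is the constraint that makes two textures necessary on the GPU.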

Reduction
  Use the Ping Pong idea
  Reduce the size of the texture with each pass
  Distributes operations over more processors

Start (4 wide):
  A B C D
  E F G H
  I J K L
  M N O P

After pass 1 (2 wide):
  A+B C+D
  E+F G+H
  I+J K+L
  M+N O+P

After pass 2 (1 wide):
  A+B+C+D
  E+F+G+H
  I+J+K+L
  M+N+O+P

Reduction OpenGL Side


Only draw to half the texture
Determine proper index in fragment code
  texRECT(2.0*(texCoordX-.5), texCoordY)
  texRECT(2.0*(texCoordX-.5)+1, texCoordY)
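
The index arithmetic can be checked with a CPU sketch. It assumes that output pixel x has texCoordX = x + 0.5 (pixel centers), so the two lookups land on input texels 2x and 2x+1. The function name `reduce_rows` and the power-of-two width restriction are assumptions made for this illustration.

```python
def reduce_rows(texture, combine):
    """Halve the width each pass until one column is left.

    texture: list of rows; the row width must be a power of two.
    combine: two-argument function such as addition or min.
    """
    src = [list(row) for row in texture]
    width = len(src[0])
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")
    while width > 1:
        width //= 2
        # Only the left half is "drawn": output x reads input texels 2x and 2x+1.
        src = [[combine(row[2 * x], row[2 * x + 1]) for x in range(width)]
               for row in src]
    return [row[0] for row in src]
```

Passing addition gives the row sums from the A..P example, and passing `min` gives a min reduction with the same access pattern. A GPU version would alternate between two textures instead of allocating a new list per pass.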

SOFM (Self-Organizing Feature Map)
  Clustering technique
  Used in many settings for generating a codebook
  Have a 1D/2D/3D space of nodes (usually 2D)
  Each map node has dimensionality d
  Input data is of size n x d
[Figure: grid of SOFM nodes next to an input data matrix with n rows and d columns.]

SOFM Cont.
Go through each input vector
  Find node with minimum distance to current input vector
  Move nearest node closer to input vector
  Also move the nodes around the winning node closer to the input vector by a smaller amount

[Figure: SOFM nodes and the current input vector. Move winning node and neighbors closer to input vector.]
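
One training step might look like the following CPU sketch. The slides do not state the distance metric, the learning rate, or how the "smaller amount" falls off, so the squared Euclidean distance, the `rate` and `sigma` parameters, and the Gaussian falloff are all assumptions chosen for illustration.

```python
import math


def sofm_step(nodes, grid_w, x, rate=0.5, sigma=1.0):
    """Move the winning node and its grid neighbors toward x, in place.

    nodes: list of weight vectors stored row-major for a grid grid_w wide.
    Returns the index of the winning node.
    """
    # Squared Euclidean distance from every node to the input vector (assumed metric).
    dists = [sum((w - xi) ** 2 for w, xi in zip(node, x)) for node in nodes]
    winner = dists.index(min(dists))
    wr, wc = divmod(winner, grid_w)
    for k, node in enumerate(nodes):
        r, c = divmod(k, grid_w)
        # Assumed Gaussian falloff: 1.0 at the winner, smaller farther away on the grid.
        h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2.0 * sigma ** 2))
        for j in range(len(node)):
            node[j] += rate * h * (x[j] - node[j])
    return winner
```

With `rate=0.5`, the winner moves halfway to the input and every other node moves by a smaller fraction.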

SOFM Results
  Have a 2D representation of clusters
  Neighboring nodes are similar

GPU SOFM
Input Data
  Written numbers 0-9
  5 samples of each
  Gray scale image shrunk to 16x16 for each number: d=256
  Texture size for data: 50x256
[Figure: input data texture, 50x256.]

GPU SOFM
SOFM Nodes
  8x8 space
  Dimensionality of each node: 256
  Texture size for data: 8x2048
  Need 2 for Ping Pong
[Figure: input and output node textures, each 8x2048.]

GPU SOFM
Dist to current input vector
  Same size as SOFM node texture: 8x2048
  Need 2 for reduction
[Figure: input and output distance textures, each 8x2048.]

GPU SOFM
Min dist to current vector texture
  Dimensionality: 8x8
  Need 2 for reduction
[Figure: input and output min-distance textures, each 8x8.]

GPU SOFM Algorithm


Determine distance from each node to the input vector
  Done on a per pixel/index basis
[Figure: Input Vector and SOFM Node Data are combined into Distances.]

Sum the Distances
  Sum the distance indexes
[Figure: Distances -> Reduced Distances -> Summation Distances.]
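
The two stages so far (per-element distance, then summation) can be sketched on the CPU as follows. The layout is simplified to one row per node rather than the packed texture from the slides, the squared difference is an assumed metric, and the function names are hypothetical.

```python
def elementwise_sq_diff(node_rows, x):
    """Stage 1: one output value per node component, same shape as the node data."""
    return [[(w - xi) ** 2 for w, xi in zip(node, x)] for node in node_rows]


def sum_reduce(rows):
    """Stage 2: add neighboring pairs, halving the width each pass.

    Assumes the row width is a power of two, as with d=256.
    """
    rows = [list(r) for r in rows]
    width = len(rows[0])
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")
    while width > 1:
        width //= 2
        rows = [[r[2 * i] + r[2 * i + 1] for i in range(width)] for r in rows]
    return [r[0] for r in rows]
```

For d=256 the summation takes 8 passes, each touching half as many elements as the one before.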

Find Minimum Distance
  Perform min reduction
[Figure: Distances -> Min Reduced Distances -> Min Distance.]
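
A min reduction over the node grid can be sketched the same way. Reducing 2x2 blocks per pass is an assumption here (the slides only show halving in one direction), and `winner_mask` is a hypothetical illustration of how a later pass could identify the winning node by comparing each distance with the reduced minimum.

```python
def min_reduce_2d(grid):
    """Reduce a square power-of-two grid to its minimum, one 2x2 block per output."""
    src = [list(row) for row in grid]
    size = len(src)
    if size < 1 or size & (size - 1) or any(len(row) != size for row in src):
        raise ValueError("grid must be square with a power-of-two side")
    while size > 1:
        size //= 2
        src = [[min(src[2 * r][2 * c], src[2 * r][2 * c + 1],
                    src[2 * r + 1][2 * c], src[2 * r + 1][2 * c + 1])
                for c in range(size)]
               for r in range(size)]
    return src[0][0]


def winner_mask(grid, min_dist):
    """True where a node's distance equals the minimum (ties mark several nodes)."""
    return [[d == min_dist for d in row] for row in grid]
```

An 8x8 distance grid reduces to a single value in 3 passes under this scheme.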

Update Node Data
[Figure: Input Vector, SOFM Node Data, Distances, and Min Distance feed the update pass, which writes the Updated SOFM Node Data.]

And So On
Continue looping through these steps as many times as desired. Be sure to toggle which SOFM Node Data texture is being read from and written to.

Cellular Automata (CA)


Studied within fields including mathematics, theory of computation, and pattern recognition
General idea (what we need to know in order to make a GPU program!)
  A grid of identical finite state automata whose next state is determined solely by their current state and the state of their neighbors

[Figure: a 1D grid evolving from a start row under a set of rules, with increasing time.]

Stephen Wolfram: A New Kind of Science


We took rules from A New Kind of Science and put them on a GPU implementation of Cellular Automata (1D grid)
References
  http://mathworld.wolfram.com/CellularAutomaton.html
  http://www.wolframscience.com/nksonline/toc.html

http://mathworld.wolfram.com/CellularAutomaton.html

Same start state, different rules!

Active Head (red)

CA on a GPU
Basic idea (for the 1D binary case)
  Pack the data into an image (one channel, i.e. Red)
  Use a FBO for multi-pass rendering (fast!)

Initialization
  Set the values in the first row of pixels
  Turn some on (1=black above) and some off (0=green above)

From i=2 to N [N is the number of rows in your image]
  Only render row i
  Fragment program (pixel j row i)
    Look at the three values below you
      Left, center, and right
    Look at the rules and determine your state (on or off)
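
The row-by-row loop above can be sketched on the CPU for the plain binary case. This sketch uses the standard Wolfram rule numbering (the three neighbor bits index into an 8-bit rule number) rather than the specific rule table of the fragment program in these slides. Treating cells outside the grid as off is an assumption, and `run_ca` is a hypothetical name.

```python
def run_ca(first_row, rule, rows):
    """Fill an image row by row; row i depends only on row i-1.

    first_row: list of 0/1 cell states.
    rule: Wolfram rule number in 0..255.
    """
    width = len(first_row)
    image = [list(first_row)]
    for _ in range(1, rows):
        prev = image[-1]
        new_row = []
        for j in range(width):
            # Cells outside the grid are treated as off (an assumption).
            left = prev[j - 1] if j > 0 else 0
            center = prev[j]
            right = prev[j + 1] if j < width - 1 else 0
            # The three states form a 3-bit index that selects one bit of the rule.
            new_row.append((rule >> (left << 2 | center << 1 | right)) & 1)
        image.append(new_row)
    return image
```

Each inner loop iteration plays the role of one fragment, and each outer iteration plays the role of rendering one row into the FBO.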

Selecting a Row to Render!


OpenGL Setup
    glViewport(0, 0, (GLsizei) imageWidth, (GLsizei) imageHeight);
    gluOrtho2D(0.0, imageWidth, 0.0, imageHeight);

Row counter
    int ca_counter;

Render Code
    // Draw a one-pixel-high quad covering only row ca_counter-1..ca_counter,
    // leaving a 2-pixel border on the left and right edges.
    glPolygonMode(GL_FRONT, GL_FILL);
    glBegin(GL_QUADS);
        glTexCoord2i(2, ca_counter-1);
        glVertex2f(2.0, ca_counter-1);
        glTexCoord2i(imageWidth-2, ca_counter-1);
        glVertex2f(imageWidth-2.0, ca_counter-1);
        glTexCoord2i(imageWidth-2, ca_counter);
        glVertex2f(imageWidth-2.0, ca_counter);
        glTexCoord2i(2, ca_counter);
        glVertex2f(2.0, ca_counter);
    glEnd();

Fragment Program
void FragmentProgram ( out float4 color0 : COLOR0 ,
                       float2 coords : TEXCOORD0 ,
                       uniform samplerRECT tex )
{
    //I use this for the check below
    half2 tul, tuc, tur, tul2, tur2;

    //Calc the image index
    half2 newindex = coords.xy;

    static const half offset = 1.0;

    //Sample the previous row: left, center, and right
    tul = texRECT( tex , newindex + float2(-offset,-offset) ).rg;
    tuc = texRECT( tex , newindex + float2(0.0,-offset) ).rg;
    tur = texRECT( tex , newindex + float2(offset,-offset) ).rg;

    if(tuc.r == 1.0){
        if( tul.g == 1.0 ){
            if( tuc.g == 1.0 ){
                if( tur.g == 1.0 ){
                    //1 1 1
                    color0 = float4(0.0,0.0,0.0,1.0);
                }else{
                    //1 1 0
                    color0 = float4(0.0,0.0,0.0,1.0);
                }
            }else{
                if( tur.g == 1.0 ){
                    //1 0 1
                    color0 = float4(0.0,1.0,0.0,1.0);
                }else{
                    //1 0 0
                    color0 = float4(0.0,1.0,0.0,1.0);
                }
            }
        }else{
            if( tuc.g == 1.0 ){
                if( tur.g == 1.0 ){
                    //0 1 1
                    color0 = float4(0.0,0.0,0.0,1.0);
                }else{
                    //0 1 0
                    color0 = float4(0.0,0.0,0.0,1.0);
                }
            }else{
                if( tur.g == 1.0 ){
                    //0 0 1
                    color0 = float4(0.0,1.0,0.0,1.0);
                }else{
                    //0 0 0
                    color0 = float4(0.0,1.0,0.0,1.0);
                }
            }
        }
    }
    else{
        if( tul.r == 1.0 ){
            //Red is set on the left neighbor: also sample two texels to the left
            tul2 = texRECT( tex , newindex + float2(-2.0*offset,-offset) ).rg;
            if( (tul2.g == 0.0 && tul.g == 0.0 && tuc.g == 0.0) ||
                (tul2.g == 0.0 && tul.g == 0.0 && tuc.g == 1.0) ||
                (tul2.g == 0.0 && tul.g == 1.0 && tuc.g == 0.0) ||
                (tul2.g == 1.0 && tul.g == 1.0 && tuc.g == 0.0) ||
                (tul2.g == 1.0 && tul.g == 1.0 && tuc.g == 1.0) ){
                color0 = float4(1.0,tuc.g,0.0,1.0);
            }else{
                color0 = float4( tuc , 0.0 , 1.0 );
            }
        }else if( tur.r == 1.0 ){
            //Red is set on the right neighbor: also sample two texels to the right
            tur2 = texRECT( tex , newindex + float2(2.0*offset,-offset) ).rg;
            if( (tuc.g == 0.0 && tur.g == 1.0 && tur2.g == 1.0) ||
                (tuc.g == 1.0 && tur.g == 0.0 && tur2.g == 0.0) ||
                (tuc.g == 1.0 && tur.g == 0.0 && tur2.g == 1.0) ){
                color0 = float4(1.0,tuc.g,0.0,1.0);
            }else{
                color0 = float4( tuc , 0.0 , 1.0 );
            }
        }else{
            //No red nearby: copy the cell from the previous row
            color0 = float4( tuc , 0.0 , 1.0 );
        }
    }
}

Conway's Game of Life


For a space that is populated
  Each cell with one or no neighbors dies
  Each cell with four or more neighbors dies, as if by overpopulation
  Each cell with two or three neighbors survives

For a space that is 'empty' or 'unpopulated'
  Each cell with three neighbors becomes populated

Life on a GPU
  Simple GPU program!
  Use FBOs
  Render each pixel
  Sample the neighborhood
  Like CA, make a decision based on the rules of the game
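
The per-pixel decision can be sketched on the CPU using the rules listed earlier. Treating cells beyond the border as empty is an assumption, and `life_step` is a hypothetical name. The body of the two inner loops corresponds to what one fragment would compute.

```python
def life_step(grid):
    """Compute one generation of Conway's Game of Life on a 0/1 grid."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Count the up-to-8 neighbors; cells off the grid count as empty.
            n = sum(grid[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, h))
                    for cc in range(max(c - 1, 0), min(c + 2, w))
                    if (rr, cc) != (r, c))
            if grid[r][c]:
                # A populated cell survives only with two or three neighbors.
                out[r][c] = 1 if n in (2, 3) else 0
            else:
                # An empty cell becomes populated with exactly three neighbors.
                out[r][c] = 1 if n == 3 else 0
    return out
```

The output is written to a separate grid, mirroring the read-from-one-texture, write-to-the-other pattern used with FBOs.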
