BobAndDerek GPU Part4

Speed-up of Algorithms With Graphics Processing Units (GPU): Part IV of IV
Robert H. Luke*# and Derek Anderson*#

* Electrical and Computer Engineering Department
# Predoctoral Fellows, NLM Training Grant
IEEE Computational Intelligence Society MU Chapter and National Library of Medicine Medical Informatics Training Grant Special Seminar Series
Organization of Lectures
Part I: Introduction to GPUs and shader languages
Part II: Image processing (Morphology, Sobel, and Gaussian)
Part III: Performance, multi-pass rendering, optimizations, and debugging
Part IV: Using GPUs for non-image based processing (SOFM & CA)
General Purpose Programming

You already have all the tools required:
16/32-bit floating-point values
2D textures of any size in [1, 8192]
Frame buffer objects (FBOs)
But a few extra tricks are helpful:

Ping Pong and Reduction
Ping Pong
The name says it all:
Have two textures of the same dimensions and one fragment program
Use texture A as input and B as output
Then swap: B as input, A as output
[Figure: texture A and texture B alternate as the input and output of the fragment program]
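The ping-pong pattern can be mimicked on the CPU to check the bookkeeping. This is a minimal sketch in C, assuming a flat float array stands in for each texture and a plain function (the hypothetical `pass_fn` type) stands in for the fragment program; on the GPU, the swap would instead change which texture is bound for reading and which is attached to the FBO for writing.

```c
#include <stddef.h>

/* Hypothetical stand-in for a fragment program: reads one whole
   input buffer and writes one whole output buffer of n elements. */
typedef void (*pass_fn)(const float *in, float *out, size_t n);

/* Run `passes` passes, alternating the roles of buffers a and b:
   pass 1 reads a and writes b, pass 2 reads b and writes a, and so on.
   Returns the buffer holding the result of the last pass
   (a itself if passes == 0). */
float *ping_pong(float *a, float *b, size_t n, int passes, pass_fn pass)
{
    float *src = a, *dst = b;
    for (int p = 0; p < passes; ++p) {
        pass(src, dst, n);
        /* Swap: this pass's output becomes the next pass's input. */
        float *tmp = src;
        src = dst;
        dst = tmp;
    }
    return src;
}

/* Example pass: double every element. */
void double_pass(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = 2.0f * in[i];
}
```

With an odd number of passes the result ends up in the second buffer, which is the detail to track when deciding which texture to read back.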
Reduction
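Use the ping pong idea, but reduce the size of the texture with each pass
Each pass combines pairs of pixels in parallel, spreading the work over many fragment processors instead of looping over a whole row in one fragment
Example (summing each row of a 4x4 texture):
Start (4x4):
A B C D
E F G H
I J K L
M N O P
After pass 1 (2x4):
A+B C+D
E+F G+H
I+J K+L
M+N O+P
After pass 2 (1x4):
A+B+C+D
E+F+G+H
I+J+K+L
M+N+O+P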
Reduction OpenGL Side

Only draw to half the texture
Determine the proper input indices in the fragment code (with nearest filtering, output texel x reads input texels 2x and 2x+1):
texRECT(2.0*(texCoordX-.5), texCoordY)
texRECT(2.0*(texCoordX-.5)+1, texCoordY)
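The index mapping above can be sanity-checked with a CPU version of a row-sum reduction. This sketch assumes nearest-texel sampling with texel centers at x + 0.5 (the texture-rectangle convention), so output texel x reads input texels 2x and 2x + 1; it also assumes the width is a power of two. `reduce_rows_sum` is a hypothetical name.

```c
#include <stddef.h>

/* Sum-reduce each row of a row-major width x height image, halving the
   active width per pass and ping-ponging between `buf` and `tmp` (both
   hold width*height floats). Assumes width is a power of two.
   On return, each row's sum is in column 0 of the returned buffer
   (the row stride is still the original width). */
float *reduce_rows_sum(float *buf, float *tmp, size_t width, size_t height)
{
    float *src = buf, *dst = tmp;
    for (size_t w = width; w > 1; w /= 2) {
        size_t half = w / 2;              /* only draw to half the texture */
        for (size_t y = 0; y < height; ++y)
            for (size_t x = 0; x < half; ++x)
                /* Output texel x combines input texels 2x and 2x+1. */
                dst[y * width + x] = src[y * width + 2 * x]
                                   + src[y * width + 2 * x + 1];
        float *t = src; src = dst; dst = t;
    }
    return src;
}
```

Replacing the `+` with a minimum turns the same loop into a min reduction.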
SOFM (Self-Organizing Feature Map)
Clustering technique, used in many settings for generating a codebook
Have a 1D/2D/3D space of nodes (usually 2D)
Each map node has dimensionality d
Input data is of size n x d
[Figure: example input data (n vectors of dimension d) and the SOFM nodes]
SOFM Cont.
Go through each input vector:
Find the node with minimum distance to the current input vector
Move the nearest node closer to the input vector
Also move the nodes around the winning node closer to the input vector, by a smaller amount
[Figure: SOFM nodes and the current input vector; the winning node and its neighbors move closer to the input vector]
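A CPU reference for one training step is handy for validating a GPU version. This is a sketch under stated assumptions, not the authors' code: squared Euclidean distance, a rows x cols map stored row-major with d floats per node, and a hypothetical neighborhood weight alpha / (1 + g), where g is the squared grid distance to the winner. The slides do not specify the distance measure, learning rate, or neighborhood function.

```c
#include <stddef.h>
#include <float.h>

/* One SOFM update for input vector x (length d).
   nodes: rows*cols nodes, d floats each, stored row-major.
   Returns the index of the winning (nearest) node. */
size_t sofm_step(float *nodes, size_t rows, size_t cols, size_t d,
                 const float *x, float alpha)
{
    /* 1. Find the node with minimum squared distance to x. */
    size_t best = 0;
    float best_dist = FLT_MAX;
    for (size_t n = 0; n < rows * cols; ++n) {
        float dist = 0.0f;
        for (size_t k = 0; k < d; ++k) {
            float diff = nodes[n * d + k] - x[k];
            dist += diff * diff;
        }
        if (dist < best_dist) { best_dist = dist; best = n; }
    }

    /* 2. Move nodes toward x: the winner by alpha, nodes farther away
          on the grid by smaller amounts (hypothetical weight
          alpha / (1 + squared grid distance to the winner)). */
    long br = (long)(best / cols), bc = (long)(best % cols);
    for (size_t n = 0; n < rows * cols; ++n) {
        long dr = (long)(n / cols) - br, dc = (long)(n % cols) - bc;
        float h = alpha / (1.0f + (float)(dr * dr + dc * dc));
        for (size_t k = 0; k < d; ++k)
            nodes[n * d + k] += h * (x[k] - nodes[n * d + k]);
    }
    return best;
}
```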
SOFM Results
Have a 2D representation of the clusters
Neighboring nodes are similar
GPU SOFM
Input Data
Written digits 0-9, 5 samples of each (n = 50)
Each grayscale image is shrunk to 16x16, so d = 256
Texture size for the data: 50x256
[Figure: the 50x256 input data texture]
GPU SOFM
SOFM Nodes
8x8 space of nodes
Dimensionality of each node: 256
Texture size for the node data: 8x2048 (64 nodes x 256 = 16,384 values = 8 x 2048)
Need 2 for ping pong
[Figure: input and output node textures, each 8x2048]
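The node texture's size follows from the map: 64 nodes x 256 components = 16,384 values, which fills an 8 x 2048 texture. The slides do not say how nodes are laid out inside it; this sketch assumes node (r, c) occupies texel row r and 256 consecutive columns starting at c*256. `node_texel` and `texel_node` are hypothetical helpers; the inverse is what a fragment program would use to find which node and component it is working on.

```c
#include <stddef.h>

enum { MAP_SIDE = 8, NODE_DIM = 256 };  /* 8x8 map, d = 256 */

/* Hypothetical layout: node (r, c) lives in texel row r, columns
   c*NODE_DIM .. c*NODE_DIM + NODE_DIM - 1, so the texture is
   MAP_SIDE * NODE_DIM = 2048 texels wide and MAP_SIDE = 8 tall. */
void node_texel(size_t r, size_t c, size_t k, size_t *x, size_t *y)
{
    *x = c * NODE_DIM + k;
    *y = r;
}

/* Inverse mapping: recover (node row, node col, component)
   from a texel position. */
void texel_node(size_t x, size_t y, size_t *r, size_t *c, size_t *k)
{
    *r = y;
    *c = x / NODE_DIM;
    *k = x % NODE_DIM;
}
```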
GPU SOFM
Distance to the current input vector
Same size as the SOFM node texture: 8x2048
Need 2 for reduction
[Figure: input and output distance textures, each 8x2048]
GPU SOFM
Min distance to current vector texture
Dimensions: 8x8
Need 2 for reduction
[Figure: input and output 8x8 textures]
GPU SOFM Algorithm

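Determine the distance from each node to the input vector
Done on a per-pixel (per-index) basis: each texel compares one node component with the matching input component
[Figure: the input vector and SOFM node data combined into per-texel distances]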
Sum the Distances
Sum the per-index distances belonging to each node with a reduction
[Figure: distances, reduced distances, and summed distances]
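The distance pass and the summation reduction can be simulated together on the CPU. Assumptions (not from the slides): squared differences as the per-index distance, node (r, c) stored at texel row r in columns c*d onward, and d a power of two. Because each pass sums adjacent pairs, after log2(d) passes every remaining texel holds the sum over exactly one node's d components, leaving one distance per node (8x8 for the 8x8 map). `sofm_distances` is a hypothetical name.

```c
#include <stddef.h>

/* Distance pass + sum reduction for the assumed GPU SOFM layout.
   nodes: side x (side*d) texture, row-major.
   tex_a, tex_b: scratch "textures" of the same size (ping pong).
   On return, dist[r*side + c] holds the distance of node (r, c). */
void sofm_distances(const float *nodes, const float *input,
                    size_t side, size_t d,
                    float *tex_a, float *tex_b, float *dist)
{
    size_t width = side * d;

    /* Pass 1: one "fragment" per texel computes its component's
       squared difference from the current input vector. */
    for (size_t y = 0; y < side; ++y)
        for (size_t x = 0; x < width; ++x) {
            float diff = nodes[y * width + x] - input[x % d];
            tex_a[y * width + x] = diff * diff;
        }

    /* Later passes: halve the width until one texel per node remains. */
    float *src = tex_a, *dst = tex_b;
    for (size_t w = width; w > side; w /= 2) {
        for (size_t y = 0; y < side; ++y)
            for (size_t x = 0; x < w / 2; ++x)
                dst[y * width + x] = src[y * width + 2 * x]
                                   + src[y * width + 2 * x + 1];
        float *t = src; src = dst; dst = t;
    }

    /* Copy out the side x side block of per-node distances. */
    for (size_t y = 0; y < side; ++y)
        for (size_t c = 0; c < side; ++c)
            dist[y * side + c] = src[y * width + c];
}
```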
Find Minimum Distance

Perform min reduction
[Figure: distances, min-reduced distances, and the minimum distance]
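A min reduction also needs to remember which node produced the minimum. This sketch assumes one channel carries the distance and another carries the node index (modeled by the hypothetical `Texel` struct); each pass keeps the smaller of each pair, and ties go to the lower index.

```c
#include <stddef.h>

/* Two channels of one texel: distance and node index (assumed packing). */
typedef struct { float dist; float index; } Texel;

/* Min-reduce n texels (n a power of two) by halving, ping-ponging
   between a and b. Returns the texel with the smallest distance;
   on a tie the left (lower-index) texel wins. */
Texel min_reduce(Texel *a, Texel *b, size_t n)
{
    Texel *src = a, *dst = b;
    for (size_t w = n; w > 1; w /= 2) {
        for (size_t i = 0; i < w / 2; ++i) {
            Texel l = src[2 * i], r = src[2 * i + 1];
            dst[i] = (r.dist < l.dist) ? r : l;
        }
        Texel *t = src; src = dst; dst = t;
    }
    return src[0];
}
```

For the 8x8 texture, halving along x and then along y should pick the same winner as flattening the 64 texels into one row, as done here.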
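Update Node Data
Move the winning node and its neighbors closer to the input vector
[Figure: the input vector, SOFM node data, distances, and minimum distance produce the updated SOFM node data]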
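The update pass suits the GPU because, once the winning node is known, each texel can compute its new value independently. In this sketch the layout (node (r, c) at row r, columns c*d onward) and the neighborhood weight alpha / (1 + squared grid distance) are assumptions, and `update_texel`/`update_nodes` are hypothetical names. Reading from one copy of the node texture and writing to the other is the ping-pong toggle between the two node textures.

```c
#include <stddef.h>

/* Per-texel SOFM update, as a fragment program would see it:
   texel (x, y) updates only itself, given the winner (wr, wc). */
float update_texel(float old_value, const float *input, size_t x, size_t y,
                   size_t d, size_t wr, size_t wc, float alpha)
{
    long dr = (long)y - (long)wr;              /* node row offset */
    long dc = (long)(x / d) - (long)wc;        /* node column offset */
    float h = alpha / (1.0f + (float)(dr * dr + dc * dc));
    return old_value + h * (input[x % d] - old_value);
}

/* Apply the update to every texel, reading `src` and writing `dst`
   (the two ping-pong copies of the node texture). */
void update_nodes(const float *src, float *dst, const float *input,
                  size_t side, size_t d, size_t wr, size_t wc, float alpha)
{
    size_t width = side * d;
    for (size_t y = 0; y < side; ++y)
        for (size_t x = 0; x < width; ++x)
            dst[y * width + x] = update_texel(src[y * width + x], input,
                                              x, y, d, wr, wc, alpha);
}
```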
And So On
Continue looping through these steps as many times as desired. Be sure to toggle which SOFM node data texture is read from and which is written to.
Cellular Automata (CA)

Studied within mathematics, the theory of computation, and pattern recognition
General idea (what we need to know in order to write a GPU program!):
A grid of identical finite state automata whose next state is determined solely by their current state and the state of their neighbors
[Figure: a rule table, the starting 1D grid, and the grid's evolution with increasing time]
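For the 1D binary case with three-cell neighborhoods, one compact way to encode a rule is Wolfram's rule-number convention: the next state is bit (4*left + 2*center + right) of an 8-bit rule number. This is an illustrative sketch using that convention; the fragment program later in these slides hard-codes its rule with branches instead.

```c
#include <stdint.h>

/* Next state of one cell under an elementary (Wolfram-numbered) rule.
   left, center, right are 0 or 1; rule is 0..255. */
int ca_next_state(uint8_t rule, int left, int center, int right)
{
    int pattern = (left << 2) | (center << 1) | right;  /* 0..7 */
    return (rule >> pattern) & 1;
}
```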
Stephen Wolfram: A New Kind of Science

We took rules from A New Kind of Science and implemented them in a GPU cellular automaton (1D grid)
References:
http://mathworld.wolfram.com/CellularAutomaton.html
http://www.wolframscience.com/nksonline/toc.html
Same start state, different rules!
Active Head (red)
CA on a GPU
Basic idea (for the 1D binary case)
Pack the data into an image (one channel, i.e., red)
Use an FBO for multi-pass rendering (fast!)
Initialization
Set the values in the first row of pixels
Turn some on (1 = black above) and some off (0 = green above)
From i = 2 to N (N is the number of rows in your image):
Only render row i
Fragment program (pixel j, row i):
Look at the three values below you: left, center, and right
Look at the rules and determine your state (on or off)
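The row-by-row procedure above can be prototyped on the CPU before writing the shader. This sketch makes assumptions of its own: rows are 0-based (row 0 is the start state, i.e. row 1 in the slide's numbering), the rule uses Wolfram's rule-number encoding, border cells stay 0, and width is at least 3. `ca_fill_image` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a width x height binary image row by row: row 0 holds the start
   state, and each later row i depends only on row i-1. */
void ca_fill_image(uint8_t *img, size_t width, size_t height, uint8_t rule)
{
    for (size_t i = 1; i < height; ++i) {           /* one pass per row */
        const uint8_t *prev = img + (i - 1) * width;
        uint8_t *row = img + i * width;
        row[0] = row[width - 1] = 0;                /* borders stay off */
        for (size_t j = 1; j + 1 < width; ++j) {    /* one "fragment" per pixel */
            /* Left, center, and right values from the previous row. */
            int pattern = (prev[j - 1] << 2) | (prev[j] << 1) | prev[j + 1];
            row[j] = (uint8_t)((rule >> pattern) & 1);
        }
    }
}
```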
Selecting a Row to Render!

OpenGL Setup
glViewport(0, 0, (GLsizei) imageWidth, (GLsizei) imageHeight);
gluOrtho2D(0.0, imageWidth, 0.0, imageHeight);
Row counter
int ca_counter;
Render Code
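// Draw a one-pixel-tall quad covering pixel row ca_counter-1.
// Columns 0-1 and imageWidth-2 onward are skipped so the fragment
// program's reads two texels to the left and right stay inside the image.
glPolygonMode(GL_FRONT, GL_FILL);
glBegin(GL_QUADS);
    glTexCoord2i(2, ca_counter-1);            glVertex2f(2.0, ca_counter-1);
    glTexCoord2i(imageWidth-2, ca_counter-1); glVertex2f(imageWidth-2.0, ca_counter-1);
    glTexCoord2i(imageWidth-2, ca_counter);   glVertex2f(imageWidth-2.0, ca_counter);
    glTexCoord2i(2, ca_counter);              glVertex2f(2.0, ca_counter);
glEnd();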
Fragment Program
void FragmentProgram ( out float4 color0 : COLOR0 ,
                       float2 coords : TEXCOORD0 ,
                       uniform samplerRECT tex )
{
    //I use this for the check below
    //Red = active head, green = cell state. tul/tuc/tur are the left, center,
    //and right neighbors in the previous row; tul2/tur2 are two cells away.
    half2 tul, tuc, tur, tul2, tur2;

    //Calc the image index (float2 rather than half2: half precision cannot
    //represent texel-center coordinates exactly in wide images)
    float2 newindex = coords.xy;

    static const half offset = 1.0;

    tul = texRECT( tex , newindex + float2(-offset,-offset) ).rg;
    tuc = texRECT( tex , newindex + float2(0.0,-offset) ).rg;
    tur = texRECT( tex , newindex + float2(offset,-offset) ).rg;

    if( tuc.r == 1.0 ){
        //The head was on this cell: write its new state (red = 0, head leaves)
        if( tul.g == 1.0 ){
            if( tuc.g == 1.0 ){
                if( tur.g == 1.0 ){ color0 = float4(0.0,0.0,0.0,1.0); } //1 1 1
                else{ color0 = float4(0.0,0.0,0.0,1.0); }               //1 1 0
            }else{
                if( tur.g == 1.0 ){ color0 = float4(0.0,1.0,0.0,1.0); } //1 0 1
                else{ color0 = float4(0.0,1.0,0.0,1.0); }               //1 0 0
            }
        }else{
            if( tuc.g == 1.0 ){
                if( tur.g == 1.0 ){ color0 = float4(0.0,0.0,0.0,1.0); } //0 1 1
                else{ color0 = float4(0.0,0.0,0.0,1.0); }               //0 1 0
            }else{
                if( tur.g == 1.0 ){ color0 = float4(0.0,1.0,0.0,1.0); } //0 0 1
                else{ color0 = float4(0.0,1.0,0.0,1.0); }               //0 0 0
            }
        }
    }else{
        if( tul.r == 1.0 ){
            //Head was on the left neighbor: does it move right onto this cell?
            tul2 = texRECT( tex , newindex + float2(-2.0*offset,-offset) ).rg;
            if( (tul2.g == 0.0 && tul.g == 0.0 && tuc.g == 0.0) ||
                (tul2.g == 0.0 && tul.g == 0.0 && tuc.g == 1.0) ||
                (tul2.g == 0.0 && tul.g == 1.0 && tuc.g == 0.0) ||
                (tul2.g == 1.0 && tul.g == 1.0 && tuc.g == 0.0) ||
                (tul2.g == 1.0 && tul.g == 1.0 && tuc.g == 1.0) ){
                color0 = float4(1.0,tuc.g,0.0,1.0);
            }else{
                color0 = float4( tuc , 0.0 , 1.0 );
            }
        }else if( tur.r == 1.0 ){
            //Head was on the right neighbor: does it move left onto this cell?
            tur2 = texRECT( tex , newindex + float2(2.0*offset,-offset) ).rg;
            if( (tuc.g == 0.0 && tur.g == 1.0 && tur2.g == 1.0) ||
                (tuc.g == 1.0 && tur.g == 0.0 && tur2.g == 0.0) ||
                (tuc.g == 1.0 && tur.g == 0.0 && tur2.g == 1.0) ){
                color0 = float4(1.0,tuc.g,0.0,1.0);
            }else{
                color0 = float4( tuc , 0.0 , 1.0 );
            }
        }else{
            //No head nearby: copy the state from the row below
            color0 = float4( tuc , 0.0 , 1.0 );
        }
    }
}
Conway's Game of Life

For a space that is populated:
Each cell with one or no neighbors dies
Each cell with four or more neighbors dies, as if by overpopulation
Each cell with two or three neighbors survives
For a space that is 'empty' or 'unpopulated'

Each cell with three neighbors becomes populated
Life on a GPU
Simple GPU program!
Use FBOs
Render each pixel
Sample the neighborhood
Like CA, make a decision based on the rules of the game
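A CPU version of one generation shows the per-pixel logic a Life fragment program would apply. This is a sketch: cells outside the grid are assumed empty (a GPU version might wrap or clamp instead), the grid is stored one byte per cell, and `life_step` is a hypothetical name. Reading `src` and writing `dst` mirrors ping-ponging between two FBO-attached textures.

```c
#include <stddef.h>
#include <stdint.h>

/* One Game of Life generation on a width x height grid (row-major,
   cells 0 or 1), reading `src` and writing `dst`. */
void life_step(const uint8_t *src, uint8_t *dst, size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x) {
            /* Sample the 8-cell neighborhood; off-grid cells count as empty. */
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0) continue;
                    long nx = (long)x + dx, ny = (long)y + dy;
                    if (nx < 0 || ny < 0 || nx >= (long)width || ny >= (long)height)
                        continue;
                    n += src[(size_t)ny * width + (size_t)nx];
                }
            uint8_t alive = src[y * width + x];
            /* Populated: survives with 2 or 3 neighbors, otherwise dies.
               Empty: becomes populated with exactly 3 neighbors. */
            dst[y * width + x] = alive ? (uint8_t)(n == 2 || n == 3)
                                       : (uint8_t)(n == 3);
        }
}
```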
