Welcome to Scribd!

Hetero Lecture Slides 002 Lecture 1 Lecture-1-8-Kernel-matrix-multiplication

Uploaded by

0% found this document useful (0 votes)

47 views12 pages

Kernel-based Parallel Programming - Basic Matrix-Matrix Multiplication. Each block has 2 2 = 4 threads Have each 2D thread block to compute a (TILE_WIDTH)2 submatrix (tile) of the result matrix.

Original Description:

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Kernel-based Parallel Programming - Basic Matrix-Matrix Multiplication. Each block has 2 2 = 4 threads Have each 2D thread block to compute a (TILE_WIDTH)2 submatrix (tile) of the result matrix.

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

47 views12 pages

Hetero Lecture Slides 002 Lecture 1 Lecture-1-8-Kernel-matrix-multiplication

Uploaded by

BagongJaruh

Kernel-based Parallel Programming - Basic Matrix-Matrix Multiplication. Each block has 2 2 = 4 threads Have each 2D thread block to compute a (TILE_WIDTH)2 submatrix (tile) of the result matrix.

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 12

Search inside document

Lecture 1.

Kernel-based Parallel Programming

- Basic Matrix-Matrix Multiplication

Objective

To prepare for the lab assignment of

matrix-matrix multiplication
Quick review of computation
Block/Thread index to data index
mapping
Loop control flow in kernels

Matrix(-Matrix) Multiplication
A Simple Sequential Code in C
C[Row,Col] is
generated by taking
the inner product of
a row with A (with
index Row) and a
column of B (with
index Col)
A

Col
n
k
C

Row

WIDTH

k
WIDTH

Matrix Multiplication
A Simple Sequential Code in C
void MatrixMulOnHost(int m, int n, int k, float* A, float* B, float* C)
{
for (int Row = 0; Row < m; ++Row)
for (int Col = 0; Col < k; ++Col) {
A
float sum = 0;
Row
m
for (int i = 0; i < n; ++i) {
float a = A[Row * n + i];
float b = B[Col+i*k];
n
sum += a * b;
}
B
n
C[Row * k + Col] = sum;
Col
}
Row
C
}

Col
k

Block(0,0)

Block(0,1)

C0,0

C0,1

C0,2

C1,0

C1,1

C1,2

C2,0

C2,1

C2,2

C3,0

C3,1

C2,3

Block(1,0)

Kernel Function - A Small Example

TILE_WIDTH = 2
Each block has 2*2 = 4 threads

Have each 2D thread block to

compute a (TILE_WIDTH)2 submatrix (tile) of the result matrix
Each has (TILE_WIDTH)2 threads
We use square tiles for simplicity

Block(1,1)

Generate a 2D Grid of blocks

m = 4; n = 4; k = 3
(m-1)/TILE_WIDTH +1= 2
(k-1)/TILE_WIDTH+1 = 2
Use 2* 2 = 4 blocks

Kernel Invocation in Host Code

// Setup the execution configuration
// TILE_WIDTH is a #define constant
dim3 dimGrid((k-1)/TILE_WIDTH+1, (m-1)/TILE_WIDTH+1, 1);
dim3 dimBlock(TILE_WIDTH, TILE_WIDTH, 1);

// Launch the device computation threads!

MatrixMulKernel<<<dimGrid, dimBlock>>>(m, n, k,
d_A, d_B, d_C);

A Simple Matrix Multiplication Kernel

__global__ void MatrixMulKernel(int m, int
n, int k, float* A, float* B, float* C)
{
int Row =
blockIdx.y*blockDim.y+threadIdx.y;
int Col =
blockIdx.x*blockDim.x+threadIdx.x;
if ((Row < m) && (Col < k)) {
float Cvalue = 0.0;
for (int i = 0; i < n; ++i)
Cvalue += A[Row*n+i] * B[Col+i*k];
C[Row*k+Col] = Cvalue;
}
}

Work for Block (0,0)in a TILE_WIDTH = 2

Configuration

m = 4
n = 4
k = 3

Col = 1

Col = 0

Row = 0 * 2 + threadIdx.y
Col = 0 * 2 + threadIdx.x

B0,0 B0,1 B0,2

B1,0 B1,1 B1,2
B2,0 B2,1 B2,2
B3,0 B3,1 B3,2

Row = 0 A0,0 A0,1 A0,2 A0,3

Row = 1 A1,0 A1,1 A1,2 A1,3

C0,0 C0,1 C0,2

A2,0 A2,1 A2,2 A2,3

C2,0 C2,1 C2,2

A3,0 A3,1 A3,2 A3,3

C3,0 C3,1 C3,2

C1,0 C1,1 C1,2

Work for Block (0,1)

B0,0 B0,1 B0,2

B1,0 B1,1 B1,2
B2,0 B2,1 B2,2
B3,0 B3,1 B2,3

Row = 0 A0,0 A0,1 A0,2 A0,3

Row = 1 A1,0 A1,1 A1,2 A1,3

C0,0 C0,1 C0,2

A2,0 A2,1 A2,2 A2,3

C2,0 C2,1 C2,2

A3,0 A3,1 A3,2 A3,3

C3,0 C3,1 C3,2

C0,1 C1,1 C1,2

Col = 3

m = 4
n = 4
k = 3

Col = 2

Row = 0 * 2 + threadIdx.y
Col = 1 * 2 + threadIdx.x

To learn more, read

Section 4.3

A Slightly Bigger Example

P0,0 P0,1 P0,2 P0,3 P0,4 P0,5 P0,6 P0,7
P1,0 P1,1 P1,2 P1,3 P1,4 P1,5 P1,6 P1,7
P2,0 P2,1 P2,2 P2,3 P2,4 P2,5 P2,6 P2,7
P3,0 P3,1 P3,2 P3,3 P3,4 P3,5 P3,6 P3,7

WIDTH = 8; TILE_WIDTH = 2
Each block has 2*2 = 4 threads

P4,0 P4,1 P4,2 P4,3 P4,4 P4,5 P4,6 P4,7

P5,0 P5,1 P5,2 P5,3 P5,4 P5,5 P5,6 P5,7
P6,0 P6,1 P6,2 P6,3 P6,4 P6,5 P6,6 P6,7
P7,0 P7,1 P7,2 P7,3 P7,4 P7,5 P7,6 P7,7

WIDTH/TILE_WIDTH = 4
Use 4* 4 = 16 blocks

A Slightly Bigger Example

(cont.)
P0,0 P0,1 P0,2 P0,3 P0,4 P0,5 P0,6 P0,7
P1,0 P1,1 P1,2 P1,3 P1,4 P1,5 P1,6 P1,7
P2,0 P2,1 P2,2 P2,3 P2,4 P2,5 P2,6 P2,7
P3,0 P3,1 P3,2 P3,3 P3,4 P3,5 P3,6 P3,7
P4,0 P4,1 P4,2 P4,3 P4,4 P4,5 P4,6 P4,7
P5,0 P5,1 P5,2 P5,3 P5,4 P5,5 P5,6 P5,7
P6,0 P6,1 P6,2 P6,3 P6,4 P6,5 P6,6 P6,7
P7,0 P7,1 P7,2 P7,3 P7,4 P7,5 P7,6 P7,7

WIDTH = 8; TILE_WIDTH = 4
Each block has 4*4 =16 threads
WIDTH/TILE_WIDTH = 2
Use 2* 2 = 4 blocks

Tableau Cheat Sheet 25 Feb 2014 P
Document5 pages
Tableau Cheat Sheet 25 Feb 2014 P
jjlorika
100% (1)
Projects With Microcontrollers And PICC
From Everand
Projects With Microcontrollers And PICC
Guillermo Perez Guillen
Rating: 5 out of 5 stars
5/5 (1)
Advanced C Concepts and Programming: First Edition
From Everand
Advanced C Concepts and Programming: First Edition
Gayatri
Rating: 3 out of 5 stars
3/5 (1)
Pair Trading
Document1 page
Pair Trading
octaveyahoo
100% (1)
U Boot Abcdge
Document232 pages
U Boot Abcdge
Dineshkumarparames
No ratings yet
DIgSILENT TechRef FFT
Document13 pages
DIgSILENT TechRef FFT
Алишер Галиев
No ratings yet
Sathish (SAP SD) Resume
Document4 pages
Sathish (SAP SD) Resume
Sathish Sarika
100% (4)
LTE Theory To Practice-KPI Optimization (A 4G Wireless Technology)
Document20 pages
LTE Theory To Practice-KPI Optimization (A 4G Wireless Technology)
amanagarwal16
No ratings yet
P25 TETRA Network Planning Using EDX SignalPro
Document14 pages
P25 TETRA Network Planning Using EDX SignalPro
micrajacut
No ratings yet
CN Lab Manual
Document51 pages
CN Lab Manual
A Sai Nikhith
No ratings yet
Matrix Mult
Document55 pages
Matrix Mult
sundeepadapa
100% (1)
ICT 4303 20 Apr 2023
Document4 pages
ICT 4303 20 Apr 2023
Gaurav Khandelwal
No ratings yet
CN Lab Manual 23-24
Document48 pages
CN Lab Manual 23-24
sudheerbarri21
No ratings yet
ECE408-S19-ZJUI-exam1-study-guide
Document25 pages
ECE408-S19-ZJUI-exam1-study-guide
Haider ali
No ratings yet
Gate-Level Minimization: M.Shoaib
Document61 pages
Gate-Level Minimization: M.Shoaib
Shoaib Mushtaq
No ratings yet
Cn-Lab-Manual SREYAS
Document51 pages
Cn-Lab-Manual SREYAS
nikksssss666
No ratings yet
Processors
Document25 pages
Processors
chuck212
No ratings yet
New Manual OOPS
Document64 pages
New Manual OOPS
kalpanasripathi
No ratings yet
20 Quiz 14
Document12 pages
20 Quiz 14
demro channel
No ratings yet
ICT 2153 24 Sep 2022
Document9 pages
ICT 2153 24 Sep 2022
Nikhil Vura
No ratings yet
DCCN Programs 123
Document17 pages
DCCN Programs 123
Prakash Raj
No ratings yet
BSC Sem4 Algorithms Assignment 1 Ans
Document11 pages
BSC Sem4 Algorithms Assignment 1 Ans
Mukesh Kumar
No ratings yet
Content of Homework Should Start From This Page Only
Document9 pages
Content of Homework Should Start From This Page Only
Tejinder Pal
No ratings yet
Assignment 13
Document19 pages
Assignment 13
arrow
0% (1)
Acces 2D Array
Document14 pages
Acces 2D Array
Adrian Iosif
No ratings yet
SystemVerilog Lecture 1 Intro
Document62 pages
SystemVerilog Lecture 1 Intro
Shilpa Reddy
No ratings yet
Oops Lab 3 Assignment
Document14 pages
Oops Lab 3 Assignment
Mondal Thakral
No ratings yet
Gradeup data structures and algorithms guide
Document115 pages
Gradeup data structures and algorithms guide
Tinku King
100% (1)
CH 2 Part 2
Document26 pages
CH 2 Part 2
Sam Zee
No ratings yet
Code Dry Run and Error Finding Questions/TITLETITLECode Writing Questions on Loops, Functions and File Handling/TITLE
Document7 pages
Code Dry Run and Error Finding Questions/TITLETITLECode Writing Questions on Loops, Functions and File Handling/TITLE
MUHAMMAD ASHER KHAN
No ratings yet
Assignment 8
Document10 pages
Assignment 8
brahmesh_sm
No ratings yet
DDCA Ch4 VHDL
Document35 pages
DDCA Ch4 VHDL
Yerhard Lalangui Fernández
No ratings yet
Class XII Computer Science: HOTS (High Order Thinking Skill)
Document14 pages
Class XII Computer Science: HOTS (High Order Thinking Skill)
Janaki
No ratings yet
Alpha Technology - Scrypt Analysis On FPGA Proof of Concept
Document8 pages
Alpha Technology - Scrypt Analysis On FPGA Proof of Concept
Kanber Kav
No ratings yet
Sequential Decoding by Stack Algorithm
Document9 pages
Sequential Decoding by Stack Algorithm
Ram Chandram
100% (1)
April 2005 Withsloution
Document21 pages
April 2005 Withsloution
api-3755462
No ratings yet
ddco simp 2024 (1) (1)
Document3 pages
ddco simp 2024 (1) (1)
Praneeth
No ratings yet
STATIC MEMBERS AND METHODS WITH DEFAULT ARGUMENTSTITLEStatic Members and Default Args
Document9 pages
STATIC MEMBERS AND METHODS WITH DEFAULT ARGUMENTSTITLEStatic Members and Default Args
Jayaprakash Manoharan
No ratings yet
Appendix Ii. Image Processing Using Opencv Library
Document11 pages
Appendix Ii. Image Processing Using Opencv Library
vlad
No ratings yet
Functional Coverage Notes, Bins Caluclation, Bins Types
Document25 pages
Functional Coverage Notes, Bins Caluclation, Bins Types
teja
No ratings yet
SystemVerilog Adder Design and Test
Document31 pages
SystemVerilog Adder Design and Test
Mihaela Scanteianu
No ratings yet
C++ Typing
Document9 pages
C++ Typing
A K Abhay
No ratings yet
21-CP-6 (Report12) DLD
Document18 pages
21-CP-6 (Report12) DLD
Almaas Chaudry
No ratings yet
Ver I Log Examples
Document22 pages
Ver I Log Examples
Dayanand Gowda Kr
No ratings yet
NS MANUAL Merged
Document78 pages
NS MANUAL Merged
JEEVA BHARATHI
No ratings yet
BCA
Document76 pages
BCA
Arvinder Singh
No ratings yet
ADA ETCS 351 Lab Manual-1
Document45 pages
ADA ETCS 351 Lab Manual-1
Pratham Kakkar
No ratings yet
VIT University B.Tech ECE Question Paper on VLSI System Design
Document6 pages
VIT University B.Tech ECE Question Paper on VLSI System Design
vlsisiva
No ratings yet
Lab 05
Document5 pages
Lab 05
Mahina
No ratings yet
P. E. S. Institute of Technology Department of Electronics and Communication
Document27 pages
P. E. S. Institute of Technology Department of Electronics and Communication
puttipallavi24
No ratings yet
GPU Programming EE 4702-1 Final Examination: Exam Total
Document10 pages
GPU Programming EE 4702-1 Final Examination: Exam Total
moien
No ratings yet
DF Lab File
Document56 pages
DF Lab File
brij.200410107126
No ratings yet
Cs 60
Document80 pages
Cs 60
Sirsendu Roy
No ratings yet
Computer System Architectures: Based On
Document52 pages
Computer System Architectures: Based On
Ivan Zamora
No ratings yet
Oops Lab Manual
Document41 pages
Oops Lab Manual
silicongmaster7060
No ratings yet
Week 1 Solution
Document13 pages
Week 1 Solution
pba1 pba1
No ratings yet
Ethics Exam 2
Document5 pages
Ethics Exam 2
Joshua Benedict Villanueva
No ratings yet
CS1112 Lecture 19: Array vs. Cell Array
Document6 pages
CS1112 Lecture 19: Array vs. Cell Array
pikachu_latias_latios
No ratings yet
Lab 10 - 12
Document12 pages
Lab 10 - 12
KHALiFA Op
No ratings yet
VHDL - Introduction: CP323L1: Advance Logic Design
Document20 pages
VHDL - Introduction: CP323L1: Advance Logic Design
artnetw0rk
No ratings yet
IT8761 Security Laboratory Manual
Document50 pages
IT8761 Security Laboratory Manual
sowrishal
No ratings yet
CS 455: Software Testing Techniques Sessional Test 3
Document1 page
CS 455: Software Testing Techniques Sessional Test 3
Malikijazriaz
No ratings yet
8 Week Report
Document23 pages
8 Week Report
Siddharth Shukla
No ratings yet
CN Record Manual
Document100 pages
CN Record Manual
nekkantisudheer
No ratings yet
Delhi Public School Navi Mumbai Revision Examination 2016-17. I. Answer The Following Questions
Document10 pages
Delhi Public School Navi Mumbai Revision Examination 2016-17. I. Answer The Following Questions
Shrey Pandit
No ratings yet
C# Package Mastery: 100 Essentials in 1 Hour - 2024 Edition
From Everand
C# Package Mastery: 100 Essentials in 1 Hour - 2024 Edition
Tenko
No ratings yet
Computer Engineering Laboratory Solution Primer
From Everand
Computer Engineering Laboratory Solution Primer
Karan Bhandari
No ratings yet
Hetero Lecture Slides 002 Lecture 1 Lecture-1-5-Cuda-API
Document11 pages
Hetero Lecture Slides 002 Lecture 1 Lecture-1-5-Cuda-API
BagongJaruh
No ratings yet
Hetero Lecture Slides 002 Lecture 1 Lecture-1-7-Kernel-multidimension
Document9 pages
Hetero Lecture Slides 002 Lecture 1 Lecture-1-7-Kernel-multidimension
BagongJaruh
No ratings yet
Hetero Lecture Slides 002 Lecture 1 Lecture 1 4 Cuda Intro
Document12 pages
Hetero Lecture Slides 002 Lecture 1 Lecture 1 4 Cuda Intro
Ady Maryan
No ratings yet
Portability and Scalability in Heterogeneous Parallel Computing
Document11 pages
Portability and Scalability in Heterogeneous Parallel Computing
djc12536
No ratings yet
Hetero Lecture Slides 002 Lecture 1 Lecture-1-6-Cuda-kernel
Document9 pages
Hetero Lecture Slides 002 Lecture 1 Lecture-1-6-Cuda-kernel
BagongJaruh
No ratings yet
Hetero Lecture Slides 002 Lecture 1 Lecture 1 1 Overview
Document14 pages
Hetero Lecture Slides 002 Lecture 1 Lecture 1 1 Overview
Ady Maryan
No ratings yet
Introduction To Heterogeneous Parallel Computing
Document10 pages
Introduction To Heterogeneous Parallel Computing
djc12536
No ratings yet
JHDSS CourseDependencies
Document2 pages
JHDSS CourseDependencies
if05041736
No ratings yet
Design Pattern
Document38 pages
Design Pattern
BagongJaruh
No ratings yet
Hetero Lecture Slides 002 Lecture 1 Lecture 1 1 Overview
Document14 pages
Hetero Lecture Slides 002 Lecture 1 Lecture 1 1 Overview
Ady Maryan
No ratings yet
UniFi Controller V5 UG PDF
Document158 pages
UniFi Controller V5 UG PDF
lemave
No ratings yet
Impact of Hurricane Katrina On Internet Infrastructure
Document9 pages
Impact of Hurricane Katrina On Internet Infrastructure
americana
100% (2)
Hanumanth Hawaldar: Managing Technology Business Development
Document6 pages
Hanumanth Hawaldar: Managing Technology Business Development
Sachin Kaushik
No ratings yet
Macos Mojave Compatibility 02 07
Document11 pages
Macos Mojave Compatibility 02 07
Elnegro Negro
No ratings yet
For 252 Esxi 51 To 65
Document5 pages
For 252 Esxi 51 To 65
mandeepmails
No ratings yet
Bollywood Celebrity Actress Wallpaper For Desktop
Document44 pages
Bollywood Celebrity Actress Wallpaper For Desktop
Sohil Kisan
No ratings yet
Inheritance Polymorphism and Coding Guidelines Content - V1.2
Document6 pages
Inheritance Polymorphism and Coding Guidelines Content - V1.2
wenad
No ratings yet
Sybase Interface Processes Configuration Guide
Document49 pages
Sybase Interface Processes Configuration Guide
gabjones
No ratings yet
Java Programming For Human Beings
Document80 pages
Java Programming For Human Beings
Miguel León Noguera
No ratings yet
Google Robots
Document5 pages
Google Robots
NESTOR HERRERA
No ratings yet
Practical 9 Code Conversion
Document4 pages
Practical 9 Code Conversion
Het Patel
No ratings yet
Bob Gardner's Quick Guide to the TI-89 Calculator
Document3 pages
Bob Gardner's Quick Guide to the TI-89 Calculator
Naomi Linda
No ratings yet
Simplis Syntax
Document36 pages
Simplis Syntax
Mindra Jaya
No ratings yet
Wipro Elite NTH Coding Paper 1
Document6 pages
Wipro Elite NTH Coding Paper 1
Atrayee Gayen
No ratings yet
Oracle-DBA Trainining in Hyderabad
Document4 pages
Oracle-DBA Trainining in Hyderabad
orienit
No ratings yet
GRTrend X
Document30 pages
GRTrend X
Xcap Tebeo
No ratings yet
Macaw Pipeline Defects PDF
Document2 pages
Macaw Pipeline Defects PDF
Mohamed Nouzer
No ratings yet
Tanglewood Case7
Document11 pages
Tanglewood Case7
Abhinay Kumar
No ratings yet
Exception Pdflibexception With Message Handle Parameter Image Has Bad
Document2 pages
Exception Pdflibexception With Message Handle Parameter Image Has Bad
Carlos
No ratings yet
List of Practicals Performed
Document17 pages
List of Practicals Performed
DaVid Silence Kawlni
No ratings yet
Quadratics Flipbook Name
Document2 pages
Quadratics Flipbook Name
api-235426772
100% (1)
Introduction To PayPal For C# - ASP
Document15 pages
Introduction To PayPal For C# - ASP
aspintegration
No ratings yet
Manual Allplan BCM Quantities
Document193 pages
Manual Allplan BCM Quantities
sectiune79
No ratings yet