
Lecture 1.

Introduction to CUDA
- Kernel-Based SPMD Parallel Programming

Objective

To learn the basic concepts involved in a simple CUDA kernel function
- Declaration
- Built-in variables
- Thread index to data index mapping

Example: Vector Addition Kernel

Device Code
// Compute vector sum C = A+B
// Each thread performs one pair-wise addition

__global__
void vecAddKernel(float* A, float* B, float* C, int n)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}
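
As a concrete illustration (the numbers are mine, not from the slide): with 256 threads per block, the thread with blockIdx.x = 2 and threadIdx.x = 5 computes i = 5 + 256*2 = 517 and adds A[517] to B[517]; any thread whose i is n or larger is filtered out by the if (i < n) test.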

Example: Vector Addition Kernel (Host Code)

int vecAdd(float* h_A, float* h_B, float* h_C, int n)
{
    // d_A, d_B, d_C allocations and copies omitted
    // Run ceil(n/256.0) blocks of 256 threads each
    vecAddKernel<<<ceil(n/256.0), 256>>>(d_A, d_B, d_C, n);
}
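
For reference, a minimal sketch of the omitted allocation, copy, and cleanup steps could look like the following (error checking left out; this is an illustrative sketch, not part of the original slide):

int vecAdd(float* h_A, float* h_B, float* h_C, int n)
{
    int size = n * sizeof(float);
    float *d_A, *d_B, *d_C;

    // Allocate device memory and copy the inputs to the device
    cudaMalloc((void**)&d_A, size);
    cudaMalloc((void**)&d_B, size);
    cudaMalloc((void**)&d_C, size);
    cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

    // Run ceil(n/256.0) blocks of 256 threads each
    vecAddKernel<<<ceil(n/256.0), 256>>>(d_A, d_B, d_C, n);

    // Copy the result back to the host and release device memory
    cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);
    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    return 0;
}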

More on Kernel Launch (Host Code)

int vecAdd(float* h_A, float* h_B, float* h_C, int n)
{
    dim3 DimGrid((n-1)/256 + 1, 1, 1);
    dim3 DimBlock(256, 1, 1);
    vecAddKernel<<<DimGrid, DimBlock>>>(d_A, d_B, d_C, n);
}

The rounded-up grid dimension makes sure that there are enough threads to cover all elements.
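
For example (my numbers, not from the slide): with n = 1000, DimGrid.x = (1000-1)/256 + 1 = 4, so 4*256 = 1024 threads are launched; the 24 threads with i >= 1000 simply fail the i < n test and do nothing.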

Kernel execution in a nutshell

__host__
void vecAdd()
{
    dim3 DimGrid(ceil(n/256.0), 1, 1);
    dim3 DimBlock(256, 1, 1);
    vecAddKernel<<<DimGrid, DimBlock>>>(d_A, d_B, d_C, n);
}

__global__
void vecAddKernel(float *A, float *B, float *C, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

[Figure: the launch creates a Grid of thread blocks (Blk 0 ... Blk N-1) that executes on the GPU (M0 ... Mk, RAM)]

More on CUDA Function Declarations

                                    Executed on the:    Only callable from the:
__device__ float DeviceFunc()       device              device
__global__ void  KernelFunc()       device              host
__host__   float HostFunc()         host                host

- __global__ defines a kernel function
  - Each "__" consists of two underscore characters
  - A kernel function must return void
- __device__ and __host__ can be used together
- __host__ is optional if used alone
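
As an illustrative sketch (the function names here are mine, not from the slides), the qualifiers combine like this:

// Runs on the device, callable only from device code
__device__ float square(float x) { return x * x; }

// Compiled for both host and device; callable from either side
__host__ __device__ float scale(float x, float s) { return x * s; }

// Kernel: returns void, launched from the host, runs on the device
__global__ void squareKernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = scale(square(data[i]), 0.5f);
}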

Compiling A CUDA Program

- Integrated C programs with CUDA extensions are compiled by the NVCC Compiler
- NVCC separates the source into:
  - Host Code, passed to the Host C Compiler/Linker
  - Device Code (PTX), handled by the Device Just-in-Time Compiler
- The result runs on a Heterogeneous Computing Platform with CPUs, GPUs, etc.
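
As a concrete (and assumed) example of this flow, the vector-addition program from the earlier slides could be compiled and run as:

nvcc vecAdd.cu -o vecAdd
./vecAdd

Here nvcc separates the device code from the host code, compiles each, and links a single executable; the source file name vecAdd.cu is my assumption.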

To learn more, please read Chapter 3.
