Course Overview
Course Goals
Learn how to program heterogeneous parallel computing systems and achieve:
High performance and energy efficiency
Functionality and maintainability
Scalability across future generations
Portability across vendor devices
Technical subjects:
Parallel programming APIs, tools, and techniques
Principles and patterns of parallel algorithms
Processor architecture features and constraints
People
Instructor:
Wen-mei Hwu, w-hwu@illinois.edu (start your e-mail subject line with [Coursera HPP])
Teaching Assistants:
Abdul Dakkak, Izzat El Hajj, Tom Jablin, Andy Schuh, and community TAs
Contributors
David Kirk, John Stratton, Isaac Gelado, John Stone, Javier Cabezas, Michael Garland, TAs, and many more
Web Resources
Website:
Handouts, quizzes, labs, lecture slides/recordings
Weekly view vs. classic view
Sample textbook chapters, documentation, software resources
Electronic announcements
Forum discussions
Forum for Q&A - the community TAs read and answer the postings, and your classmates often have answers
Grading
Quizzes: 50% (weekly, repeatable)
Labs (Machine Problems): 50% (weekly, with options in later assignments)
Recommended Textbook/Notes
1. D. Kirk and W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, 2nd Edition, Morgan Kaufmann, 2013, ISBN 0123814723
2. Lab assignments will have accompanying instructions and notes
3. NVIDIA, NVIDIA CUDA C Programming Guide, version 5.0 (reference book)
Tentative Schedule
Week One: Introduction to Heterogeneous Computing, Overview of CUDA C, and Kernel-Based Parallel Programming
Lab tour and programming assignment of vector addition in CUDA C
Week Two: Memory Model for Locality, Tiling for Conserving Memory Bandwidth, Handling Boundary Conditions, and Performance Considerations
Programming assignment of simple matrix-matrix multiplication in CUDA C
Week Three: Parallel Convolution Pattern
Programming assignment of tiled matrix-matrix multiplication in CUDA C
Week Four: Parallel Scan Pattern
Programming assignment of parallel convolution in CUDA C
Week Five: Parallel Histogram Pattern and Atomic Operations
Programming assignment of parallel scan in CUDA C
Week Six: Data Transfer and Task Parallelism
Programming assignment of parallel histogram in CUDA C
Week Seven: Introduction to OpenCL, Introduction to C++ AMP, Introduction to OpenACC
Programming assignment of vector addition using streams in CUDA C
Week Eight: Course Summary, Other Related Programming Models (Thrust, Bolt, and CUDA FORTRAN)
Programming assignment of simple matrix-matrix multiplication in a choice of OpenCL, C++ AMP, or OpenACC
Week Nine: Complete any remaining lab assignments
Optional bonus programming assignments in a choice of OpenCL, C++ AMP, or OpenACC
Welcome Aboard!