At the PSC3, intensive training programs are provided to teach traditional programmers how to program GPUs.
The first training series on GPU computing was held in March 2013.
The following modules were organized.
Module 1: Introduction to GPU programming, including methodology, performance analysis and a critical review.
Chair: Jan G. Cornelis (Vrije Universiteit Brussel)
14 March 2013
Location: Marconi room, Pleinlaan 9, level 2 (parking is available in the basement)
Thursday 14 March 2013
08.30  Registration
09.00  Welcome and introduction; hardware architecture; OpenCL
10.45  Coffee break
11.00  Hardware architecture II; execution model
12.30  Lunch
13.30  The philosophy of GPU computing power; GPU peak performance
15.00  Coffee break
15.15  Performance aspects: anti-parallel patterns; optimization strategies & tools
17.00  Reception
Through a series of lectures, demos and interactive discussions, attendees will be given the tools and techniques needed to understand the methodology behind GPU programming.
The lecture starts with an introduction to GPUs, covering their architecture, programming model and execution model. Next, trainees gain insight into peak performance and into the issues that cause performance to degrade. The relation to algorithmic and implementation aspects is then discussed, so that trainees get a first idea of which algorithms and implementations are well suited to GPUs and which are not.
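To make the notion of peak performance concrete, the following sketch computes a theoretical peak from the core count, clock rate and floating-point operations per cycle, and the fraction of that peak a kernel actually reaches. It is a minimal illustration, not course material: the function names are hypothetical, and the assumption that each core completes one fused multiply-add (counted as 2 FLOPs) per cycle differs between GPU generations.

```c
#include <assert.h>

/* Theoretical peak in GFLOP/s.
 * Assumption: each core retires `flops_per_cycle` floating-point operations
 * per clock cycle (2 if it completes one fused multiply-add per cycle). */
double peak_gflops(int cores, double clock_ghz, int flops_per_cycle)
{
    assert(cores > 0 && clock_ghz > 0.0 && flops_per_cycle > 0);
    return cores * clock_ghz * flops_per_cycle;
}

/* Fraction of the peak reached by a kernel that performs `flops`
 * floating-point operations in `seconds` (peak given in GFLOP/s). */
double fraction_of_peak(double flops, double seconds, double peak)
{
    assert(seconds > 0.0 && peak > 0.0);
    double achieved_gflops = flops / seconds / 1e9;
    return achieved_gflops / peak;
}
```

For a hypothetical GPU with 512 cores at 1.5 GHz and 2 FLOPs per cycle, `peak_gflops(512, 1.5, 2)` gives 1536 GFLOP/s; a kernel performing 768 GFLOP in one second would reach half of that peak.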
After the training, trainees will understand how to implement algorithms on GPUs and which aspects to consider for an efficient implementation. We will also put GPUs into perspective: we compare them with other technologies and discuss their weaknesses and the challenges of putting the technology into practice.
Participants are expected to have basic knowledge of hardware and software.
Module 2: Hands-on GPU computing training.
Chair: Jan Lemeire (Vrije Universiteit Brussel) in cooperation with Vincent Hindriksen (Streamcomputing)
18-20 March 2013
Location: Pleinlaan 5, level 0 (parking is available at Pleinlaan 9)
Through a series of personalized practical exercises, trainees will be guided through all implementation and performance issues related to GPU computing. A carefully designed sequence of examples and exercises has been set up so that the trainee follows an effective path towards mastering the necessary GPU computing skills. Trainees will acquire a thorough understanding of writing GPU kernels, launching kernels, data transfer, kernel synchronization, vector operations, debugging, the available tools, and understanding and optimizing memory access, among other topics. One-on-one guidance will be provided, so that the training can be tailored to the background of each trainee.
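The kernel-and-launch structure covered in the exercises can be mimicked in plain C. In this sketch, which assumes an OpenCL-style data-parallel model, the kernel body computes a single output element and the index `gid` stands in for what `get_global_id(0)` returns inside a real OpenCL kernel; the host loop replaces the NDRange launch that a GPU program would issue with `clEnqueueNDRangeKernel`. The function names are hypothetical and no GPU is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel body for vector addition: one work-item computes one element.
 * `gid` plays the role of get_global_id(0) in an OpenCL kernel. */
static void vec_add_kernel(size_t gid, const float *a, const float *b,
                           float *c, size_t n)
{
    /* The global size may be rounded up past n, so guard the access. */
    if (gid < n)
        c[gid] = a[gid] + b[gid];
}

/* Sequential stand-in for an NDRange launch. On a GPU the work-items run
 * in parallel and in no guaranteed order, so the kernel must not depend
 * on the order of these iterations. */
void launch_vec_add(const float *a, const float *b, float *c, size_t n,
                    size_t local_size)
{
    assert(local_size > 0);
    /* Round the global size up to a multiple of the work-group size. */
    size_t global_size = (n + local_size - 1) / local_size * local_size;
    for (size_t gid = 0; gid < global_size; ++gid)
        vec_add_kernel(gid, a, b, c, n);
}
```

The `if (gid < n)` guard matters because the global size is rounded up to a multiple of the work-group size; without it the extra work-items would write past the end of `c`.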
Participants are expected to have reasonably good programming skills (preferably in C) at the start, and to have already acquired the knowledge taught in the introductory training (Module 1).
Module 3: GPU performance analysis and optimization.
Chair: Jan G. Cornelis (Vrije Universiteit Brussel) with an invited speaker (to be announced)
28 March 2013
Location: VUB campus Etterbeek, entrance 8, building Ke 2.24 (take the main entrance of building K; parking on the VUB campus is possible and an entrance code will be provided)
Thursday 28 March 2013
08.30  Registration
09.00  The GPU architecture; peak performance with the roofline model
10.45  Coffee break
11.00  The execution profile and classification of inefficiencies; performance tools
12.30  Lunch
13.30  Approach to analyse the performance of a given GPU implementation
15.00  Coffee break
15.15  Case studies
17.00  Reception
This one-day program digs deeper into the issues that affect the overall performance of a GPU implementation. A performance analysis aims to reveal how, and to what extent, implementation patterns lead to inefficient use of the GPU.
The impact on performance is demonstrated and analyzed with examples and benchmark kernels. An analytical model is used to estimate the impact of each inefficiency pattern, and solutions to overcome these inefficiencies are discussed.
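The roofline model named in the Module 3 schedule is one example of such an analytical model: a kernel's attainable performance is bounded either by the compute peak or by the memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte of DRAM traffic). Whether the session uses exactly this form is an assumption; the sketch below, with hypothetical function names and device figures, only shows the basic bound.

```c
#include <assert.h>

/* Roofline bound: attainable GFLOP/s for a kernel with arithmetic intensity
 * `ai` (FLOPs per byte of DRAM traffic) on a device with compute peak
 * `peak_gflops` and memory bandwidth `bw_gbs` (GB/s). */
double roofline_gflops(double ai, double peak_gflops, double bw_gbs)
{
    assert(ai > 0.0 && peak_gflops > 0.0 && bw_gbs > 0.0);
    double memory_roof = ai * bw_gbs;   /* sloped, bandwidth-limited part */
    return memory_roof < peak_gflops ? memory_roof : peak_gflops;
}

/* Ridge point: the intensity above which a kernel is compute-bound. */
double ridge_point(double peak_gflops, double bw_gbs)
{
    assert(bw_gbs > 0.0);
    return peak_gflops / bw_gbs;
}
```

With a hypothetical device of 1536 GFLOP/s and 192 GB/s, the ridge point is 8 FLOP/byte. Single-precision vector addition performs 1 FLOP per 12 bytes (two 4-byte loads and one 4-byte store), so it is capped at about 16 GFLOP/s, roughly 1% of the compute peak, no matter how well the kernel is tuned otherwise.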
Optionally, trainees can bring their own algorithms and/or implementations to be analyzed during the session.
Module 4: Multi-core programming with pThreads (C/C++) and/or Java threads.
Chair: Jan Lemeire (Vrije Universiteit Brussel)
4 April 2013
Location: Marconi room, Pleinlaan 9, level 2 (parking is available in the basement)
Thursday 4 April 2013
08.30  Registration
09.00  Multicore architecture; 3 multithreading primitives in pThreads and Java
10.45  Coffee break
11.00  Generating threads
12.30  Lunch
13.30  Solving race conditions
15.00  Coffee break
15.15  Thread synchronization
17.00  Reception
This day provides a thorough understanding of multithreaded programming. The basic functionality is explained: thread creation, locking of critical sections and thread synchronization. Both the standard POSIX threads library (pThreads) and the built-in threading support of the Java language are covered, and the conceptual differences between the two are outlined. Several examples show how to build thread-safe multithreaded programs.
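The three basic primitives (creating threads, locking a critical section, and waiting for threads to finish) can be seen together in a small pThreads program. This sketch assumes a POSIX system (compile with `-pthread`); the shared counter and the function names are hypothetical illustrations, not the course's demo code.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4

/* Shared state: a counter protected by a mutex. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: adds 1 to the shared counter `iterations` times. */
static void *worker(void *arg)
{
    long iterations = *(const long *)arg;
    for (long i = 0; i < iterations; ++i) {
        pthread_mutex_lock(&counter_lock);    /* enter critical section */
        ++counter;                            /* read-modify-write, now exclusive */
        pthread_mutex_unlock(&counter_lock);  /* leave critical section */
    }
    return NULL;
}

/* Creates NUM_THREADS workers, waits for all of them, returns the total.
 * `iterations` stays valid for the workers because we join before returning. */
long run_counter(long iterations)
{
    pthread_t threads[NUM_THREADS];
    counter = 0;
    for (int t = 0; t < NUM_THREADS; ++t) {
        int rc = pthread_create(&threads[t], NULL, worker, &iterations);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; ++t)
        pthread_join(threads[t], NULL);       /* wait for each worker */
    return counter;
}
```

Without the mutex, the `++counter` read-modify-write sequences of different threads can interleave and updates get lost, a classic race condition. In Java, the same structure would use `Thread.start()`, `Thread.join()` and a `synchronized` block.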
Next, the more advanced constructs needed to build correct multithreaded programs are analyzed. Trainees will learn about intricate issues that can arise and must be handled very carefully.
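As an example of a construct built on top of the basic primitives, the sketch below implements a reusable barrier from a mutex and a condition variable. POSIX also defines `pthread_barrier_t`; building one by hand shows the kind of care such constructs require. The `while` loop around `pthread_cond_wait` handles spurious wakeups, and the generation counter keeps a thread that quickly re-enters the barrier from being confused with the previous round. Names such as `cv_barrier` and `run_phases` are hypothetical, and this is an illustrative choice of construct, not necessarily one covered in the course.

```c
#include <assert.h>
#include <pthread.h>

/* Reusable barrier built from a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_arrived;
    int             parties;     /* threads that must arrive each round */
    int             waiting;     /* threads arrived in the current round */
    unsigned long   generation;  /* incremented when a round completes */
} cv_barrier;

void cv_barrier_init(cv_barrier *b, int parties)
{
    assert(parties > 0);
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_arrived, NULL);
    b->parties = parties;
    b->waiting = 0;
    b->generation = 0;
}

void cv_barrier_wait(cv_barrier *b)
{
    pthread_mutex_lock(&b->lock);
    unsigned long my_generation = b->generation;
    if (++b->waiting == b->parties) {
        /* Last arrival: reset the count, open a new round, wake everyone. */
        b->waiting = 0;
        b->generation++;
        pthread_cond_broadcast(&b->all_arrived);
    } else {
        /* Re-check in a loop: pthread_cond_wait may wake up spuriously. */
        while (my_generation == b->generation)
            pthread_cond_wait(&b->all_arrived, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

void cv_barrier_destroy(cv_barrier *b)
{
    pthread_cond_destroy(&b->all_arrived);
    pthread_mutex_destroy(&b->lock);
}

/* Demo: in every phase each thread increments a shared counter, waits at the
 * barrier, then checks that all increments of that phase are visible. */
#define MAX_THREADS 8

typedef struct {
    cv_barrier      barrier;
    pthread_mutex_t lock;        /* protects arrivals and violations */
    int             arrivals, violations, nthreads, phases;
} phase_state;

static void *phase_worker(void *p)
{
    phase_state *s = p;
    for (int phase = 0; phase < s->phases; ++phase) {
        pthread_mutex_lock(&s->lock);
        s->arrivals++;
        pthread_mutex_unlock(&s->lock);
        cv_barrier_wait(&s->barrier);   /* all increments of this phase done */
        pthread_mutex_lock(&s->lock);
        if (s->arrivals != s->nthreads * (phase + 1))
            s->violations++;
        pthread_mutex_unlock(&s->lock);
        cv_barrier_wait(&s->barrier);   /* nobody starts the next phase early */
    }
    return NULL;
}

/* Runs the demo and returns the number of ordering violations observed. */
int run_phases(int nthreads, int phases)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    phase_state s = { .arrivals = 0, .violations = 0,
                      .nthreads = nthreads, .phases = phases };
    pthread_t th[MAX_THREADS];
    cv_barrier_init(&s.barrier, nthreads);
    pthread_mutex_init(&s.lock, NULL);
    for (int t = 0; t < nthreads; ++t)
        pthread_create(&th[t], NULL, phase_worker, &s);
    for (int t = 0; t < nthreads; ++t)
        pthread_join(th[t], NULL);
    pthread_mutex_destroy(&s.lock);
    cv_barrier_destroy(&s.barrier);
    return s.violations;
}
```

The second barrier wait in each phase is what makes the check reliable: without it, a fast thread could start incrementing for the next phase while a slower thread is still checking the current one.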
Examples and demo programs will be provided for trainees to play and experiment with.
Participants are expected to have reasonably good programming skills (preferably in C and/or Java).