Masterclass

Fundamentals of Accelerated Computing with CUDA Python

This course explores how to use Numba, the just-in-time, type-specializing Python function compiler, to accelerate Python programs on massively parallel NVIDIA GPUs. You'll learn how to compile CUDA kernels from NumPy universal functions (ufuncs), create and launch custom CUDA kernels with Numba, and apply key GPU memory-management techniques. Upon completion, you'll be able to use Numba to compile and launch CUDA kernels that accelerate your Python applications on NVIDIA GPUs.


Learning Objectives

At the conclusion of the workshop, you'll understand the fundamental tools and techniques for GPU-accelerating Python applications with CUDA and Numba, and be able to:

  • GPU-accelerate NumPy ufuncs with a few lines of code.
  • Configure code parallelization using the CUDA thread hierarchy.
  • Write custom CUDA device kernels for maximum performance and flexibility.
  • Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.

Topics Covered

The following topics and technologies are covered in this course:

  • CUDA Python with Numba
  • CUDA programming general practices

Course Outline

Introduction

  • Meet the instructor.
  • Create an account at https://learn.nvidia.com/join

Introduction to CUDA Python with Numba

  • Begin working with the Numba compiler and CUDA programming in Python.
  • Use Numba decorators to GPU-accelerate numerical Python functions.
  • Optimize host-to-device and device-to-host memory transfers.

Break (60 mins)

Custom CUDA Kernels in Python with Numba

  • Learn CUDA’s parallel thread hierarchy and how it broadens the range of problems you can parallelize.
  • Launch massively parallel custom CUDA kernels on the GPU.
  • Utilize CUDA atomic operations to avoid race conditions during parallel execution.

Break (15 mins)

Multidimensional Grids and Shared Memory for CUDA Python with Numba

  • Learn multidimensional grid creation and how to work in parallel on 2D matrices.
  • Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.

Final Review

  • Review key learnings and wrap up questions.
  • Complete the assessment to earn a certificate.
  • Take the workshop survey.

Duration: 8 hours

Subject: Accelerated Computing

Language: English

Course Prerequisites:

  • Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations.
  • NumPy competency, including the use of ndarrays and ufuncs.

Tools, libraries, frameworks used: Numba, NumPy, CUDA

For aspiring AI practitioners, software developers, and data scientists

9 July 2025, Berlin

Full-day masterclass

Partner

In-Person NVIDIA Training
Get ready to supercharge your computing skills – join the AI and Accelerated Computing masterclasses led by the NVIDIA Deep Learning Institute. This is a special opportunity to learn hands-on from NVIDIA-certified instructors, master cutting-edge parallel programming techniques, and network with fellow engineers and AI developers, all in one high-intensity, lab-driven experience.

Capacity is strictly limited. This workshop is capped to ensure maximum instructor engagement and hands-on support.

Full-Day Masterclass Pass

9 July 2025
Only 30 spots are available
Get your tickets before prices increase.
Regular Price
€800
excl. VAT
Current Price
€509
excl. VAT

Check out other masterclasses

  • Fundamentals of Deep Learning (powered by NVIDIA)
  • Fundamentals of Accelerated Computing with CUDA Python (powered by NVIDIA)
  • Did you actually listen? (Elisabeth Schlachter)
  • Revolutionizing Software Processes with AI (Patrick Schnell)
  • How to put GPT LLMs & friends into your applications (Sebastian Gingter)
  • Agile Team Leadership 2025 (Stefan Mintert)
  • Building a Real World Architecture (David Tielke)