Andy Terrel

Feb 5, 2025 • WeAreDevelopers LIVE

CUDA in Python

What if your Python code could achieve over 90% of a GPU's theoretical max performance? Learn how NVIDIA is making it possible.

#1 · about 6 minutes

Understanding the CUDA platform stack for Python developers

The CUDA platform is layered from high-level domain libraries to low-level hardware access, with new tools aiming to combine Python's productivity with GPU performance.

#2 · about 3 minutes

Improving performance by fusing GPU operations

The nvmath-python library supports kernel fusion through epilogues, which fold follow-on operations such as bias addition into the matrix multiplication so that both run in a single GPU kernel launch.
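The idea behind an epilogue can be sketched without a GPU. The following is a CPU-only model in plain Python, not nvmath-python code: the function names `matmul_unfused` and `matmul_fused`, the callback signature, and the choice of one bias value per output row are all assumptions made for illustration. It contrasts a two-pass version, which stores the product and reads it back to add the bias, with a fused version that applies the epilogue to each accumulated value before it is written.

```python
def matmul_unfused(a, b, bias):
    """Two passes: the product is stored, then read back to add the bias."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Pass 1 (stands in for kernel launch 1): matrix product into a temporary.
    tmp = [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
           for i in range(rows)]
    # Pass 2 (stands in for kernel launch 2): re-read the temporary, add bias.
    return [[tmp[i][j] + bias[i] for j in range(cols)] for i in range(rows)]


def matmul_fused(a, b, epilogue):
    """One pass: the epilogue runs on each accumulator before it is stored."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = sum(a[i][k] * b[k][j] for k in range(inner))
            # Hypothetical epilogue hook: no intermediate matrix is written.
            row.append(epilogue(acc, i, j))
        out.append(row)
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
bias = [0.5, -1.0]  # assumption: one bias value per output row

unfused = matmul_unfused(a, b, bias)
fused = matmul_fused(a, b, lambda acc, i, j: acc + bias[i])
```

Both paths produce the same matrix. On a GPU the fused form would avoid a second kernel launch and a round trip of the intermediate result through global memory, which is where the performance gain described above comes from.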

#3 · about 5 minutes

Calling device-side functions directly from Python kernels

Python kernels can now directly call pre-compiled, high-performance device-side functions from libraries like cuBLAS, enabled by a just-in-time linker called nvJitLink.

#4 · about 2 minutes

Fine-grained parallelism with cooperative groups in Python

The CUB library is exposed to Python, allowing for cooperative operations and reductions at the block or warp level for fine-grained control over GPU parallelism.
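A block-level reduction of the kind mentioned above is typically organised as a tree: in each step, half of the remaining threads combine their value with a neighbour's, with a barrier between steps. The sketch below emulates that pattern on the CPU in plain Python. It does not use CUB or any NVIDIA Python package; `block_reduce`, the list standing in for shared memory, and the loop standing in for parallel threads are assumptions made for illustration.

```python
import operator


def block_reduce(values, op):
    """Emulate a block-wide tree reduction with one list slot per 'thread'."""
    if not values:
        raise ValueError("block_reduce needs at least one value")
    shared = list(values)  # stands in for the block's shared memory
    n = len(shared)
    stride = 1
    while stride < n:
        # On a GPU, every iteration of this inner loop would run in parallel,
        # followed by a block-wide barrier before the next stride.
        for tid in range(0, n, 2 * stride):
            if tid + stride < n:
                shared[tid] = op(shared[tid], shared[tid + stride])
        stride *= 2
    return shared[0]  # "thread 0" holds the block-wide result


total = block_reduce([0, 1, 2, 3, 4, 5, 6, 7], operator.add)
largest = block_reduce([3, 9, 2, 7, 5], max)
```

For eight inputs the emulation finishes in three strides (1, 2, 4) rather than seven sequential additions, which is the logarithmic depth that makes block- and warp-level reductions attractive on a GPU.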

#5 · about 3 minutes

Accelerating language support with numba-cuda and Numbast

The numba-cuda module has been split out into its own package so that new features can ship faster, while Numbast automatically generates Python bindings for templated C++ code.

#6 · about 4 minutes

A Pythonic object model for host-side GPU control

A new high-level object model allows Python developers to directly manage GPU resources like devices, contexts, streams, and linker objects without boilerplate code.



