What if your Python code could achieve over 90% of a GPU's theoretical max performance? Learn how NVIDIA is making it possible.
#1 · about 6 minutes
Understanding the CUDA platform stack for Python developers
The CUDA platform is layered from high-level domain libraries to low-level hardware access, with new tools aiming to combine Python's productivity with GPU performance.
#2 · about 3 minutes
Improving performance by fusing GPU operations
The nvmath-python library enables kernel fusion using epilogues, which combines multiple operations like matrix multiplication and bias addition into a single GPU kernel launch.
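To make the idea of an epilogue concrete, here is a minimal CPU-only sketch in plain Python. It is not the nvmath-python API: the function names `matmul`, `add_bias`, and `matmul_bias_fused` are hypothetical, and the per-row bias layout is an assumption chosen for illustration. The unfused path writes an intermediate matrix and rereads it in a second pass. The fused path adds the bias to each accumulator before the result is stored, which is roughly what a fused GPU kernel does to avoid a second launch and an extra round trip through memory.

```python
# CPU-only illustration of an epilogue; these helpers are hypothetical
# and do not correspond to nvmath-python functions.

def matmul(a, b):
    """Plain matrix multiply: returns a new list-of-lists holding a @ b."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def add_bias(c, bias):
    """Second pass: rereads every element of c to add a per-row bias."""
    return [[c[i][j] + bias[i] for j in range(len(c[0]))] for i in range(len(c))]

def matmul_bias_fused(a, b, bias):
    """Single pass: the bias is added to each accumulator before it is stored."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = sum(a[i][k] * b[k][j] for k in range(inner))
            row.append(acc + bias[i])  # the "epilogue" step, no intermediate matrix
        out.append(row)
    return out

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
bias = [10.0, 20.0]  # assumed layout: one bias value per output row

unfused = add_bias(matmul(a, b), bias)   # two passes over the output
fused = matmul_bias_fused(a, b, bias)    # one pass over the output
```

Both paths produce the same matrix. On a GPU the difference would show up in the number of kernel launches and in memory traffic, not in the values.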
#3 · about 5 minutes
Calling device-side functions directly from Python kernels
Python kernels can now directly call pre-compiled, high-performance device-side functions from libraries like cuBLAS, enabled by a just-in-time linker called nvJitLink.
#4 · about 2 minutes
Fine-grained parallelism with cooperative groups in Python
The CUB library is exposed to Python, allowing for cooperative operations and reductions at the block or warp level for fine-grained control over GPU parallelism.
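As a rough picture of what a block-level or warp-level reduction computes, the sketch below simulates a tree reduction on the CPU in plain Python. It does not use CUB or any GPU library. The helper names `block_reduce_sum` and `warp_reduce_sums`, the power-of-two block size, and the warp size of 32 are assumptions made for illustration.

```python
def block_reduce_sum(values):
    """Simulate a tree reduction over one thread block.

    values[t] is the value held by thread t; the block size must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("block size must be a power of two")
    shared = list(values)  # stands in for the block's shared memory
    stride = n // 2
    while stride > 0:
        # On a GPU, all threads t < stride would perform this step in parallel,
        # with a barrier before the next, smaller stride.
        for t in range(stride):
            shared[t] += shared[t + stride]
        stride //= 2
    return shared[0]

def warp_reduce_sums(values, warp_size=32):
    """Reduce each warp-sized slice independently, giving one total per warp."""
    return [block_reduce_sum(values[i:i + warp_size])
            for i in range(0, len(values), warp_size)]

block = list(range(64))                # one value per simulated thread
block_total = block_reduce_sum(block)  # a single result for the whole block
warp_totals = warp_reduce_sums(block)  # one result for each of the two warps
```

The block-wide version yields one total for all 64 simulated threads, while the warp-wide version yields one total per group of 32. This is the granularity choice the chapter refers to.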
#5 · about 3 minutes
Accelerating language support with numba-cuda and nupack
The numba-cuda module is separated from the main Numba project into its own package to accelerate feature delivery, while nupack automatically generates Python bindings for C++ templated code.
#6 · about 4 minutes
A Pythonic object model for host-side GPU control
A new high-level object model allows Python developers to directly manage GPU resources like devices, contexts, streams, and linker objects without boilerplate code.
What’s the latest in NVIDIA CUDA Python
Python and NVIDIA CUDA have long been friends. Over the last year, NVIDIA teams have been working to improve the Pythonista’s experience. This means a top-to-bottom update to the CUDA Platform is fueling the GenAI movement, e.g. llama3, gpt and nemo. These...