Senior Software Engineer, CUTLASS Kernels
NVIDIA Ltd.
Austin, United States of America
3 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: $288K
Job location: Austin, United States of America
Tech stack
Nvidia CUDA
Computer Programming
Computer Engineering
Software Debugging
Python
OpenCL
Software Engineering
Graphics Processing Unit (GPU)
Deep Learning
Information Technology
Free and Open-Source Software
Software Coding
Software Performance
Programming Languages
Job description
- Write Tensor Core-based deep learning kernels such as grouped-GEMM, attention, and convolution using CUTLASS CUDA C++ and Python DSL for Blackwell, Rubin, and future architectures.
- Optimize kernels for peak throughput on both silicon and software performance simulators.
- Collaborate with teams across NVIDIA including the GPU architecture, NVVM/PTX compiler, CUDA library, and DL frameworks teams to ensure fast, functional, and timely kernel delivery to customers.
Requirements
- Master's or PhD degree in Computer Science, Computer Engineering, or a related field (or equivalent experience).
- 3+ years of relevant industry experience.
- Strong proficiency in C++ programming and software design, including debugging, performance evaluation, and testing.
- Experience with CUDA, OpenCL, HIP, SYCL, Mojo, Pallas, Triton, Mosaic, Halide, or any general-purpose or domain-specific programming language targeting highly parallel accelerators.
- Deep understanding of computer architecture and some experience working at the assembly level.
Ways to stand out from the crowd:
- Experience writing code specifically targeting NVIDIA Tensor Cores, particularly through PTX or CUDA/cuTile.
- Open-source contributions to math kernel libraries or frameworks.
Benefits & conditions
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.
About the company
NVIDIA's high-performance computing platforms are powering the AI revolution across many applications and industries. Within our software stack, CUTLASS stands out as a popular open-source ecosystem dedicated to high-performance linear algebra and Tensor Core primitives. Since 2017, it has provided the community with C++ and Python abstractions to implement custom matrix multiply (GEMM) and related math and deep learning computations on NVIDIA GPUs.
NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, autonomous, and love a challenge, consider joining our Deep Learning Library team and help us build the real-time, cost-effective computing platform driving our success in this exciting and quickly growing field.