Senior Datacenter Performance Model Engineer

NVIDIA Ltd.

Santa Clara, United States of America

5 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$ 46K

Job location

Santa Clara, United States of America

Tech stack

Adobe InDesign

Artificial Intelligence

C++

Nvidia CUDA

Computer Programming

Microarchitecture

Data Centers

Software Debugging

Device Drivers

Distributed Computing Environment

Job Scheduling

Python

Software Tools

TensorFlow

Software Engineering

Graphics Processing Unit (GPU)

PyTorch

Deep Learning

Kubernetes

Information Technology

Slurm

Requirements

BS+ in Computer Science or related (or equivalent experience) and 5+ years of software development
Strong software skills in design, coding (C++ and Python), analysis, and debugging
Good understanding of Deep Learning frameworks such as PyTorch and TensorFlow, and of distributed training and inference
Knowledge of GPU cluster job scheduling (Slurm or Kubernetes), storage and networking
Experience with NVIDIA GPUs, CUDA Programming, and Networking
Motivated self-starter with strong problem-solving skills and customer-facing communication skills
Passion for continuous learning. Ability to work concurrently with multiple global groups

Ways to stand out from the crowd:

Proven SW engineering experience in deploying SW at datacenter scale
Solid experience in performance analysis of large AI jobs for training/inference workloads
Knowledge of Linux device drivers and/or compiler implementation
Knowledge of GPU and/or CPU architecture and general computer architecture principles

Benefits & conditions

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

About the company

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing.

We are looking for forward-thinking, hard-working, and creative people to join a fast-moving, multifaceted software team! This software engineering role involves developing datacenter-scale performance modeling and prediction tools for AI researchers running AI workloads in GPU clusters.

What you'll be doing:

Build performance modeling and prediction tools for AI workloads at datacenter scale
Develop production tools and workflows used by multiple teams within NVIDIA and by its customers
Automate workflows, including the search for the most efficient configurations over millions of parameters
Partner with HW and SW architects to propose new features or improve existing features with real-world use cases
