Senior Software SDET Test Development Engineer

NVIDIA Ltd.
Santa Clara, United States of America
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$ 224K

Job location

Santa Clara, United States of America

Tech stack

Java
JavaScript
Advanced Configuration and Power Interface (ACPI)
Artificial Intelligence
Bash
C++
Ubuntu (Operating System)
CentOS
Communications Protocols
Nvidia CUDA
Continuous Integration
Data Centers
Cursor (AI Code Editor)
Software Debugging
Software Design Documents
Linux
DevOps
Firmware
GitHub
General-Purpose Computing on Graphics Processing Units
Hyper-V
Python
Kernel-Based Virtual Machine
Natural Language Processing
OpenCL
PCI Express
Red Hat Enterprise Linux - RHEL
Software Reliability Testing
Ansible
TensorFlow
Graphics Processing Unit (GPU)
Extensible Firmware Interface
SUSE Linux
Gerrit
PyTorch
Large Language Models
Deep Learning
Parallel Computation
Backend
GitLab
Kubernetes
Bare Metal
Data Management
Slurm
Fedora
Docker
SDET
Jenkins
VMware

Job description

NVIDIA is the world leader in GPU Computing. We are passionate about markets that include gaming, automotive, vision, HPC, datacenters, and networking, in addition to our traditional OEM business. NVIDIA is also well positioned as the 'AI Computing Company', and NVIDIA GPUs are the brains powering Deep Learning software frameworks, analytics, data centers, and autonomous vehicles. We have some of the most experienced and dedicated people in the world working for us. If working with dedicated, forward-thinking, and hard-working technical people across countries sounds exciting, this job is for you. NVIDIA is looking for an outstanding individual who thrives in a diverse work environment, has outstanding interpersonal skills, and possesses a strong sense of engagement and continuous process improvement. To join our platform SWQA team, the candidate must have experience in enterprise server integration, strong Linux experience, reliability testing with various telemetries, scale-out clusters, test plan development, DevOps, and CI/CD, along with a track record in developing AI tools and NLP.

What you'll be doing:

  • Develop and execute NVIDIA HGX/DGX/MGX platform test plans covering servers, OS, FW, and the CUDA SW stack, working from design docs.

  • Install and test various system OSes, server firmware, and SW stacks.

  • Drive and support root cause analysis of reliability and validation test failures to identify root cause(s) and achieve mitigation.

  • Build, develop, and debug server- and OS-level automation frameworks (front end and back end) and tests.

  • Review partner and supplier test results and prescribe additional reliability testing on components, servers, and packaging as needed.

Requirements

  • Bachelor's Degree (or equivalent experience) in a STEM (Science, Technology, Engineering, Math or Physics) field

  • 5+ years of proven experience, or a master's degree.

  • Years of proven experience in OS- and server-level automation, CI/CD processes, and DevOps using Python, shell, Ansible, Jenkins, C/C++, Java, and JavaScript.

  • Strong server and Linux (Ubuntu, Red Hat, CentOS, SUSE, Fedora, etc.) troubleshooting and debugging experience in bare-metal and KVM/VMware/Hyper-V environments.

  • Good knowledge of and hands-on experience in model testing, AI tools/frameworks (TensorFlow, PyTorch, Cursor, etc.), and NLP and LLM benchmarking.

  • Experience in using AI development tools for test plan creation, test case development, and test case automation.

  • Strong experience in FW, BMC/OpenBMC, network protocols, internal/external enterprise storage devices, PCIe buses and devices, IO sub-devices, CPU and memory, ACPI, the UEFI spec, and Redfish is a huge plus.

  • Years of proven experience with GitHub/GitLab/Gerrit, PXE, SLURM, and Stack/Kubernetes/Docker is a huge plus.

Ways to stand out from the crowd:

  • Experience with AI-related tools, LLMs, and NLP.

  • Experience working with NVIDIA GPU hardware is a strong plus.

  • A solid understanding of virtualization in Linux (KVM, Docker orchestrated with Kubernetes) is good to have.

  • Background in parallel programming, ideally CUDA/OpenCL, is a plus.

Benefits & conditions

With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 140,000 USD - 224,250 USD for Level 3, and 168,000 USD - 270,250 USD for Level 4.

You will also be eligible for equity and benefits (https://www.nvidia.com/en-us/benefits/).
