Senior Software Engineer, GoLang - DSX MaxQ

NVIDIA Ltd.
Santa Clara, United States of America
20 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$ 341K

Job location

Santa Clara, United States of America

Tech stack

API
Artificial Intelligence
Software Bug Management
C++
Cloud Engineering
Code Coverage
Nvidia CUDA
Software Debugging
Linux
Distributed Systems
Github
Design of User Interfaces
Monitoring of Systems
Job Scheduling
Python
Linux System Administration
Open Source Technology
Prometheus
Software Engineering
Project Management
Systems Integration
Data Logging
Graphics Processing Unit (GPU)
Enterprise Software Applications
React
Grafana
Kubernetes Helm Charts
Gpu Programming
Gitlab-ci
Kubernetes
Information Technology
Bare Metal
Slurm
Oracle Cloud Infrastructure
Docker
Jenkins
Go

Job description

NVIDIA is looking for outstanding software engineers to help us expand our enterprise GPU management and monitoring tools. In this role, you will work closely with the broader NVIDIA team to design and build cloud-native management agents, Kubernetes integrations, and end-to-end integration solutions that combine GPUs with the rest of the datacenter software management ecosystem. We are focused on supporting NVIDIA products across HPC, cloud, and enterprise on both bare metal and virtualized platforms as the role of GPUs in all of these environments expands. Your contributions will span many aspects of GPU system integration, including telemetry and metrics, health checks, diagnostics, configuration, and system management. These tools serve both as passive background monitoring and as active online management, with a core emphasis on operational transparency and seamless integration in customer environments. Your code will support everything from single-node developer systems to large clusters with thousands of nodes.

To succeed, you must have a strong Linux background, familiarity with modern cloud-native systems, and a proven work ethic. You will be expected to jump in quickly and provide valuable contributions from day one. This is a dynamic work environment with many exciting opportunities awaiting. NVIDIA GPUs are central to many hot enterprise, cloud, and datacenter trends. Come join us as we craft the future of accelerated computing and AI.

What you'll be doing:

  • Develop and maintain distributed, robust, and scalable Go programs deployed to Kubernetes environments that manage large datacenters.
  • Develop and maintain user-space applications, containers, Go bindings, and CLI tools.
  • Enable GPU management integration with the state-of-the-art open-source ecosystem, including Kubernetes and Docker.
  • Support internal and external users through bug fixes, documentation, and feature improvements.
  • Maintain high-quality products through robust test coverage.

Requirements

  • BS or higher in Computer Science or equivalent experience. 5+ years of meaningful industry experience with a strong Go and Kubernetes development background
  • User space development and debugging expertise in Linux environments
  • Experience with APIs and interface design
  • Outstanding written and verbal interpersonal skills. Business-level English
  • Strong motivation and commitment to learn new skills
  • Ability to execute all aspects of the software development lifecycle. Ability to manage time in a fast-paced, heavily multitasked environment
  • Development experience with Rust, Python, C, and/or C++. Development experience with distributed systems and concurrent applications, especially in a Kubernetes environment
  • Experience developing and maintaining enterprise software. Experience deploying, managing, and debugging applications in a Kubernetes environment

Ways to stand out from the crowd:

  • Background with containers (e.g. Docker, OCI), orchestration frameworks, and logging/telemetry backends, including Kubernetes monitoring stacks built on tools such as Prometheus, Loki, and Grafana
  • Experience with modern UI development in React and Node.js or similar frameworks. Experience developing Kubernetes operators or Helm charts
  • Experience with HPC job schedulers like Slurm or Run.AI. Familiarity with Kubernetes internals. Exposure to GPU programming with CUDA. Experience with Jenkins and GitHub/GitLab CI/CD pipelines

Benefits & conditions

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.
