Senior Manager, Datacenter Software - Firmware Release
Job description
We are the Datacenter Software Tools team at NVIDIA. We deliver infrastructure and tools for data center deployment, firmware and software package deployment, and server manageability. We are looking for a hard-working, experienced senior manager with a background in datacenter software and firmware release management and infrastructure. In this role, you will drive the release of software and firmware for the world's best and most resilient GPU-based datacenter servers. This is a highly visible role at NVIDIA, charged with guaranteeing high-quality infrastructure and tooling for software and firmware release features across NVIDIA's scale-up and scale-out solutions, spanning frontend, backend, infrastructure, and CI/CD-based automation. You will work closely with multi-functional teams, including system architects, firmware developers, compliance and security teams, and product management, to deliver exceptional software and firmware release solutions. Join us at the forefront of technological advancement.
What you'll be doing:
- In this technical role, you will provide leadership on how releases are delivered to end customers of rack-scale computing built on tightly coupled compute and switch trays, and build end-to-end infrastructure and workflows that ensure the highest-quality releases of data center firmware and software.
- Define release scope for rack-scale products, working cross-functionally with product management, technical architects, and program management. Deliver releases that pass through the validation matrix for customer end-use cases, ensuring that delivered firmware and software are of the highest quality. Solutions must scale, be resilient, and support secure upgrades and rollbacks across diverse customer scenarios.
- Influence architecture, design, and implementation decisions for compute and switch tray software and firmware, ensuring quality across nightly, dev, and production drops for all customer use cases, with the right release-validation strategy at each phase of the development life cycle.
- Partner with all matrixed organizations (developers, SWQA, and product engineering) to shift release quality left, from dev through QA, in a very fast-moving environment, using end-to-end CI/CD so that no bug is found at a customer site. Enforce this with well-placed quality metrics for every product milestone, and track KPIs published at a regular cadence. Monitor and report release progress to all stakeholders.
- Own ingestion and packaging of software and firmware binaries, readying them for deployment at scale across multiple platforms and different CSP environments.
- Document procedures and engage in collaborative discussions to refine software and firmware release workflows, including identifying and resolving issues in release milestone packaging and deployment procedures and removing bottlenecks. Shape the team's roadmap and drive innovation, including self-service interfaces, automation, AI-assisted validation and triage, and sophisticated release-compliance reporting.
- Continuously review and identify improvement opportunities in established release processes, infrastructure, and practices. Ensure the teams are performing in the most efficient and transparent way with a strong focus on automation and measurable targets.
Requirements
- 12+ years overall in the software industry, with specialization in system software and/or firmware development.
- 5+ years of proven hands-on technical leadership of multi-team organizations working on data center firmware such as BMC, FPGA, CPLD, and network switch firmware, and building infrastructure for continuous improvement of release quality.
- BS/MS/PhD in CS, CE, EE, or a related technical field, or equivalent experience.
- Prior experience in systems software or firmware development, with a proven history of guiding complex software features or products through the entire product life cycle, ideally on rack-scale datacenter products.
- Strong understanding of computer system architecture, operating system principles, HW-SW interactions, and performance analysis and optimization.
- Working fluency in Python and Linux sufficient to review designs, prototype tooling, and debug production issues alongside the team. Hands-on experience with web application frameworks and CI/CD platforms (Jenkins, GitLab, Artifactory).
- Track record of balancing multiple projects with competing priorities and delivering against measurable benchmarks (MTTR, specification compliance, release cadence, automation coverage).
- Excellent communication and collaboration skills across teams and time zones.
Ways to stand out from the crowd:
- Familiarity with the architecture of datacenter server software and experience with the in-band and out-of-band management of firmware and hardware components.
- Understanding of the REST architectural style, especially JSON over HTTPS with OAuth, and of DMTF firmware management protocols such as PLDM and SPDM.
- Proven experience in developing a self-service release infrastructure, resulting in clear reductions in onboarding SLA times.
- Experience integrating AI/LLM tooling into engineering workflows - for triage, test generation, code review, or release validation.
- Experience leading geographically distributed engineering teams across the US and APAC.
Benefits & conditions
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 431,250 USD.