Sr Storage Engineer

NVIDIA Ltd.
Childress, United States of America
2 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Childress, United States of America

Tech stack

API
BIOS
Linux
Firmware
InfiniBand
Node.js
Dell PowerEdge
Red Hat Enterprise Linux - RHEL
Cloud Platform System
Parallel Computation
HybridCloud

Job description

  • Firmware Updates
      • Update server, BIOS, NIC, storage, and related firmware.
      • Ensure version alignment and post-update validation.
  • Redfish
      • Overview and usage of Redfish APIs.
      • Customization and automation using Redfish for system management and monitoring.
  • BlueField
      • Configuration and management of BlueField DPUs.

Requirements

Must have skills: PowerEdge Rack/Tower Experience, NVIDIA certifications

Nice to have skills: PowerEdge XE server experience, NVIDIA QR Switches

  • Deep hands-on experience with GPU deployment, configuration, and multi-node testing using NVIDIA Base Command Manager
  • Proficiency with benchmarking tools: HPL, STREAM, NCCL, RCCL, MxP, OSU Microbenchmarks
  • Red Hat certification (RHCSA/RHCE) or 7+ years of relevant experience with Red Hat distributions
  • Experience with GenAI/HPC networking (InfiniBand and/or RoCE)
  • Experience working in Linux-based parallel computing environments at scale
  • Strong customer-facing and communication skills

Desirable Requirements:

  • Bachelor's degree
  • NVIDIA certifications (NCA, NCE, DGX)
  • Experience with NVIDIA UFM, InfiniBand, and Spectrum-X fabrics
  • Exposure to hybrid cloud or GPU cloud environments
  • Experience with GPU observability/performance profiling tools
  • Code Upgrade
      • Perform cluster-level code upgrades as per approved versions and compatibility guidelines.
  • iDRAC Management
      • Configuration, access validation, and health checks of iDRAC.
