Sr Storage Engineer
NVIDIA Ltd.
Childress, United States of America
2 days ago
Role details
Contract type: Temporary contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Childress, United States of America
Tech stack
API
BIOS
Linux
Firmware
InfiniBand
Node.js
Dell PowerEdge
Red Hat Enterprise Linux - RHEL
Cloud Platform System
Parallel Computation
Hybrid Cloud
Job description
- Firmware Updates
- Update server, BIOS, NIC, storage, and related firmware.
- Ensure version alignment and post-update validation.
- Redfish
- Overview and usage of Redfish APIs.
- Customization and automation using Redfish for system management and monitoring.
- BlueField
- Configuration and management of BlueField DPUs.
Requirements
Must-have skills: PowerEdge rack/tower experience, NVIDIA certifications
Nice-to-have skills: PowerEdge XE server experience, NVIDIA QR Switches
- Deep hands-on experience with GPU deployment, configuration, and multi-node testing using NVIDIA Base Command Manager
- Proficiency with benchmarking tools: HPL, STREAM, NCCL, RCCL, MxP, OSU Microbenchmarks
- Red Hat certification (RHCSA/RHCE) or 7+ years of relevant experience with Red Hat distributions
- Experience with GenAI/HPC networking (InfiniBand and/or RoCE)
- Experience working in Linux-based parallel computing environments at scale
- Strong customer-facing and communication skills
Desirable Requirements:
- Bachelor's degree
- NVIDIA certifications (NCA, NCE, DGX)
- Experience with NVIDIA UFM, InfiniBand, and Spectrum-X fabrics
- Exposure to hybrid cloud or GPU cloud environments
- Experience with GPU observability/performance profiling tools
- Code Upgrade
- Perform cluster-level code upgrades in line with approved versions and compatibility guidelines.
- iDRAC Management
- Configuration, access validation, and health checks of iDRAC.