GPU Infrastructure Engineer

SolutionIT, Inc.
Boston, United States of America
16 days ago

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior

Job location

Boston, United States of America

Tech stack

API
BIOS
Profiling
Linux
Firmware
InfiniBand
Node.js
Dell PowerEdge
Red Hat Enterprise Linux - RHEL
Cloud Platform System
Parallel Computation
Hybrid Cloud
Hardware Infrastructure

Job description

  • Configuration, access validation, and health checks of iDRAC.
  • Troubleshooting and lifecycle management support.
  • Firmware Updates
      • Update server, BIOS, NIC, storage, and related firmware.
      • Ensure version alignment and post-update validation.
  • Redfish
      • Overview and usage of Redfish APIs.
      • Customization and automation using Redfish for system management and monitoring.
  • BlueField
      • Configuration and management of BlueField DPUs.

Requirements

  • Must-have skills: PowerEdge rack/tower experience, NVIDIA certifications
  • Nice-to-have skills: PowerEdge XE server experience, NVIDIA QR switches
  • Deep hands-on experience with GPU deployment, configuration, and multi-node testing using NVIDIA Base Command Manager
  • Proficiency with benchmarking tools: HPL, STREAM, NCCL, RCCL, MxP, OSU Microbenchmarks
  • Red Hat certification (RHCSA/RHCE) or 7+ years of relevant experience with Red Hat distributions
  • Experience with GenAI/HPC networking (InfiniBand and/or RoCE)
  • Experience working in Linux-based parallel computing environments at scale
  • Strong customer-facing and communication skills

Desirable Requirements

  • Bachelor's degree
  • NVIDIA certifications (NCA, NCE, DGX)
  • Experience with NVIDIA UFM, InfiniBand, and Spectrum-X fabrics
  • Exposure to hybrid cloud or GPU cloud environments
  • Experience with GPU observability/performance profiling tools
  • Code Upgrade
      • Perform cluster-level code upgrades according to approved versions and compatibility guidelines.

About the company

Solution IT Inc. is looking for a GPU Infrastructure Engineer (PowerEdge) for one of its clients in Childress, TX.
