Staff AI/ML Infrastructure Engineer

Kforce Inc.
West Palm Beach, United States of America
13 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Tech stack

Artificial Intelligence
Automation of Tests
Intelligent Platform Management Interface
Bash
BIOS
Computer Clusters
Linux
Device Drivers
Network Interface Controllers
Firmware
InfiniBand
Python
Machine Learning
Package Management Systems
PCI Express
Infrastructure Automation Frameworks
Bare Metal
Machine Learning Operations
Hardware Infrastructure

Job description

  • Design and maintain GPU and bare metal infrastructure in containerized and physical environments
  • Build scalable GPU clusters in partnership with networking and provisioning teams
  • Ensure reliable, high-performance provisioning of GPU infrastructure
  • Develop automated testing systems for GPU-based platforms
  • Implement infrastructure solutions for diverse AI/ML workloads
  • Benchmark, test, and troubleshoot GPU performance at scale
  • Collaborate with hardware vendors on drivers, firmware, and support
  • Resolve hardware, software, and performance issues across environments
  • Optimize rail and cluster performance across architectures
  • Lead technical direction and mentor engineers on infrastructure best practices

Requirements

  • 5+ years of experience working with bare metal infrastructure and hardware automation
  • Hands-on experience with modern NVIDIA/AMD GPU platforms and high-performance networking (RoCE, InfiniBand)
  • Deep knowledge of BIOS, BMC, firmware, NICs, Redfish/IPMI, and PCIe systems
  • Strong Linux systems experience including device drivers and package management
  • Experience building infrastructure automation using Python and Bash
  • Familiarity with GPU drivers, firmware ecosystems, and vendor collaboration
  • Experience designing and delivering complex infrastructure products
  • Proven ability to lead projects and mentor engineers
  • Experience optimizing multi-cluster GPU environments
  • Exposure to Machine Learning software stacks and GPU workloads

Benefits & conditions

The pay range is the lowest to highest compensation that we reasonably and in good faith believe we would pay for this role at the time of posting. We may ultimately pay more or less than this range. Employee pay is based on factors like relevant education, qualifications, certifications, experience, skills, seniority, location, performance, union contract and business needs. This range may be modified in the future.

We offer comprehensive benefits including medical/dental/vision insurance, HSA, FSA, 401(k), and life, disability, and AD&D insurance to eligible employees. Salaried personnel receive paid time off. Hourly employees are not eligible for paid time off unless required by law. Hourly employees on a Service Contract Act project are eligible for paid sick leave.

Note: Pay is not considered compensation until it is earned, vested and determinable. The amount and availability of any compensation remain in Kforce's sole discretion unless and until paid and may be modified in its discretion consistent with the law.

About the company

By clicking "Apply Today" you agree to receive calls, AI-generated calls, text messages or emails from Kforce, its affiliates, and service providers. Note that if you choose to communicate with Kforce via text messaging, the frequency may vary, and message and data rates may apply. Carriers are not liable for delayed or undelivered messages. You will always have the right to cease communicating via text by using keywords such as STOP.
