AI Infrastructure Engineer

Nutanix
San Jose, United States of America
2 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

San Jose, United States of America

Tech stack

Artificial Intelligence
Cloud Computing
Nvidia CUDA
Monitoring of Systems
Hypervisor
Network Security
Machine Learning
Network Architecture
Network Segmentation
Performance Tuning
Prometheus
Zero Trust Network Access
Azure
AI Infrastructure
Data Logging
Cloud Platform System
System Availability
Large Language Models
Grafana
Generative AI
Hybrid Cloud
Infrastructure as Code (IaC)
AI Platforms
Kubernetes
Infrastructure Automation Frameworks
Storage Technologies
Machine Learning Operations
Nutanix
Terraform
Data Pipelines
ELK

Job description

We are seeking a highly experienced AI Infrastructure Engineer to architect, deploy and optimize enterprise-scale AI infrastructure solutions leveraging the Nutanix ecosystem. The ideal candidate will have deep expertise in Nutanix Cloud Infrastructure (NCI), AOS/AHV, Nutanix Kubernetes Platform (NKP), GPU-accelerated computing and hybrid cloud environments.

This role focuses on building scalable, high-performance infrastructure that supports Large Language Models (LLMs), Generative AI workloads, AI training and AI inference platforms across on-premises and cloud environments.

The selected candidate will serve as a Subject Matter Expert (SME) for Nutanix AI infrastructure and will work closely with architecture, cloud, platform, networking and security teams.

  • Nutanix Cloud Infrastructure (NCI)
  • Nutanix AOS (Acropolis Operating System)
  • Nutanix AHV (Acropolis Hypervisor)
  • Nutanix Cloud Manager (NCM)
  • Nutanix Flow
  • Nutanix Objects and Files

AI Infrastructure Architecture

  • Design and implement scalable AI infrastructure platforms using Nutanix technologies.
  • Build optimized environments supporting Generative AI, LLM training and inference workloads.
  • Design high-performance compute, storage and networking architectures for AI applications.
  • Design high-performance storage solutions using Nutanix Objects and Nutanix Files.
  • Optimize storage architectures for AI/ML datasets and model repositories.
  • Ensure data availability, scalability and performance.

Security & Networking

  • Implement Zero-Trust security principles.
  • Utilize Nutanix Flow for micro-segmentation and workload security.
  • Collaborate with security and networking teams to protect sensitive AI data.

Requirements

  • Nutanix Kubernetes Platform (NKP)
  • Kubernetes cluster deployment and administration
  • Container orchestration and workload management
  • AI/ML workload deployment in Kubernetes environments

GPU & AI Infrastructure

  • Experience designing and managing GPU-enabled environments
  • Hands-on experience with:
      • NVIDIA GPU ecosystem (A100, H100, CUDA, GPU Passthrough, vGPU)
      • AMD GPU ecosystem
  • Experience supporting AI model training and inference workloads

Infrastructure Automation

  • Terraform
  • Infrastructure as Code (IaC)
  • Nutanix Calm
  • Automated provisioning and lifecycle management

Monitoring & Observability

  • Prometheus
  • Grafana
  • ELK Stack
  • OpenTelemetry
  • Monitoring, logging, alerting and performance tuning

  • Experience supporting AI/ML platforms, LLMs and Generative AI initiatives.
  • Experience with AI model serving frameworks and inference platforms.
  • Knowledge of MLOps, AI platform engineering and data pipelines.
  • Experience working in enterprise-scale hybrid cloud environments.
  • Nutanix certifications highly preferred.

  • 10+ years of Infrastructure, Cloud, Platform Engineering, or Architecture experience.
  • Strong enterprise-level Nutanix experience.
  • Experience supporting Kubernetes-based production environments.
  • Experience with GPU-enabled infrastructure and AI workloads.
