AI Infrastructure Engineer
Job description
We are seeking a highly experienced AI Infrastructure Engineer to architect, deploy and optimize enterprise-scale AI infrastructure solutions leveraging the Nutanix ecosystem. The ideal candidate will have deep expertise in Nutanix Cloud Infrastructure (NCI), AOS/AHV, Nutanix Kubernetes Platform (NKP), GPU-accelerated computing and hybrid cloud environments.
This role focuses on building scalable, high-performance infrastructure that supports Large Language Models (LLMs), Generative AI workloads, AI training and AI inference platforms across on-premises and cloud environments.
The selected candidate will serve as a Subject Matter Expert (SME) for Nutanix AI infrastructure and will work closely with architecture, cloud, platform, networking and security teams.
- Nutanix Cloud Infrastructure (NCI)
- Nutanix AOS (Acropolis Operating System)
- Nutanix AHV (Acropolis Hypervisor)
- Nutanix Cloud Manager (NCM)
- Nutanix Flow
- Nutanix Objects and Files
AI Infrastructure Architecture
- Design and implement scalable AI infrastructure platforms using Nutanix technologies.
- Build optimized environments supporting Generative AI, LLM training and inference workloads.
- Design high-performance compute, storage and networking architectures for AI applications.
- Design high-performance storage solutions using Nutanix Objects and Nutanix Files.
- Optimize storage architectures for AI/ML datasets and model repositories.
- Ensure data availability, scalability and performance.
Security & Networking
- Implement Zero-Trust security principles.
- Utilize Nutanix Flow for micro-segmentation and workload security.
- Collaborate with security and networking teams to protect sensitive AI data.
Requirements
- Nutanix Kubernetes Platform (NKP)
- Kubernetes cluster deployment and administration
- Container orchestration and workload management
- AI/ML workload deployment in Kubernetes environments
GPU & AI Infrastructure
- Experience designing and managing GPU-enabled environments
- Hands-on experience with:
  - NVIDIA GPU ecosystem (A100, H100, CUDA, GPU passthrough, vGPU)
  - AMD GPU ecosystem
- Experience supporting AI model training and inference workloads
Infrastructure Automation
- Terraform
- Infrastructure as Code (IaC)
- Nutanix Calm
- Automated provisioning and lifecycle management
Monitoring & Observability
- Prometheus
- Grafana
- ELK Stack
- OpenTelemetry
- Monitoring, logging, alerting and performance tuning
Preferred Qualifications
- Experience supporting AI/ML platforms, LLMs and Generative AI initiatives.
- Experience with AI model serving frameworks and inference platforms.
- Knowledge of MLOps, AI platform engineering and data pipelines.
- Experience working in enterprise-scale hybrid cloud environments.
- Nutanix certifications are highly preferred.
Experience
- 10+ years of Infrastructure, Cloud, Platform Engineering or Architecture experience.
- Strong enterprise-level Nutanix experience.
- Experience supporting Kubernetes-based production environments.
- Experience with GPU-enabled infrastructure and AI workloads.