Infrastructure Architect - AI & Datacenter

STAFFING TECHNOLOGIES
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote

Tech stack

Cloud Computing
Data Centers
Microprocessors
Network Segmentation
Prism (Software)
Virtual Machines
Graphics Processing Unit (GPU)
Kubernetes
Machine Learning Operations
Hardware Asset Management
Nutanix
Legacy Systems

Job description

  1. AI & GPU Infrastructure Design (GPU Farm / AI Factory)
  • Lead the architectural design and refinement of the Nutanix GPU-as-a-Service (GPUaaS) platform, ensuring a seamless experience for internal R&D, QA, and Sales teams.
  • Provide technical leadership on key initiatives such as Nutanix Validated Designs (NVD) for the AI Factory, incorporating NVIDIA MGX/HGX architectures and high-density Cisco nodes (e.g., UCS 845A).
  • Architect the Management Cluster control plane (NKP, Prism Central, NuDeploy) to ensure it is decoupled from GPU compute nodes for maximum efficiency.
  • Implement policy-driven placement of workloads across on-prem and cloud-burst environments.
  2. Data Center Asset & Lifecycle Management
  • Design a centralized Data Center Asset Inventory system that provides real-time visibility into all hardware assets, including CPUs, GPUs, virtual machines, and networking equipment.
  • Develop a comprehensive Hardware Lifecycle Management strategy, including procurement forecasting, "rack and stack" operationalization, and decommissioning of legacy systems (G3/G4/G5).
  • Lead "Tiger Team" initiatives to navigate supply chain constraints, ensuring critical release milestones are not delayed by hardware shortages.
  • Enforce strict security standards for data center hardware provisioning.
  • Implement network segmentation for all critical applications.
  • Ensure all infrastructure meets SOC 2 and ISO 27001 compliance objectives while maintaining low-latency performance.
  3. Special Projects
  • Provide the required architecture and designs during the project intake process. Review all demands and guide teams toward the right architecture before they become approved projects.
  • Partner with the security team and provide guidelines for upcoming projects.
  • Engage in and lead special projects as the architect.

Requirements

  • Experience as an architect managing large-scale data center environments (1,000+ nodes).
  • Knowledge of Nutanix Cloud Infrastructure (NCI), AHV, and Prism Central.
  • Strong background in MLOps and automated pipeline integration (Kubeflow/MLflow).
