Kubernetes / OpenShift AI Platform Engineer
Job description
We are seeking a Kubernetes / OpenShift AI Platform Engineer to design, build, and optimize enterprise-scale infrastructure supporting advanced AI/ML workloads. This role sits at the intersection of platform engineering, DevOps, and AI infrastructure, enabling model development, training, and real-time inference in a highly regulated environment.
You will work cross-functionally with AI/ML engineers, data scientists, DevOps, and infrastructure teams to deliver scalable, secure, and high-performance AI platforms.
- Design and manage Kubernetes and OpenShift clusters at enterprise scale
- Build and optimize infrastructure for AI/ML model training and inference workloads
- Develop automation for deployment, configuration, patching, and platform operations using Python
- Support GPU-enabled workloads and high-performance compute environments
- Implement and maintain CI/CD pipelines, GitOps workflows, and infrastructure-as-code (Terraform)
- Ensure platform reliability, scalability, and performance optimization
- Implement security best practices including RBAC, network policies, and secrets management
- Enable observability through Prometheus, Grafana, and logging frameworks
- Collaborate with engineering teams to standardize and streamline AI platform environments
Use of Artificial Intelligence (AI): We may use Artificial Intelligence (AI) to support parts of our hiring process, including sourcing, screening, and evaluating candidates. AI helps assess applications and qualifications, but final decisions are made by our hiring team. By applying, you acknowledge and agree that your application may be reviewed using AI tools.
Requirements
- 5-7+ years of experience with Kubernetes (production environments)
- Strong experience with Red Hat OpenShift in enterprise environments
- 5-7+ years of hands-on experience with Docker and containerization technologies
- Strong proficiency in Python for automation and platform engineering
- Solid experience working in Linux environments (systems, networking, storage)
- Experience with AWS or other cloud platforms
- Hands-on experience with Terraform and CI/CD tools (e.g., Jenkins)
- Experience supporting AI/ML platforms, model deployment pipelines, or similar workloads

- Experience with AI/ML frameworks such as:
  - PyTorch, TensorFlow
  - Triton Inference Server, vLLM
- Experience with agentic AI systems or intelligent agents
- Familiarity with:
  - Kubernetes Operators and Helm
  - GitOps practices and platform standardization
- Strong understanding of:
  - Observability (Prometheus, Grafana)
  - Kubernetes/OpenShift security models (SCCs, RBAC, etc.)

- Deep understanding of Kubernetes architecture and cluster lifecycle management
- Proven ability to operate in large-scale, fast-paced enterprise environments
- Strong problem-solving and troubleshooting skills across distributed systems
- Experience building platforms that support other engineering teams
Benefits & conditions
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:
- Medical, dental & vision
- Critical Illness, Accident, and Hospital
- 401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
- Life Insurance (Voluntary Life & AD&D for the employee and dependents)
- Short and long-term disability
- Health Spending Account (HSA)
- Transportation benefits
- Employee Assistance Program
- Time Off/Leave (PTO, Vacation or Sick Leave)