Kubernetes / OpenShift AI Platform Engineer

CareerCircle
Plano, United States of America
yesterday

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$ 185K

Job location

Remote
Plano, United States of America

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Cloud Engineering
Computer Networks
Continuous Integration
Linux
DevOps
Distributed Systems
JSON
Python
Key Management
Linux System Administration
Machine Learning
OpenShift
Performance Tuning
Role-Based Access Control
TensorFlow
Prometheus
Azure
Shell Script
Software Vulnerability Management
YAML
AI Infrastructure
Data Logging
Microsoft Power Automate
PyTorch
System Availability
Grafana
Hybrid Cloud
Infrastructure as Code (IaC)
Containerization
AI Platforms
Kubernetes
Infrastructure Automation Frameworks
Machine Learning Operations
Virtual Agents
Terraform
Docker
Jenkins

Job description

We are seeking a Kubernetes / OpenShift AI Platform Engineer to design, build, and optimize enterprise-scale infrastructure supporting advanced AI/ML workloads. This role sits at the intersection of platform engineering, DevOps, and AI infrastructure, enabling model development, training, and real-time inference in a highly regulated environment.

You will work cross-functionally with AI/ML engineers, data scientists, DevOps, and infrastructure teams to deliver scalable, secure, and high-performance AI platforms.

  • Design and manage Kubernetes and OpenShift clusters at enterprise scale
  • Build and optimize infrastructure for AI/ML model training and inference workloads
  • Develop automation for deployment, configuration, patching, and platform operations using Python
  • Support GPU-enabled workloads and high-performance compute environments
  • Implement and maintain CI/CD pipelines, GitOps workflows, and infrastructure-as-code (Terraform)
  • Ensure platform reliability, scalability, and performance optimization
  • Implement security best practices including RBAC, network policies, and secrets management
  • Enable observability through Prometheus, Grafana, and logging frameworks
  • Collaborate with engineering teams to standardize and streamline AI platform environments

Use of Artificial Intelligence (AI): We may use Artificial Intelligence (AI) to support parts of our hiring process, including sourcing, screening, and evaluating candidates. AI helps assess applications and qualifications, but final decisions are made by our hiring team. By applying, you acknowledge and agree that your application may be reviewed using AI tools.

Requirements

  • 5-7+ years of experience with Kubernetes (production environments)
  • Strong experience with Red Hat OpenShift in enterprise environments
  • 5-7+ years of hands-on experience with Docker and containerization technologies
  • Strong proficiency in Python for automation and platform engineering
  • Solid experience working in Linux environments (systems, networking, storage)
  • Experience with AWS or other cloud platforms
  • Hands-on experience with Terraform and CI/CD tools (e.g., Jenkins)
  • Experience supporting AI/ML platforms, model deployment pipelines, or similar workloads
  • Experience with AI/ML frameworks such as:
      • PyTorch, TensorFlow
      • Triton Inference Server, vLLM
  • Experience with agentic AI systems or intelligent agents
  • Familiarity with:
      • Kubernetes Operators and Helm
      • GitOps practices and platform standardization
  • Strong understanding of:
      • Observability (Prometheus, Grafana)
      • Kubernetes/OpenShift security models (SCCs, RBAC, etc.)
  • Deep understanding of Kubernetes architecture and cluster lifecycle management
  • Proven ability to operate in large-scale, fast-paced enterprise environments
  • Strong problem-solving and troubleshooting skills across distributed systems
  • Experience building platforms that support other engineering teams

Benefits & conditions

Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:

  • Medical, dental & vision
  • Critical Illness, Accident, and Hospital
  • 401(k) Retirement Plan - Pre-tax and Roth post-tax contributions available
  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)
  • Short and long-term disability
  • Health Spending Account (HSA)
  • Transportation benefits
  • Employee Assistance Program
  • Time Off/Leave (PTO, Vacation or Sick Leave)

About the company

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

We're a leading provider of business and technology services. We accelerate business transformation for our customers. Our expertise in strategy, design, execution and operations unlocks business value through a range of solutions. We're a team of 80,000 strong, working with over 6,000 customers, including 80% of the Fortune 500 across North America, Europe and Asia, who partner with us for our scale, full-stack capabilities and speed. We're strategic thinkers, hands-on collaborators, helping customers capitalize on change and master the momentum of technology. We're building tomorrow by delivering business outcomes and making positive impacts in our global communities. TEKsystems and TEKsystems Global Services are Allegis Group companies. Learn more at TEKsystems.com.
