Kubernetes MLOps Engineer

OpenKyber LLC
Jackson Township, United States of America

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Jackson Township, United States of America

Tech stack

Kubernetes Security
API
User Authentication
Azure
Cloud Computing
Computer Networks
Continuous Integration
Linux
Disaster Recovery
GitHub
Key Management
Lightweight Directory Access Protocol (LDAP)
Network Control
Octopus Deploy
Role-Based Access Control
Ceph
Software Troubleshooting
GitLab CI
Kubernetes
Deployment Automation
Rancher
Bare Metal
Machine Learning Operations
Jenkins
VMware

Job description

  • Design and implement Rancher-managed Kubernetes clusters (RKE, RKE2, K3s, EKS, AKS, GKE).
  • Architect high availability (HA) Rancher setups.
  • Define multi-cluster and multi-tenant strategies using Rancher projects, namespaces, and RBAC.
  • Integrate Kubernetes with VMware, bare-metal, and cloud platforms.
  • Establish standardized cluster blueprints and reference architectures.
  • Act as final escalation (L3) for Kubernetes and Rancher incidents.
  • Diagnose and resolve control plane failures; etcd performance and corruption issues; pod scheduling and node pressure issues; CNI (Calico / Cilium) networking problems; and CSI storage failures (Ceph, Longhorn, EBS, Azure Disk, NFS).
  • Perform root cause analysis (RCA) and provide preventive recommendations.
  • Install, upgrade, and maintain Rancher Server.
  • Manage cluster lifecycles using the Rancher UI and APIs.
  • Implement and manage Rancher RBAC, authentication (AD / LDAP / Azure AD / SSO), and global and cluster-level policies.
  • Maintain Rancher backups, DR, and recovery procedures.
  • Enforce Kubernetes security best practices such as Pod Security Standards (PSS), network policies, and Secrets management.
  • Integrate Kubernetes with CI/CD tools such as GitHub Actions, GitLab CI, Jenkins, and Argo CD.
  • Enable GitOps workflows for application and cluster configuration.
  • Support Helm chart development and lifecycle management.
  • Assist development teams with deployment strategies, resource optimization, and troubleshooting application issues on Kubernetes.

Requirements

Experience:
  • 6 to 10+ years in Linux / infrastructure / cloud
  • 3 to 5+ years of hands-on Kubernetes production experience
  • Strong expertise in Rancher (RKE / RKE2 / K3s)
  • Deep understanding of the Kubernetes control plane, etcd, networking (CNI), and storage (CSI)
