Kubernetes MLOps Engineer
OpenKyber LLC
Jackson Township, United States of America
6 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Jackson Township, United States of America
Tech stack
Kubernetes Security
API
User Authentication
Azure
Cloud Computing
Computer Networks
Continuous Integration
Linux
Disaster Recovery
GitHub
Key Management
Lightweight Directory Access Protocol (LDAP)
Network Control
Octopus Deploy
Role-Based Access Control
Ceph
Software Troubleshooting
GitLab CI
Kubernetes
Deployment Automation
Rancher
Bare Metal
Machine Learning Operations
Jenkins
VMware
Job description
- Design and implement Rancher-managed Kubernetes clusters (RKE, RKE2, K3s, EKS, AKS, GKE).
- Architect high availability (HA) Rancher setups.
- Define multi-cluster and multi-tenant strategies using Rancher projects, namespaces, and RBAC.
- Integrate Kubernetes with VMware, Bare Metal, and Cloud platforms.
- Establish standardized cluster blueprints and reference architectures.
- Act as the final escalation point (L3) for Kubernetes and Rancher incidents.
- Diagnose and resolve control plane failures; etcd performance and corruption issues; pod scheduling and node pressure issues; CNI (Calico / Cilium) networking problems; and CSI storage failures (Ceph, Longhorn, EBS, Azure Disk, NFS).
- Perform root cause analysis (RCA) and provide preventive recommendations.
- Install, upgrade, and maintain Rancher Server.
- Manage cluster lifecycles using Rancher UI & APIs.
- Implement and manage Rancher RBAC, authentication (AD / LDAP / Azure AD / SSO), and global and cluster-level policies.
- Maintain Rancher backups, DR, and recovery procedures.
- Enforce Kubernetes security best practices such as Pod Security Standards (PSS), network policies, and secrets management.
- Integrate Kubernetes with CI/CD tools (e.g., GitHub Actions, GitLab CI, Jenkins, Argo CD).
- Enable GitOps workflows for application and cluster configuration.
- Support Helm chart development and lifecycle management.
- Assist development teams with deployment strategies, resource optimization, and troubleshooting application issues on Kubernetes.
Requirements
Experience:
- 6-10+ years in Linux / Infrastructure / Cloud
- 3-5+ years hands-on Kubernetes production experience
- Strong expertise in Rancher (RKE / RKE2 / K3s)
- Deep understanding of the Kubernetes control plane, etcd, networking (CNI), and storage (CSI)