Senior Kubernetes Platform Engineer
Job description
- Support day-to-day operations of an enterprise Kubernetes platform (100+ clusters, ~50% production)
- Perform routine operational tasks including cluster maintenance, upgrades, patching, health checks, and capacity management
- Troubleshoot and resolve Kubernetes platform issues impacting cluster or application availability
- Participate in incident response, root-cause analysis, and post-incident reviews
On-Call & Backup Coverage
- Act as a backup platform engineer to enable on-call rotation and reduce key-person dependency
- Provide after-hours support as part of a shared on-call rotation
- Serve as a secondary escalation point for critical Production issues
Application & Platform Support
- Assist internal application teams with Kubernetes-related questions and issues
- Support common Kubernetes constructs such as Pods, Deployments, Services, Ingress, ConfigMaps, and Secrets
- Help teams troubleshoot networking, DNS, ingress, certificate, and resource-related issues
- Review application configurations for Kubernetes best practices and platform alignment
Enterprise Tooling & Integrations
- Work with integrated enterprise tools such as:
  o Ingress controllers (e.g., Contour / Envoy)
  o Logging platforms (e.g., Fluent Bit, centralized log aggregation)
  o Monitoring/observability tools (e.g., Dynatrace or similar)
  o Container registries (e.g., Harbor, JFrog)
Documentation & Knowledge Sharing
- Help document operational procedures, runbooks, and troubleshooting guides
- Share Kubernetes knowledge and best practices with internal teams
- Assist in improving platform resiliency, operational maturity, and supportability
Requirements
We are seeking a highly skilled Senior Kubernetes Platform Engineer to support and operate an enterprise VMware Tanzu Kubernetes platform consisting of 100+ Kubernetes clusters across multiple environments, including Production. This role will partner closely with the current platform owner to provide operational support, on-call backup, and platform sustainment, helping ensure platform stability, availability, and compliance for critical business applications.
This role is hands-on and production-facing, with responsibilities spanning Kubernetes operations, incident response, platform maintenance, and support of application teams running mission-critical workloads.
Experience Level: 3 - Senior
Qualifications (must haves):
Kubernetes Experience
- Strong hands-on experience supporting Kubernetes in Production
- Solid understanding of Kubernetes fundamentals, including:
  o Pods, Deployments, StatefulSets
  o Services and Ingress
  o Namespaces, RBAC, ConfigMaps, and Secrets
- Experience troubleshooting Kubernetes networking, scheduling, and resource issues
Platform & Infrastructure
- Experience operating Kubernetes on enterprise platforms (VMware Tanzu, OpenShift, AKS, EKS, or similar)
- Strong Linux administration and troubleshooting skills
- Familiarity with container runtimes and container images
Operational Mindset
- Experience working in highly regulated production environments
- Comfortable participating in on-call rotations and handling time-sensitive issues
- Strong problem-solving skills with the ability to work independently
Nice to Have:
- Experience with VMware Tanzu Kubernetes or VMware-based Kubernetes platforms
- Scripting experience (Bash, Python) for operational automation
- Prior experience supporting large multi-cluster Kubernetes environments