Kubernetes Platform Engineer

Bay Systems
Berkeley, United States of America
5 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$208K

Tech stack

Kubernetes Security
Microsoft Windows
Amazon Web Services (AWS)
Backup Devices
Bash
Border Gateway Protocol
Code Review
Computer Programming
Computer Networks
System Configuration
Disaster Recovery
DNS
GitHub
IPv6
Python
Key Management
Network Troubleshooting
Microsoft Networking
Network Architecture
Routing
Role-Based Access Control
Cloud Services
Ansible
Service Discovery
Service Pack
Software Deployment
Software Engineering
TCP/IP
Virtual Local Area Networks
Google Cloud Platform
Cloud Platform System
Connectivity Problems
Istio
GitLab
Kubernetes
iptables
Bare Metal
Linkerd (Service Mesh)
Hardware Infrastructure
CIS Benchmarks
Firewall Services Module
Terraform
Go

Job description

We are seeking a Kubernetes Platform Engineer to join the Platform Engineering team as a hands-on individual contributor. This role focuses on day-to-day operations and administration of Kubernetes clusters, primarily on-premises (K3s/RKE2) with additional support for cloud environments on Google Cloud Platform (GCP) and Amazon Web Services (AWS). You will manage cluster lifecycle operations, implement and maintain Cilium-based networking, troubleshoot complex platform issues, and enable development teams to successfully deploy and operate their workloads. This position balances infrastructure operations with developer enablement, requiring both deep technical expertise and strong collaboration skills.

The Platform Engineering team is a small team within ESnet's Systems and Software department that is dedicated to streamlining the software development lifecycle by establishing standardized processes for building, configuring, and deploying applications. The team supports the engineering, implementation, and maintenance of ESnet's platform systems and services including GitLab, Ansible, and Kubernetes environments, with responsibility for both on-premises and cloud-based services deployed across Google Cloud Platform (GCP) and Amazon Web Services (AWS).

  • Manage the full lifecycle of Kubernetes clusters (on-premises K3s/RKE2, GKE, and EKS), including upgrades, security patching, scaling, and capacity planning
  • Troubleshoot cluster-level issues including control plane problems, node failures, and resource constraints
  • Implement and maintain cluster security hardening based on CIS benchmarks and organizational security policies
  • Manage etcd cluster health, backup procedures, and disaster recovery capabilities
  • Monitor cluster performance and optimize resource utilization across multi-tenant workloads
  • Coordinate with datacenter operations team for physical infrastructure changes and maintenance windows

Networking & Cilium CNI

  • Implement, configure, and maintain Cilium CNI across on-premises and cloud Kubernetes environments
  • Design and enforce network policies to achieve secure multi-tenant isolation
  • Troubleshoot complex pod networking issues including DNS resolution, service discovery, and connectivity problems
  • Configure and maintain BGP peering with physical network infrastructure for on-premises integration
  • Work with network engineering team on firewall rules, VLANs, IPv6 networking, and network architecture

Internal Developer Platform & Enablement

  • Contribute to building a next-generation internal developer platform inspired by tools like Backstage, focused on increasing development efficiency and security
  • Work with the security team to define secure image baselines and automate the patching pipeline for container images
  • Assist development teams with deploying, configuring, and troubleshooting Kubernetes workloads
  • Review application deployment manifests and provide guidance on best practices and optimization
  • Develop and maintain platform documentation, runbooks, and self-service guides
  • Engage with development teams to understand platform needs and tailor the cluster experience to meet evolving requirements

Requirements

  • Typically requires a minimum of 8 years of related experience with a Bachelor's degree; or 6 years and a Master's degree; or equivalent experience.
  • Demonstrated experience administering Kubernetes on on-premises infrastructure (K3s, RKE2, or similar bare-metal distributions)
  • Experience with cloud-managed Kubernetes (GKE and/or EKS)
  • Strong understanding of Linux networking fundamentals: iptables/nftables, routing tables, DNS, TCP/IP stack, network troubleshooting
  • Experience with GitOps methodologies and tools such as ArgoCD or Flux
  • Proficiency in scripting and automation: Bash, Python, Go
  • Cilium CNI or equivalent production experience
  • Ability to work collaboratively in a team environment and communicate technical concepts clearly
  • Understanding of Kubernetes security best practices including Pod Security Standards, RBAC, and secrets management
  • GCP (Google Cloud Platform) and/or AWS (Amazon Web Services) cloud platform experience

  • Go programming experience for operator maintenance and platform tooling development
  • CKA (Certified Kubernetes Administrator) or CKS (Certified Kubernetes Security Specialist) certification
  • Background in BGP routing protocols and network engineering concepts
  • IPv6 networking experience
  • Infrastructure as Code experience with Terraform or Ansible
  • Experience with internal developer platform (IDP) tools such as Backstage or similar
  • Experience with service mesh technologies (Istio, Linkerd)
  • Excellent understanding of code review and familiarity with GitHub and GitLab workflows
