Senior IT Systems Engineer
Job description
We are looking for an experienced Senior Systems Engineer who is comfortable at the intersection of classical on-premises infrastructure and modern cloud-native technologies. You will be responsible for the reliability, automation and observability of our hybrid environments, with a strong focus on VMware Cloud Foundation (VCF), observability using the Grafana LGTM stack, Kubernetes cluster management, and automation workflows.
Your main responsibilities
- Design, operate and continuously improve the reliability & availability of our VMware Cloud Foundation (VCF) based platforms (on-prem and interconnected with cloud environments)
- Implement and extend our observability stack based on the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir)
- Manage and automate VMware landscapes (vSphere, NSX, vSAN, Aria Suite etc.) in large-scale hybrid/multi-cloud setups
- Build, operate and scale Kubernetes clusters, including day-2 operations, upgrades, capacity management and security hardening
- Develop and maintain automation workflows, primarily using Kestra (in conjunction with other tools such as Ansible and Terraform)
- Drive incident response, post-mortem culture, error budgets and toil reduction according to SRE principles
- Collaborate closely with development teams, platform teams and security to enable self-service capabilities and fast, safe releases
- Participate in the on-call rotation
Requirements
- Very good hands-on experience with VMware Cloud Foundation (VCF) - ideally including recent versions (5.x / 9.x)
- Solid understanding of core VMware technologies: vSphere, NSX, vSAN, Aria Operations / Aria Automation
- Practical experience operating and troubleshooting Kubernetes clusters (preferably 1-4 years)
- Good knowledge of container ecosystem concepts
- Strong skills with Infrastructure as Code (Terraform, Ansible)
- Experience with workflow orchestration / automation tools - Kestra is a strong plus
- Very good practical knowledge of the Grafana LGTM stack
- Scripting & programming skills
- Understanding of hybrid cloud architectures (on-prem to public cloud connectivity patterns)
- Familiarity with SRE principles (SLI/SLO, error budgets, toil reduction, blameless post-mortems)
- Good German and very good English skills (documentation & communication mostly in English)