Senior SRE / Platform Engineer

SimScale GmbH
Berlin, Germany
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote
Berlin, Germany

Tech stack

Java
API
Artificial Intelligence
Amazon Web Services (AWS)
Audit Trail
Cloud Computing
Software Debugging
Programming Tools
Disaster Recovery
Distributed Systems
Python
Linux kernel
Open Source Technology
Prometheus
Software Engineering
Data Logging
Istio
Multi-Cloud
Kubernetes
Terraform
Dynatrace

Job description

We are looking for a Senior SRE / Platform Engineer (m/f/d) to own and improve the cloud infrastructure behind SimScale's browser-based simulation platform. The role spans AWS and EKS, observability, disaster recovery, security and compliance controls, multi-region architecture, elastic GPU/HPC capacity, and internal developer tooling.

SimScale's engineering teams run workloads directly on AWS; you will build the standards, guardrails, and self-service tooling that let them do so safely, raising reliability and security without slowing engineering velocity. You will join a small, tightly knit infrastructure team supporting 50+ engineers across the company. This is a hands-on senior individual contributor role; people management is not required, but there is a genuine path toward tech-lead ownership as the team grows.

Your Opportunity

  • Evolve our Kubernetes platform: Evaluate and adopt technologies such as the Kubernetes Gateway API and service mesh patterns, and coordinate platform evolution across 10+ engineering teams.
  • Take observability to the next level: Drive organization-wide adoption of OpenTelemetry for distributed tracing and metrics, and help teams define meaningful SLOs.
  • Shape multi-region architecture and data residency: Support our move from an EU-centered footprint toward a global, multi-cloud architecture that satisfies disaster-recovery and data-residency requirements.
  • Own cloud cost and efficiency at scale: Keep petabyte-scale infrastructure cost-efficient, secure, and well-instrumented.
  • Improve tooling: Build self-service AWS account provisioning, guardrails, and AI-assisted automation that help engineering teams manage infrastructure safely and efficiently at scale.

Requirements

  • 5+ years of professional experience in SRE, platform, or infrastructure engineering.
  • Software development experience: Your background is rooted in software development, and you moved into SRE from there. You write production-quality software in at least one of Python, Go, Rust, or Java.
  • Strong systems foundation: You understand Linux internals and distributed systems well enough to debug complex production behavior.
  • Hands-on cloud and infrastructure experience: AWS (or GCP), declarative infrastructure (Terraform), GitOps workflows (Argo CD), and container orchestration (Kubernetes).
  • Observability and reliability experience: You have worked with OpenTelemetry, Prometheus, distributed tracing, monitoring, and meaningful SLOs/SLIs.
  • Production debugging depth: You can investigate complex failures, communicate clearly during incidents, and turn findings into durable improvements.
  • Security and compliance awareness: You understand how infrastructure decisions affect access control, auditability, disaster recovery, logging, and standards such as SOC 2.
  • Clear communication: You can explain trade-offs to engineering teams and help others adopt better platform practices without unnecessary friction.

Bonus Points

  • An open source portfolio or contributions.
  • Prior technical leadership experience, especially in infrastructure, reliability, or platform engineering.

Benefits & conditions

  • Join a dedicated, supportive team with unlimited growth opportunities and leadership potential
  • Make an impact quickly by sharing ideas and contributing to creative, goal-oriented projects
  • Work in a diverse, inclusive environment with colleagues from over 35 countries
  • Enjoy flexible hours and the freedom to work remotely from anywhere in the world
  • Access comprehensive health coverage, retirement plans, paid time off, and wellness support
  • Enjoy fresh lunches at the office, or gift cards if you work remotely
  • Grow as a professional with online/offline learning, language courses, and tech talks
  • Connect at team events, join support groups, and contribute to our ESG and DE&I initiatives
  • Participate in fun team challenges and competitions for added excitement and team spirit
