(Senior) Site Reliability Engineer
Job description
- Partner closely with product and platform teams to design, build, and operate reliable, scalable, and secure distributed systems
- Drive the adoption of reliability principles such as SLOs, error budgets, and measurable service ownership
- Design and implement automation that reduces manual effort, eliminates repetitive tasks, and improves system resilience
- Contribute to system architecture decisions with a focus on failure modes, scalability limits, and performance characteristics
- Analyze incidents end-to-end, from symptoms to root causes, and drive systemic improvements rather than one-off fixes
- Participate in on-call rotations and lead incident response across multiple layers of the stack
- Continuously improve observability by defining meaningful signals, building insights, and enabling fast diagnosis
- Influence and mentor teams toward engineering practices that prioritize reliability, simplicity, and operability
- Ensure systems meet security, compliance, and regulatory requirements expected in payment environments
Requirements
To thrive in this role at Paymenttools, you should approach challenges with an engineering mindset, automating before operating and focusing on outcomes that drive real impact. You are someone who takes ownership of ambiguous problems and stays curious about the inner workings of complex systems. You know when to dive deep and when to favor simplicity, showing a high level of pragmatism even in new domains. We don't expect you to be an expert in every tool we use; instead, we value your ability to analyze systems, ask the right questions, and build reliable solutions.
- Strong software engineering background with experience in building or operating large-scale distributed systems
- Proficiency in at least one programming language (e.g. Go, Java, Kotlin, or similar), with the ability to write production-quality code
- Solid understanding of computer science fundamentals such as data structures, algorithms, and complexity trade-offs
- Deep understanding of how systems work under the hood, including networking concepts (e.g. layering, routing, latency, failure modes), operating systems, and concurrency
- Experience working with cloud-native environments and modern infrastructure patterns (e.g. containerized workloads, declarative infrastructure, service-to-service communication)
- Strong intuition for debugging complex systems, especially under production pressure
- Experience with observability practices and tools, and the ability to define what "good" looks like in terms of system health
- Ability to reason about trade-offs between reliability, performance, and cost
Benefits & conditions
- Subsidized Deutschlandticket subscription
- €1,000 annual learning and development budget, plus internal training platforms
- Discounts on travel, fashion, technology, and more through our corporate benefits
- REWE discount card for use at REWE Group retailers
- JobRad: affordable bicycle leasing
- Company pension plan
- Insurance services
Perks of working with us:
- We work in a hybrid environment
- Flexible working hours that fit your workflow. Your time matters!
- Responsibility from day one
- Work with modern tools such as Google Workspace, Slack, Asana, Jira, Lattice, Miro, and Confluence
- Company events including Hackathons and Company Days
- Ask us more about these!