Site Reliability Engineer
Stott and May Professional Search Ltd
9 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English, German
Experience level: Senior
Compensation: € 120K
Job location: Remote
Tech stack
API
Amazon Web Services (AWS)
Software Debugging
Linux
Distributed Systems
Reliability Engineering
Prometheus
Software Engineering
Delivery Pipeline
Kubernetes
Live Streaming
Terraform
Microservices
Job description
We are looking for an experienced Site Reliability Engineer to help us enhance and scale our core service infrastructure. As part of the SRE team, you will work on building resilient systems, improving performance and observability, and ensuring smooth operation of our highly available platform. You will play a key role in designing reliable services, automating operational processes, and maintaining critical components of a large-scale streaming environment.
- Design, improve, and maintain systems to enhance stability, scalability, availability, and latency
- Work collaboratively to troubleshoot and solve issues in highly available production environments
- Own the architecture and reliability of our central Kubernetes platform
- Monitor system health and participate in the on-call rotation to manage incidents
- Enable product teams to build microservices using CNCF tools (e.g., Kubernetes, Prometheus, OpenTelemetry)
- Develop automation and tooling to prevent incidents and streamline operational workflows
Requirements
- Strong experience with containers and managing Kubernetes clusters
- Hands-on experience with Terraform and infrastructure automation
- Ability to design and implement APIs (REST or gRPC)
- Proficiency in a backend programming language (ideally Go)
- Experience with cloud providers such as AWS or GCP
- Proven track record operating large-scale, distributed systems
- Solid understanding of Linux, networking fundamentals, and system-level debugging
- Fluency in German or English
- Willingness to travel occasionally for in-person meetings
Benefits & conditions
- Remote-first setup and flexible working hours
- Regular company events and team gatherings
- Personal learning and development budget
- Comprehensive benefits package
- 30 days of vacation per year
About the company
We are a technology company providing large-scale digital media streaming services across multiple platforms and devices. We operate the full content delivery pipeline end-to-end, including live streaming, on-demand video, recording services, and multi-device playback.
Our team runs a highly available, low-latency streaming platform serving millions of users across the country. We design, operate, and continuously optimize the systems responsible for transporting and delivering media efficiently and reliably.