Site Reliability Engineer
EVENTIM
Berlin, Germany
4 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Intermediate
Job location: Berlin, Germany
Tech stack
Amazon Web Services (AWS)
Bash
Cloud Computing
Configuration Management
Continuous Integration
Linux
DNS
Monitoring of Systems
Python
Performance Tuning
Reliability Engineering
Ansible
Zabbix
Load Balancing
Grafana
GitLab
Kubernetes
Deployment Automation
Terraform
Docker
Jenkins
Job description
As a Site Reliability Engineer (m/f/d), you operate and evolve Linux-based production platforms that power critical business services at scale. You focus on automation, reliability, and reducing operational overhead while enabling teams to work more independently.
Working at the intersection of infrastructure, automation, and reliability, you contribute to building resilient systems and support the evolution toward a "you build it, you run it" culture.
What to expect
- Ensure reliable, secure, and high-performing Linux-based production systems with full ownership
- Automate operational tasks (e.g., patching, provisioning, deployments) to eliminate manual effort and improve efficiency
- Standardize and optimize deployment and configuration processes for scalability and consistency
- Lead incident response and drive root cause analysis and long-term fixes
- Manage and automate access and identity processes with a strong focus on security and auditability
- Maintain and improve core Linux infrastructure services essential for platform operations
- Collaborate with engineering teams to enhance observability and shared operational practices
- Analyze complex systems end-to-end and simplify them to improve reliability and performance
- Drive the modernization of operations towards automation, scalability, and self-service models
- Adapt quickly to changing environments and deliver pragmatic, effective solutions
Requirements
- 5+ years of experience in Linux-based production environments
- Strong expertise in Linux systems engineering, performance tuning, and lifecycle management
- Strong understanding of reliability concepts (SLOs, SLAs, performance, capacity)
- Solid scripting and automation skills (e.g., Bash, Python) with a continuous improvement mindset
- Hands-on experience with configuration management (e.g., Salt, Ansible) and Infrastructure as Code (e.g., Terraform)
- Experience with CI/CD tools (e.g., GitLab, Jenkins) and automated deployments
- Good knowledge of monitoring and observability tools (e.g., Zabbix, Grafana, ELK)
- Proven experience in incident management, root cause analysis, and postmortems
- Experience with security practices, including patching and access control
- Knowledge of core traffic services (DNS, load balancing, CDN)
- Basic experience with container and cloud technologies (Docker, Kubernetes, AWS)