Site Reliability Engineer
Job description
Site Reliability Engineers are the guardians of our reliability promise. They deliver a highly reliable, resilient, and cost-efficient platform that consistently meets business and customer expectations for availability and performance.
- Increase automation of operational activities to reduce downtime risk, in collaboration with Platform Engineering and Domain Squads.
- Drive systemic improvements across engineering teams based on incident RCAs and telemetry insights.
- Implement non-functional improvements (resilience, performance, reliability) directly in code, with Domain Squads reviewing and approving changes.
- Promote adoption of SRE best practices across development teams (integration patterns, monitoring, alerting, real-time tracing).
- Provide cross-platform observability capabilities beyond those provided by the Domain Squads. Investigate issues and incidents, and propose or implement changes where needed.
- Regularly review logs, metrics, and alerts to identify and implement improvements.
- Design non-functional tests and run them continuously to ensure quality is built in at every stage, up to and including production.
Job Benefits
- Hybrid model: 3 days per week in the office and 2 days working from home, with lunch on us when you're in the office!
- 26 vacation days per year
- Language classes & professional courses
- Free catering & snacks in the office
- Private health insurance
- An afternoon off on your birthday
If you're passionate about tech and innovation and want to thrive in an environment that values collaboration and diversity, this role might be the perfect fit for you! Apply today and help us shape the future of the PayTech industry!
Requirements
The ideal candidate should meet all of the following requirements. However, we believe in self-learning and adaptation, so we can be flexible on some of them.
What Is a MUST
- Proactive attitude, always on the lookout for improvement opportunities.
- Strong scripting skills (Python, Bash).
- Experience with cloud platforms.
- Knowledge of Grafana, Application Insights, OpenTelemetry, and Prometheus.
- 5 years of DBA experience creating and maintaining databases in SQL Server (or MongoDB or PostgreSQL).
- Fluent English, with the ability to conduct technical meetings in English.
What Is Nice To Have
- Experience with non-functional and production testing.
- Analytical mindset, being able to connect the dots and establish cause and effect.
- Experience with containers and container orchestration platforms (EKS/AKS).
- Understanding of APIs and asynchronous distributed software architectures.
- Working knowledge of AI-enabled tools such as VS Code and Claude Code.
- Demonstrable experience with applying AI to Site Reliability Engineering.
- Knowledge of process automation tools such as n8n.
- Working experience with chaos engineering.