Site Reliability Engineer
Job description
We're PayRetailers, and we offer cutting-edge payment solutions that empower businesses to succeed in Latin America & Africa. Our collaborative and inclusive work environment encourages creativity and growth, and every employee's contribution is valued.
We've got big plans to expand into new markets and make a meaningful impact on the world of payments. To help us get there, our Technology team is on the lookout for a new Site Reliability Engineer.
Site Reliability Engineers are the guardians of our reliability promise. They deliver a highly reliable, resilient, and cost-efficient platform that consistently meets business and customer expectations for availability and performance.
- Increase automation of operational activities to reduce downtime risk, in collaboration with Platform Engineering and Domain Squads.
- Drive systemic improvements across engineering teams based on incident RCAs and telemetry insights.
- Implement non-functional improvements (resilience, performance, reliability) directly in code, with Domain Squads reviewing and approving changes.
- Promote adoption of SRE best practices across development teams (integration patterns, monitoring, alerting, real-time tracing).
- Provide cross-platform observability capabilities above and beyond what the Domain Squads provide. Investigate issues and incidents and propose/implement changes as deemed necessary.
- Continuously review logs, metrics, and alerts to identify and implement improvements.
- Design non-functional tests and run them continuously to ensure that we build in quality up to and including production.
Job Benefits
- Hybrid model: 3 days per week in the office and 2 days working from home, with lunch on us when you're in the office!
- 26 vacation days per year
- Language classes & professional courses
- Free catering & snacks in the office
- Private health insurance
- An afternoon off on your birthday
Requirements
The ideal candidate should meet all of the following requirements. However, we believe in self-learning and adaptation, so we can be flexible on certain requirements.
What Is a MUST
- Proactive attitude, always on the lookout for improvement opportunities.
- Strong scripting skills (Python, Bash).
- Experience with cloud platforms.
- Knowledge of Grafana, Application Insights, OpenTelemetry, Prometheus.
- 5 years of DBA experience creating and maintaining databases in SQL Server (MongoDB or PostgreSQL).
- Fluent level of English, able to conduct technical meetings in English.
What Is Nice To Have
- Experience with non-functional and production testing.
- Analytical mindset, being able to connect the dots and establish cause and effect.
- Experience with containers and container orchestration platforms (EKS/AKS).
- Understanding of APIs and asynchronous distributed software architectures.
- Working knowledge of AI-enabled tools like VS Code, Claude Code, etc.
- Demonstrable experience with applying AI to Site Reliability Engineering.
- Knowledge of process automation tools like n8n.
- Working experience with chaos engineering.