Site Reliability Engineer (SRE)
Job description
As an SRE, you will play a critical role in ensuring the reliability of our production systems. You'll collaborate closely with Cloud and Tech teams, using automation and best practices to build resilient infrastructure and support the continuous growth of our environments. You will work alongside the Service Operation Center (SOC) team and other SREs to resolve complex software reliability incidents, and partner with key stakeholders across departments to keep our infrastructure reliable.
System Monitoring and Incident Response
- Proactively manage and respond to incidents raised to the SRE team, ensuring timely resolution to maintain system reliability.
Blameless Postmortem
- Foster a blameless culture for postmortem processes and ensure that every root cause analysis (RCA) is documented.
On-Call Responsibilities
- Be available for on-call rotations to respond to critical incidents outside of standard working hours.
Knowledge Sharing and Mentoring
- Take initiative to advocate for best practices in reliability, automation, and observability.
- Create opportunities to share knowledge amongst peers and stakeholders, and mentor more junior team members.
Tooling and Automation Development
- Develop and maintain automation tools to enhance system operations, reduce manual interventions, and improve response times.
Service Level Objectives (SLOs) and Service Level Agreements (SLAs)
- Ensure that all SLOs and SLAs are consistently met or exceeded, with a focus on optimising service availability.
Collaboration and Communication
- Work closely with cross-functional teams, including cloud, tech, and support teams, to implement changes that enhance system reliability and performance. Communicate effectively with both technical and non-technical stakeholders to explain complex issues clearly.
Performance Optimisation
- Continuously monitor system performance, conduct load testing, and optimise services to ensure high availability and scalability.
Security and Compliance
- Implement and uphold security best practices, ensuring compliance with industry standards and regulations relevant to the iGaming sector.
Continuous Improvement
- Write postmortems and optimise processes to enhance system reliability and operational efficiency.
Requirements
- At least 5 years of experience in a Site Reliability Engineering, Software Engineering, or similar role, with a strong emphasis on maintaining system reliability and performance in a production environment.
- Proficiency in at least one programming or scripting language (e.g., Go, Python, Bash) with an emphasis on automation and tooling development.
- Proficiency in managing and deploying infrastructure on AWS, GCP, or Azure, including hands-on experience with automation tools and cloud services.
- Hands-on experience with IaC tools such as Terraform, including deploying and managing cloud infrastructure as code.
- Advanced understanding of CI/CD pipelines, including building, maintaining, and optimising these pipelines for automated testing and deployment.
- Proficiency in monitoring tools (e.g., Prometheus, Grafana, Datadog) and experience in performance tuning of large-scale systems.
- Proven track record in diagnosing and resolving complex system issues, including incident response and root cause analysis.
- Solid knowledge of networking principles, security best practices, and compliance requirements, particularly those relevant to the iGaming industry.
- Ability to clearly communicate technical concepts to cross-functional teams and non-technical stakeholders, fostering a collaborative work environment.
- Ability to work effectively under pressure, demonstrate high-level communication skills, and collaborate within a fast-paced, evolving environment.
- Preferred: Additional experience in software engineering practices and familiarity with automation tools specific to our tech stack will be considered a strong advantage.