SRE
Role details
Job description
We are seeking two Site Reliability Engineers (SREs) to join our team supporting a new Azure-based product. The role focuses on system reliability, observability, and monitoring for a data-driven application that delivers daily KPIs and insights to end users. The product uses Azure services, APIs, Databricks, and AI/ML models to process customer data and populate dashboards that refresh once per day.
The SREs will ensure the reliability of the entire pipeline, provide hypercare support, and collaborate with engineering teams to streamline monitoring and alerting processes.
Requirements
Experience in SRE or similar reliability-focused roles.
Strong knowledge of Azure services and cloud-based architectures.
Hands-on experience with observability, monitoring, and alerting tools (e.g., Azure Application Insights, Elastic).
Ability to work with REST APIs and understand event-driven architectures (e.g., Azure Service Bus).
Proficiency in C# for troubleshooting and minor coding tasks.
Excellent communication and an ownership mindset: able to manage issues end-to-end.
Experience with Terraform and infrastructure-as-code.
Familiarity with Databricks, AI/ML pipelines, and data engineering concepts.
Knowledge of React for front-end troubleshooting.
Exposure to event-driven distributed systems.
Ability to streamline monitoring processes across multiple teams.