SRE

Insight Global
Saint Paul, United States of America

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Saint Paul, United States of America

Tech stack

API
Artificial Intelligence
Azure
C Sharp (Programming Language)
Customer Data Management
Information Engineering
Distributed Systems
Machine Learning
Nagios
Reliability Engineering
Cloud Platform System
React
Reliability of Systems
Event Driven Architecture
Data Analytics
Machine Learning Operations
REST
Terraform
Databricks

Job description

We are seeking two Site Reliability Engineers (SREs) to join our team supporting a new Azure-based product. This role focuses on system reliability, observability, and monitoring for a data-driven application that provides KPIs and insights to end users daily. The product leverages Azure services, APIs, Databricks, and AI/ML models to process customer data and populate dashboards refreshed once per day.

The SREs will ensure the reliability of the entire pipeline, provide hypercare support, and collaborate with engineering teams to streamline monitoring and alerting processes.

Requirements

Experience in SRE or similar reliability-focused roles.

Strong knowledge of Azure services and cloud-based architectures.

Hands-on experience with observability, monitoring, and alerting tools (Azure Application Insights, Elastic).

Ability to work with REST APIs and understand event-driven architectures (e.g., Azure Service Bus).

Proficiency in C# for troubleshooting and minor coding tasks.

Excellent communication and an ownership mindset, with the ability to manage issues end-to-end.

Experience with Terraform and infrastructure-as-code.

Familiarity with Databricks, AI/ML pipelines, and data engineering concepts.

Knowledge of React for front-end troubleshooting.

Exposure to event-driven distributed systems.

Ability to streamline monitoring processes across multiple teams.
