SRE

Insight Global

Naperville, United States of America

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Naperville, United States of America

Tech stack

.NET

API

Artificial Intelligence

Amazon Web Services (AWS)

Application Performance Management

Application Services

C Sharp (Programming Language)

Software Debugging

SQL Azure

Reliability Engineering

Datadog

Data Logging

Azure

Multi-Cloud

Backend

Kubernetes

Cosmos DB

ELK

Web API

Microservices

Job description

We are seeking an experienced Site Reliability Engineer (SRE) to support a large-scale enterprise platform that powers microservices across multiple global business divisions. The platform, built on Azure PaaS components with a .NET/C# backend, serves a growing number of products and consumers.

This SRE will serve as the first point of contact for production issues during US business hours, responsible for investigating incidents, diagnosing whether issues are infrastructure- or code-related, configuring and maintaining monitoring/alerting, and ensuring platform resiliency as adoption scales. The role requires hands-on knowledge of Azure PaaS services, application-level debugging, and the ability to engage development teams when deeper code-level resolution is needed.

The SRE will join a growing reliability team and work closely with platform engineering, development, and infrastructure teams. The platform is not fully containerized and relies heavily on Azure PaaS components, though Kubernetes workloads are part of the broader ecosystem. This individual will report to the SRE team lead and play a critical role in ensuring US-timezone coverage and support continuity.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Requirements

5-7 years of professional experience in a reliability, platform operations, or SRE-focused role.

Strong hands-on experience with Azure PaaS services, including:

Azure App Services (scaling, health mechanisms, plan configurations)

Azure SQL Database

Cosmos DB

Azure Data Factory

Application Insights (monitoring, alerting, configuration)

Proficiency in .NET, C#, and Web APIs - must be able to read and investigate code to determine whether issues are configuration-related or development-related.

Experience with monitoring, alerting, and observability tooling - ability to configure alerts, set up scale-out/scale-in policies, and build resiliency into platform services.

Understanding of microservices architecture - must be able to understand what individual services do (e.g., user service, contact service) and how they interact.

Incident response and triage experience - comfortable being the first point of contact for production issues, investigating App Service behavior, API health, and infrastructure status, and escalating to development teams as needed.

Strong communication skills - ability to collaborate across SRE, platform engineering, infrastructure, and development teams.

AWS experience - the platform is expanding to AWS; prior knowledge of AWS services would be beneficial as the team gears up for multi-cloud support.

Prior SRE experience or development-to-SRE career path - existing SRE team members transitioned from development roles; a similar background would be ideal.

Experience with Elastic (ELK Stack) for logging and observability.

Kubernetes / container orchestration familiarity - while the platform is not fully containerized, some workloads run on Kubernetes clusters and exposure would be helpful.

Experience supporting enterprise-scale platforms with high adoption across multiple business divisions or product lines.

Exposure to SRE process setup, including runbooks, on-call rotations, and incident management frameworks.

Interest or experience in innovative tooling such as AI agents for monitoring or automated remediation.
