SRE

Insight Global
Naperville, United States of America
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Naperville, United States of America

Tech stack

.NET
API
Artificial Intelligence
Amazon Web Services (AWS)
Application Performance Management
Application Services
C Sharp (Programming Language)
Software Debugging
SQL Azure
Reliability Engineering
Datadog
Data Logging
Azure
Multi-Cloud
Backend
Kubernetes
Cosmos DB
ELK
Web Api
Microservices

Job description

We are seeking an experienced Site Reliability Engineer (SRE) to support a large-scale enterprise platform that powers microservices across multiple global business divisions. The platform, built on Azure PaaS components with a .NET/C# backend, serves a growing number of products and consumers.

This SRE will serve as the first point of contact for production issues during US business hours, responsible for investigating incidents, diagnosing whether issues are infrastructure- or code-related, configuring and maintaining monitoring/alerting, and ensuring platform resiliency as adoption scales. The role requires hands-on knowledge of Azure PaaS services, application-level debugging, and the ability to engage development teams when deeper code-level resolution is needed.

The SRE will join a growing reliability team and work closely with platform engineering, development, and infrastructure teams. The platform is not fully containerized and relies heavily on Azure PaaS components, though Kubernetes workloads are part of the broader ecosystem. This individual will report to the SRE team lead and play a critical role in ensuring US-timezone coverage and support continuity.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Requirements

5-7 years of professional experience in a reliability, platform operations, or SRE-focused role.

Strong hands-on experience with Azure PaaS services, including:

Azure App Services (scaling, health mechanisms, plan configurations)

Azure SQL Database

Cosmos DB

Azure Data Factory

Application Insights (monitoring, alerting, configuration)

Proficiency in .NET, C#, and Web APIs - must be able to read and investigate code to determine whether issues are configuration-related or development-related.

Experience with monitoring, alerting, and observability tooling - ability to configure alerts, set up scale-out/scale-in policies, and build resiliency into platform services.

Understanding of microservices architecture - must be able to understand what individual services do (e.g., user service, contact service) and how they interact.

Incident response and triage experience - comfortable being the first point of contact for production issues, investigating App Service behavior, API health, and infrastructure status, and escalating to development teams as needed.

Strong communication skills - ability to collaborate across SRE, platform engineering, infrastructure, and development teams.

AWS experience - the platform is expanding to AWS; prior knowledge of AWS services would be beneficial as the team gears up for multi-cloud support.

Prior SRE experience or development-to-SRE career path - existing SRE team members transitioned from development roles; a similar background would be ideal.

Experience with Elastic (ELK Stack) for logging and observability.

Kubernetes / container orchestration familiarity - while the platform is not fully containerized, some workloads run on Kubernetes clusters and exposure would be helpful.

Experience supporting enterprise-scale platforms with high adoption across multiple business divisions or product lines.

Exposure to SRE process setup, including runbooks, on-call rotations, and incident management frameworks.

Interest or experience in innovative tooling such as AI agents for monitoring or automated remediation.
