Application Performance Manager/SRE (AppDynamics)

Insight Global

Downers Grove, United States of America

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Downers Grove, United States of America

Tech stack

Java

.NET

Application Performance Management

Automation of Tests

Baselining

Cloud Engineering

Static Program Analysis

Cursor (AI Code Editor)

DevOps

Distributed Systems

GitHub

Python

Network Layer

Reliability Engineering

Site Reliability Engineering Practices

Prometheus

Reverse Engineering

Software Engineering

Datadog

Data Logging

Grafana

Backend

Git Flow

Low Latency

Splunk

New Relic (SaaS)

AppDynamics

Dynatrace

Microservices

Job description

The Senior Software Development Engineer - Site Reliability & Application Performance is responsible for ensuring the stability, reliability, and performance of critical applications supporting our Third-Party Administrator (TPA) & Payer Solutions department. This role sits at the intersection of software engineering, operations, and SRE practices, with a strong emphasis on Application Performance Monitoring (APM), observability, and continuous improvement of production systems. The colleague in this role will design and implement scalable, resilient solutions; build and maintain observability capabilities; drive incident reduction; and partner closely with engineering, infrastructure, and support teams to improve end-to-end reliability and customer experience.

Requirements

6+ years of professional experience in software engineering, site reliability engineering, or a closely related discipline.
Strong hands-on experience with AppDynamics in production environments (dashboards, health rules, transaction detection, alerting, baselining, war-room usage).
Practical experience with SRE practices: SLIs/SLOs, error budgets, incident response, post-incident reviews, and runbooks.
Experience with observability tooling and standards, including OpenTelemetry (tracing, metrics, logging) and integration into APM/monitoring platforms.
Solid programming skills in one or more languages commonly used in backend or distributed systems (e.g., .NET, Java, Python, Go, or similar; .NET preferred).
Experience using AI coding assistants such as GitHub Actions, GitHub Copilot (GHCP), Windsurf, or Cursor for code analysis and reverse engineering of legacy applications.
Experience with CI/CD pipelines and modern deployment practices (e.g., Git-based workflows, automated testing and deployment).
Strong understanding of distributed systems, microservices, and cloud-native architectures (latency, resiliency, back-pressure, timeouts, circuit breakers).
Demonstrated ability to troubleshoot complex production issues across application, infrastructure, and network layers.
Experience with additional APM / monitoring stacks (e.g., Dynatrace, New Relic, Datadog, Prometheus, Grafana, Splunk, ELK).
Background in healthcare, insurance, or other highly regulated environments.
Experience mentoring or leading other engineers in an SRE/DevOps context.
