Application Performance Manager/SRE (AppDynamics)

Insight Global
Downers Grove, United States of America
3 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Downers Grove, United States of America

Tech stack

Java
.NET
Application Performance Management
Automation of Tests
Baselining
Cloud Engineering
Static Program Analysis
Cursor (AI Code Editor)
DevOps
Distributed Systems
GitHub
Python
Network Layer
Reliability Engineering
Site Reliability Engineering Practices
Prometheus
Reverse Engineering
Software Engineering
Datadog
Data Logging
Grafana
Backend
Git Flow
Low Latency
Splunk
New Relic (SaaS)
AppDynamics
Dynatrace
Go
Microservices

Job description

The Senior Software Development Engineer - Site Reliability & Application Performance is responsible for ensuring the stability, reliability, and performance of critical applications supporting our Third-Party Administrator (TPA) & Payer Solutions department. This role sits at the intersection of software engineering, operations, and SRE practices, with a strong emphasis on Application Performance Monitoring (APM), observability, and continuous improvement of production systems. The colleague in this role will design and implement scalable, resilient solutions; build and maintain observability capabilities; drive incident reduction; and partner closely with engineering, infrastructure, and support teams to improve end-to-end reliability and customer experience.

Requirements

  • 6+ years of professional experience in software engineering, site reliability engineering, or a closely related discipline.

  • Strong hands-on experience with AppDynamics in production environments (dashboards, health rules, transaction detection, alerting, baselining, war-room usage).

  • Practical experience with SRE practices: SLIs/SLOs, error budgets, incident response, post-incident reviews, and runbooks.

  • Experience with observability tooling and standards, including OpenTelemetry (tracing, metrics, logging) and integration into APM/monitoring platforms.

  • Solid programming skills in one or more languages commonly used in backend or distributed systems (e.g., .NET, Java, Python, Go, or similar; .NET preferred).

  • Experience using AI coding assistants (e.g., GitHub Copilot (GHCP), Windsurf, or Cursor) and GitHub Actions for code analysis and reverse engineering of legacy applications.

  • Experience with CI/CD pipelines and modern deployment practices (e.g., Git-based workflows, automated testing and deployment).

  • Strong understanding of distributed systems, microservices, and cloud-native architectures (latency, resiliency, back-pressure, timeouts, circuit breakers).

  • Demonstrated ability to troubleshoot complex production issues across application, infrastructure, and network layers.

  • Experience with additional APM / monitoring stacks (e.g., Dynatrace, New Relic, Datadog, Prometheus, Grafana, Splunk, ELK).

  • Background in healthcare, insurance, or other highly regulated environments.

  • Experience mentoring or leading other engineers in an SRE/DevOps context.
