Application Performance Manager/SRE (AppDynamics)
Job description
The Senior Software Development Engineer - Site Reliability & Application Performance is responsible for ensuring the stability, reliability, and performance of critical applications supporting our Third-Party Administrator (TPA) & Payer Solutions department. This role sits at the intersection of software engineering, operations, and SRE practices, with a strong emphasis on Application Performance Monitoring (APM), observability, and continuous improvement of production systems. The colleague in this role will design and implement scalable, resilient solutions; build and maintain observability capabilities; drive incident reduction; and partner closely with engineering, infrastructure, and support teams to improve end-to-end reliability and customer experience.
Requirements
- 6+ years of professional experience in software engineering, site reliability engineering, or a closely related discipline.
- Strong hands-on experience with AppDynamics in production environments (dashboards, health rules, transaction detection, alerting, baselining, war-room usage).
- Practical experience with SRE practices: SLIs/SLOs, error budgets, incident response, post-incident reviews, and runbooks.
- Experience with observability tooling and standards, including OpenTelemetry (tracing, metrics, logging) and integration into APM/monitoring platforms.
- Solid programming skills in one or more languages commonly used in backend or distributed systems (e.g., .NET, Java, Python, Go, or similar; .NET preferred).
- Experience using AI coding assistants such as GitHub Actions, GitHub Copilot (GHCP), Windsurf, or Cursor for code analysis and reverse engineering of legacy applications.
- Experience with CI/CD pipelines and modern deployment practices (e.g., Git-based workflows, automated testing and deployment).
- Strong understanding of distributed systems, microservices, and cloud-native architectures (latency, resiliency, back-pressure, timeouts, circuit breakers).
- Demonstrated ability to troubleshoot complex production issues across application, infrastructure, and network layers.
- Experience with additional APM/monitoring stacks (e.g., Dynatrace, New Relic, Datadog, Prometheus, Grafana, Splunk, ELK).
- Background in healthcare, insurance, or other highly regulated environments.
- Experience mentoring or leading other engineers in an SRE/DevOps context.