Observability Engineer (ESS Platform SME)

Headway Tek Inc
McLean, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

McLean, United States of America

Tech stack

API
Data analysis
Application Performance Management
Continuous Integration
Elasticsearch
Groovy
Python
Openshift
Logstash
Data Logging
Transaction Processing (Computing)
Kubernetes
Drilldown
BIG-IP Access Policy Manager (APM)
Kibana
Dynatrace
Microservices

Job description

ESS Observability Architecture & Implementation

  • Design and implement end-to-end observability solutions using ESS (Elastic Stack).
  • Build a centralized observability layer covering all MF applications.
  • Ensure block-level aggregation with drill-down to:
      • Application-level metrics
      • APM traces
      • Logs and events
      • Service dependencies

Dashboard Engineering (Critical Priority)

  • Develop and scale a large backlog of ESS dashboards, including but not limited to:
      • Cluster Health (OCP/K8s)
      • API & APM Dashboards
      • Service Health & Dependency Monitoring
      • Pod Status / Restart / Scaling Metrics
      • HTTP Status Analytics (200/400/500 trends)
      • Transaction Processing Metrics
      • Infra Metrics (CPU, Memory, Disk, Network)
      • Synthetic Monitoring & Availability
  • Build intuitive drill-down dashboards from MF Block to Service to Application level.

APM, Tracing & Monitoring Expansion

  • Expand ESS-based:
      • Application Performance Monitoring (APM)
      • Distributed tracing
      • Real User Monitoring (RUM)
      • Synthetic monitoring
  • Enable end-to-end traceability across microservices.

Proactive Observability & Alerting

  • Design and implement smart alerting rules:
      • Move from reactive to proactive detection
      • Reduce noise and improve signal quality
  • Define SLOs, SLIs, and error budgets
  • Enhance anomaly detection and trend analysis

Collaboration & Leadership

  • Work closely with:
      • EOT Observability Team
      • Internal CDLs
      • Application teams
  • Act as the ESS Observability SME
  • Provide guidance, standards, and best practices

Requirements

  • Strong hands-on experience with ESS (Elastic Stack):
      • Elasticsearch
      • Logstash
      • Kibana
      • Beats / Elastic Agent
      • Elastic APM
  • Proven experience building enterprise-scale observability dashboards in ESS
  • Deep understanding of:
      • Microservices architecture
      • Kubernetes / OpenShift (OCP)
  • Experience with:
      • APM, distributed tracing, logging, metrics correlation
      • Synthetic monitoring tools integrated with ESS
      • Real User Monitoring (RUM)
      • Service maps and dependency graphs
  • Ability to design multi-layer observability (infra → platform → app)
  • Knowledge of:
      • CI/CD observability integration
      • Alerting frameworks within Elastic
      • Scripting: Python / Shell / Groovy (nice to have)
  • Strong ownership mindset
  • Ability to work under aggressive timelines
  • Excellent problem-solving skills
  • Clear communication with technical and non-technical teams
