Observability Engineer (ESS Platform SME)
Headway Tek Inc
McLean, United States of America
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: McLean, United States of America
Tech stack
API
Data analysis
Application Performance Management
Continuous Integration
Elasticsearch
Groovy
Python
Openshift
Logstash
Data Logging
Transaction Processing (Computing)
Kubernetes
Drilldown
BIG-IP Access Policy Manager (APM)
Kibana
Dynatrace
Microservices
Job description
ESS Observability Architecture & Implementation
- Design and implement end-to-end observability solutions using ESS (Elastic Stack).
- Build a centralized observability layer covering all MF applications.
- Ensure block-level aggregation with drill-down to:
  - Application-level metrics
  - APM traces
  - Logs and events
  - Service dependencies
Dashboard Engineering (Critical Priority)
- Develop and scale a large backlog of ESS dashboards, including but not limited to:
  - Cluster Health (OCP/K8s)
  - API & APM Dashboards
  - Service Health & Dependency Monitoring
  - Pod Status / Restart / Scaling Metrics
  - HTTP Status Analytics (200/400/500 trends)
  - Transaction Processing Metrics
  - Infra Metrics (CPU, Memory, Disk, Network)
  - Synthetic Monitoring & Availability
- Build intuitive dashboards that drill down from MF to Block to Service to Application level.
APM, Tracing & Monitoring Expansion
- Expand ESS-based:
  - Application Performance Monitoring (APM)
  - Distributed tracing
  - Real User Monitoring (RUM)
  - Synthetic monitoring
- Enable end-to-end traceability across microservices.
Proactive Observability & Alerting
- Design and implement smart alerting rules:
  - Move from reactive to proactive detection
  - Reduce noise and improve signal quality
- Define SLOs, SLIs, and error budgets
- Enhance anomaly detection and trend analysis
Collaboration & Leadership
- Work closely with:
  - EOT Observability Team
  - Internal CDLs
  - Application teams
- Act as ESS Observability SME
- Provide guidance, standards, and best practices
Requirements
- Strong hands-on experience with ESS (Elastic Stack):
  - Elasticsearch
  - Logstash
  - Kibana
  - Beats / Elastic Agent
  - Elastic APM
- Proven experience building enterprise-scale observability dashboards in ESS
- Deep understanding of:
  - Microservices architecture
  - Kubernetes / OpenShift (OCP)
- Experience with:
  - APM, distributed tracing, logging, metrics correlation
  - Synthetic monitoring tools integrated with ESS
  - Real User Monitoring (RUM)
  - Service maps and dependency graphs
- Ability to design multi-layer observability (infra, platform, app)
- Knowledge of:
  - CI/CD observability integration
  - Alerting frameworks within Elastic
  - Scripting: Python / Shell / Groovy (nice to have)
- Strong ownership mindset
- Ability to work under aggressive timelines
- Excellent problem-solving skills
- Clear communication with technical and non-technical teams