Elasticsearch / ELK Stack Engineer - R01564999
Job description
We are seeking a highly experienced Senior Observability Engineer with deep expertise in ESS (Elastic Stack) to lead and accelerate the development of enterprise-grade observability capabilities across mission-critical applications. This role requires a hands-on SME who can design, build, and scale observability dashboards, APM, tracing, and monitoring solutions exclusively within ESS. The candidate will play a key role in transforming current monitoring into a proactive, intelligent, and scalable observability ecosystem. This is a high-impact, fast-paced engagement (target …).
ESS Observability Architecture & Implementation:
- Design and implement end-to-end observability solutions using ESS (Elastic Stack).
- Build a centralized observability layer covering all MF applications.
- Ensure block-level aggregation with drill-down to:
  - Application-level metrics
  - APM traces
  - Logs and events
  - Service dependencies
Dashboard Engineering (Critical Priority):
- Develop and scale a large backlog of ESS dashboards, including but not limited to:
  - Cluster Health (OCP/K8s)
  - API & APM Dashboards
  - Service Health & Dependency Monitoring
  - Pod Status / Restart / Scaling Metrics
  - HTTP Status Analytics (200/400/500 trends)
  - Transaction Processing Metrics
  - Infra Metrics (CPU, Memory, Disk, Network)
  - Synthetic Monitoring & Availability
- Build intuitive, drill-down dashboards from MF Block → Service → Application level.
APM, Tracing & Monitoring Expansion:
- Expand ESS-based:
  - Application Performance Monitoring (APM)
  - Distributed tracing
  - Real User Monitoring (RUM)
  - Synthetic monitoring
- Enable end-to-end traceability across microservices.
Proactive Observability & Alerting:
- Design and implement smart alerting rules:
  - Move from reactive to proactive detection
  - Reduce noise, improve signal quality
- Define SLOs, SLIs, and error budgets
- Enhance anomaly detection and trend analysis
Collaboration & Leadership:
- Work closely with:
  - EOT Observability Team
  - Internal CDLs
  - Application teams
- Act as ESS Observability SME
- Provide guidance, standards, and best practices
Required Skills & Experience:
- Strong hands-on experience with ESS (Elastic Stack):
  - Elasticsearch
  - Logstash
  - Kibana
  - Beats / Elastic Agent
  - Elastic APM
- Proven experience building enterprise-scale observability dashboards in ESS
- Deep understanding of:
  - Microservices architecture
  - Kubernetes / OpenShift (OCP)
- Experience with:
  - APM, distributed tracing, logging, metrics correlation
- Ability to design multi-layer observability (infra → platform → app)
Strongly Preferred:
- Experience with:
  - Synthetic monitoring tools integrated with ESS
  - Real User Monitoring (RUM)
  - Service maps and dependency graphs
- Knowledge of:
  - CI/CD observability integration
  - Alerting frameworks within Elastic
- Scripting: Python / Shell / Groovy (nice to have)
Additional Notes:
- Candidate must be an ESS expert; experience with alternative tools alone will not be sufficient.
- This is a high-priority, business-critical role with immediate impact expectations.