Site Reliability Engineer

Everforth Apex
Plano, United States of America
3 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Tech stack

.NET
Linux
File Systems
IBM Cloud Computing
Tivoli Management Framework
JSON
Python
OpenShift
PowerShell
Prometheus
Runbook
Shell Script
Systems Integration
Scripting (Bash/Python/Go/Ruby)
Grafana
Git
Kubernetes
Integration Frameworks
Kafka
REST
Splunk
Webhooks
Dynatrace
ServiceNow

Job description

  • Define and implement monitoring and observability coverage for the Event Management platform.
  • Establish standards for metrics, logs, traces, events, synthetic checks, and platform telemetry.
  • Build monitoring for IBM Cloud Pak for Watson AIOps, Netcool OMNIbus, Netcool Impact, OpenShift, Linux, Kafka-based services, and ServiceNow integration points.
  • Design and maintain Dynatrace monitoring for applications, infrastructure, synthetic checks, and platform dependencies.
  • Design and maintain Splunk searches, dashboards, alerts, log onboarding patterns, and operational views.
  • Create OpenShift and Kubernetes monitoring using available platform metrics, Prometheus, and Grafana.
  • Monitor Linux-based platform components, including processes, services, file systems, and resource utilization.
  • Monitor Kafka-based integrations, including topic health, consumer lag, and message throughput.
  • Provide end-to-end visibility for event flow from platform ingestion through downstream integration.
  • Develop runbooks, troubleshooting guides, validation procedures, and operational documentation.

Requirements

  • Hands-on experience with Dynatrace for infrastructure, application, synthetic, service, and dependency monitoring.
  • Hands-on experience with Splunk, including Search Processing Language (SPL), dashboards, alerts, and field extraction.
  • Understanding of OpenShift or Kubernetes monitoring concepts.
  • Experience monitoring Linux-based services, processes, logs, file systems, and resource utilization.
  • Experience defining monitoring coverage for distributed platforms and integration services.
  • Experience with REST APIs, JSON, webhooks, and system-to-system integrations.
  • Experience with scripting or automation using Python, shell scripting, or PowerShell.
  • Ability to troubleshoot issues across application, infrastructure, platform, and integration layers.
  • Strong documentation skills for runbooks, monitoring standards, and support procedures.

Preferred Qualifications

  • Experience with IBM Cloud Pak for Watson AIOps.
  • Experience with IBM Netcool OMNIbus, including ObjectServer, probes, and gateways.
  • Experience with Netcool Impact, including event enrichment and policy logic.
  • Experience with Prometheus and Grafana.
  • Experience monitoring Kafka, including consumer lag, topic health, and broker health.
  • Experience with ServiceNow event, incident, or integration workflows.
  • Experience monitoring .NET applications and services.
  • Experience with distributed tracing and OpenTelemetry.
  • Experience with Git, CI/CD pipelines, and monitoring-as-code or configuration-as-code.
  • Familiarity with production change management and regulated enterprise environments.

About the company

Everforth Apex is a world-class IT services company that serves thousands of clients across the globe. When you join Everforth Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing in Talent Satisfaction in the United States and Great Place to Work in the United Kingdom and Mexico.

Everforth Apex uses a virtual recruiter as part of the application process. By applying for this job, you agree to receive calls, AI-generated calls, text messages, or emails from Everforth Apex, its affiliates, and its contracted partners. Frequency varies for text messages. Message and data rates may apply. Carriers are not liable for delayed or undelivered messages. You can reply STOP to cancel and HELP for help.
