Observability / SRE Engineer

Mpower Plus Rezolve Ai Group Ltd
Charlotte, United States of America
11 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Charlotte, United States of America

Tech stack

Amazon Web Services (AWS)
Audit Trail
Azure
Cloud Computing
DevOps
Python
Nagios
Reliability Engineering
Prometheus
Datadog
Data Logging
Google Cloud Platform
Grafana
Reliability of Systems
Kubernetes
Terraform
Splunk
New Relic (SaaS)
AppDynamics
Dynatrace
Docker
ELK

Job description

  • Design and maintain logging, monitoring, and alerting solutions.
  • Implement and manage audit logging for compliance and security requirements.
  • Ensure operational monitoring of applications, services, and infrastructure.
  • Provide end-to-end traceability of transactions and system activities.
  • Create dashboards, alerts, and reporting for production environments.
  • Investigate incidents, perform root cause analysis, and improve system reliability.
  • Work closely with DevOps, Platform, Cloud, and Application teams to improve observability.

Requirements

  • Strong experience with logging platforms such as Splunk, ELK Stack, Graylog, or similar.
  • Experience with monitoring and alerting tools such as Datadog, Dynatrace, Prometheus, Grafana, New Relic, or AppDynamics.
  • Knowledge of distributed tracing and observability concepts.
  • Experience with audit logging, compliance monitoring, and operational traceability.
  • Strong troubleshooting and production support experience.
  • Experience in SRE, DevOps, Platform Engineering, or Production Engineering environments.

Preferred skills

  • AWS, Azure, or Google Cloud Platform.
  • Kubernetes and Docker.
  • OpenTelemetry, Jaeger, Zipkin, or similar tracing tools.
  • Automation and scripting with Python or Shell, and infrastructure as code with Terraform.
