Observability / SRE Engineer
Mpower Plus Rezolve Ai Group Ltd
Charlotte, United States of America
11 days ago
Role details
Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Job location
Charlotte, United States of America
Tech stack
Amazon Web Services (AWS)
Audit Trail
Azure
Cloud Computing
DevOps
Python
Nagios
Reliability Engineering
Prometheus
Datadog
Data Logging
Google Cloud Platform
Grafana
Reliability of Systems
Kubernetes
Terraform
Splunk
New Relic (SaaS)
AppDynamics
Dynatrace
Docker
ELK
Job description
- Design and maintain logging, monitoring, and alerting solutions.
- Implement and manage audit logging for compliance and security requirements.
- Ensure operational monitoring of applications, services, and infrastructure.
- Provide end-to-end traceability of transactions and system activities.
- Create dashboards, alerts, and reporting for production environments.
- Investigate incidents, perform root cause analysis, and improve system reliability.
- Work closely with DevOps, Platform, Cloud, and Application teams to improve observability.
Requirements
- Strong experience with logging platforms such as Splunk, ELK Stack, Graylog, or similar.
- Experience with monitoring and alerting tools such as Datadog, Dynatrace, Prometheus, Grafana, New Relic, or AppDynamics.
- Knowledge of distributed tracing and observability concepts.
- Experience with audit logging, compliance monitoring, and operational traceability.
- Strong troubleshooting and production support experience.
- Experience in SRE, DevOps, Platform Engineering, or Production Engineering environments.
Preferred Skills
- AWS, Azure, or Google Cloud Platform.
- Kubernetes and Docker.
- OpenTelemetry, Jaeger, Zipkin, or similar tracing tools.
- Automation and scripting using Python, Shell, or Terraform.