Site Reliability Engineer
Job description
Schedule: Monday to Friday, 8 AM to 5 PM CST

Day-to-Day Responsibilities:
o Design and implement systems for monitoring, telemetry, and observability.
o Ensure applications are reliable, resilient, and self-healing.
o Collaborate with development teams to establish best-in-class observability practices.
o Build and improve metrics, distributed tracing, logging pipelines, and platform reliability solutions.
Requirements
o 5+ years of experience developing monitoring, telemetry, and observability tools.
o Hands-on experience with at least one major cloud provider: AWS, Azure, or GCP.
o Strong understanding of cloud-native workloads, including serverless, containers, and managed services.
o Expertise with modern observability tools such as Prometheus, Grafana, ELK/EFK Stack, Splunk, New Relic, Dynatrace, Datadog, and OpenTelemetry.
o Deep understanding of metrics, logs, and distributed tracing.
o Proficiency with productivity tools such as MS Office and Outlook, and with technical platforms like Azure.
Benefits & conditions
The Company offers the following benefits for this position, subject to applicable eligibility requirements: medical insurance, dental insurance, vision insurance, 401(k) retirement plan, life insurance, long-term disability insurance, short-term disability insurance, and paid parking/public transportation. As applicable, the position also includes paid time off, paid sick and safe time, paid vacation time, paid parental leave, and paid holidays annually.