Senior Site Reliability Engineer

REMOTE HAND
7 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$220K - $250K

Job location

Remote

Tech stack

Java
Artificial Intelligence
Amazon Web Services (AWS)
Application Release Automation
Continuous Integration
Cursor (AI-assisted code editor)
DevOps
Python
Reliability Engineering
TypeScript
Containerization
Kubernetes
Infrastructure Automation Frameworks
Deployment Automation
Software Coding
Terraform
Go

Job description

The Senior Site Reliability Engineer role focuses on improving the reliability, scalability, and operational excellence of the platform. It involves designing and implementing systems for better observability and incident management, leading significant projects, and collaborating with engineering teams across the organization to build robust platforms and services. The role is critical to establishing production standards and driving reliability goals so that the platform meets a high operational bar. It also includes mentoring junior engineers and fostering continuous innovation to maintain and improve the organization's engineering capabilities.

Responsibilities

  • Design and implement systems to improve reliability, observability, traceability, and incident management

  • Lead projects from discovery to execution, ensuring successful delivery

  • Collaborate with AI/ML, Data, Platform, and Product engineering teams to develop advanced platforms and services

  • Define and enforce production standards, processes, and tools for operational excellence

  • Advocate for and implement SLIs, SLOs, and other reliability metrics across engineering teams

  • Mentor junior team members to support technical growth and leadership development

  • Drive continuous improvement by introducing creative solutions and challenging existing processes

Requirements

  • 5+ years of experience in Production Engineering, SRE, Platform Engineering, DevOps, Backend Engineering, or similar roles

  • Strong coding skills in at least one language such as Go, Python, Java, or TypeScript

  • Experience with cloud-native technologies and Infrastructure-as-Code tooling, such as Kubernetes, AWS, and Terraform

  • Proven track record delivering medium to large-scale projects that improve platform reliability and scalability

  • Strong understanding of production reliability concepts including SLIs, SLOs, and incident management

  • Skilled in designing and maintaining CI/CD pipelines, deployment strategies, and release automation

  • Familiarity with AI-assisted development tools such as Claude Code, Codex, or Cursor

  • Excellent communication skills for collaborating with technical and non-technical teams

  • Experience working in dynamic, reliability-focused production environments preferred

Benefits & conditions

  Pay Range and Compensation Package:
  • The US base salary range for this full-time position is $220,000 - $250,000 annually plus equity and benefits

About the company

The organization operates in the AI-driven marketing technology industry, focusing on personalized customer engagement through unified communication channels such as SMS, RCS, email, and push notifications. It addresses the challenge of creating authentic customer relationships by combining advanced AI technology with human expertise to deliver tailored marketing experiences that improve performance, revenue, and loyalty. The company supports more than 8,000 customers across over 70 industries, including notable global brands, facilitating billions of interactions and generating tens of billions in revenue. With a distributed global workforce and offices in major cities worldwide, the company has received multiple recognitions for its culture and growth.
