Senior Site Reliability Engineer

REMOTE HAND

7 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$250K

Job location

Remote

Tech stack

Java

Artificial Intelligence

Amazon Web Services (AWS)

Application Release Automation

Continuous Integration

Cursor

DevOps

Python

Reliability Engineering

TypeScript

Containerization

Kubernetes

Infrastructure Automation Frameworks

Deployment Automation

Software Coding

Terraform

Job description

The Senior Site Reliability Engineer role is focused on enhancing the reliability, scalability, and operational excellence of the platform. This position involves designing and implementing systems for improved observability and incident management, leading significant projects, and collaborating across various engineering teams to build robust platforms and services. The role is critical in establishing standards and driving reliability goals to ensure the platform meets high operational standards. Additionally, this position includes mentoring junior engineers and fostering continuous innovation to maintain and improve the organization's engineering capabilities.

Responsibilities:

Design and implement systems to improve reliability, observability, traceability, and incident management
Lead projects from discovery to execution, ensuring successful delivery
Collaborate with AI/ML, Data, Platform, and Product engineering teams to develop advanced platforms and services
Define and enforce production standards, processes, and tools for operational excellence
Advocate for and implement SLIs, SLOs, and other reliability metrics across engineering teams
Mentor junior team members to support technical growth and leadership development
Drive continuous improvement by introducing creative solutions and challenging existing processes

Requirements:

5+ years of experience in Production Engineering, SRE, Platform Engineering, DevOps, Backend Engineering, or similar roles
Proficient coding skills in at least one language, such as Golang, Python, Java, or TypeScript
Experience with cloud-native technologies and Infrastructure-as-Code tooling, such as Kubernetes, Terraform, and AWS
Proven track record delivering medium to large-scale projects that improve platform reliability and scalability
Strong understanding of production reliability concepts including SLIs, SLOs, and incident management
Skilled in designing and maintaining CI/CD pipelines, deployment strategies, and release automation
Familiarity with AI-assisted development tools such as Claude Code, Codex, or Cursor
Excellent communication skills for collaborating with technical and non-technical teams
Experience working in dynamic, reliability-focused production environments preferred

Benefits & conditions

Pay Range and Compensation Package:

The US base salary range for this full-time position is $220,000 to $250,000 annually, plus equity and benefits.

About the company

The organization operates in the AI-driven marketing technology industry, focusing on personalized customer engagement through unified communication channels such as SMS, RCS, email, and push notifications. It addresses the challenge of creating authentic customer relationships by combining advanced AI technology with human expertise to deliver tailored marketing experiences that enhance performance, revenue, and loyalty. The company supports more than 8,000 customers across over 70 industries, including notable global brands, facilitating billions of interactions and generating tens of billions in revenue. With a distributed global workforce and offices in major cities worldwide, this team has received multiple recognitions for its culture and growth.