Staff Site Reliability Engineer I
Job description
As a Staff SRE at Remote, you will own the technical direction of our SRE platform, shaping its architecture, reliability strategy, and long-term evolution. This is a leadership role as much as a technical one: you'll drive platform-wide initiatives, set the reliability bar for engineering teams across the organisation, and be a force multiplier for the engineers around you.
A key part of this role is identifying and leading opportunities to leverage AI: from reducing operational toil to enabling engineering teams to build, ship, and operate software more effectively. You'll work with a high degree of autonomy, translating technical risks into business impact and aligning with Engineering Managers, Team Leads, and Product teams to ensure reliability and engineering efficiency are built into everything we do.
- Own the technical direction of Remote's SRE/Platform domain, its architecture, tooling, and long-term roadmap
- Define and drive the reliability strategy across the platform: SLOs/SLIs, error budgets, observability, and incident management maturity
- Lead complex, cross-team infrastructure initiatives from discovery through delivery, delegating effectively and keeping projects aligned with business goals
- Identify and lead AI enablement initiatives across the engineering organisation, exploring where AI can reduce operational overhead, accelerate development workflows, improve incident response, and unlock new capabilities for engineering teams
- Drive AI-powered automation for platform operations: intelligent alerting, automated incident triage, self-healing infrastructure, and AI-assisted runbooks, reducing toil and freeing engineers to focus on higher-leverage work
- Contribute to capacity planning and cost-efficiency of Remote's infrastructure
- Mentor senior engineers, raising the technical bar through code reviews, design feedback, and hands-on guidance
- Collaborate with the Security team on platform hardening, threat mitigation, and compliance
- Be a steward of engineering quality across the SRE team, championing best practices, managing technical debt deliberately, and raising standards over time
- Contribute to hiring, onboarding, and continuously improving how the SRE team operates
Requirements
- 8+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering
- Deep expertise in Kubernetes: operating, designing, and scaling production clusters
- Proven experience designing and managing cloud infrastructure on AWS (or other cloud providers) at scale
- Strong infrastructure-as-code practice with Terraform
- Experience defining and operating reliability frameworks: SLOs, SLIs, error budgets, alerting strategies
- Solid observability background: Datadog, Grafana/Prometheus, or similar
- Proficiency with CI/CD platforms (GitLab CI, GitHub Actions, or similar) and deployment automation
- Comfortable with Bash and scripting for automation; broader programming skills are a plus
- Experience with container tooling (Docker) and the broader ecosystem around it
- Curiosity and practical experience applying AI tools to infrastructure, operations, or developer tooling: whether through AI-assisted automation, LLM-powered workflows, or intelligent observability
Leadership & behavioural
- Proven track record of driving platform-wide technical initiatives and influencing engineering direction without formal authority
- Strong communicator: able to tailor messaging to technical and non-technical audiences, write clearly, and align stakeholders across teams
- Self-directed: able to identify what needs attention, define the path forward, and execute with minimal supervision
- Experience mentoring senior engineers and creating space for others to lead and grow
- Comfortable navigating ambiguity, translating vague requirements into concrete solutions
- Approaches technical problems with a business lens and understands the cost and value of engineering decisions
Nice to have
- Excellent communication and interpersonal skills
- Holistic debugging skills
- Security knowledge and capabilities from a defensive and offensive standpoint
Benefits & conditions
Remote's Total Rewards philosophy is to ensure fair, unbiased compensation and fair equity pay along with competitive benefits in all locations in which we operate. We do not agree to or encourage cheap-labor practices, and therefore we make sure to pay above in-location rates. We hope to inspire other companies to support global talent hiring and bring local wealth to developing countries.
At first glance our salary bands seem quite wide - here is some context. At Remote we have international operations and a globally distributed workforce. We use geo ranges to consider geographic pay differentials as part of our global compensation strategy to remain competitive in various markets while we hire globally.
The base salary range for this full-time position is $188,550 to $212,150. Our salary ranges are determined by role, level and location, and our job titles may span more than one career level. The actual base pay for the successful candidate in this role is dependent upon many factors such as location, transferable or job-related skills, work experience, relevant training, business needs, and market demands. The base salary range may be subject to change.
At Remote, we foster internal mobility as a key element of our culture of employee growth and development, supported by a compensation philosophy that guarantees pay equity and fairness. Therefore, all compensation changes associated with an internal move will be reviewed by the Total Rewards & People Enablement team on a case-by-case basis.
Our full benefits & perks are explained in our handbook at remote.com/r/benefits. As a global company, each country works differently, but some benefits/perks are for all Remoters:
- work from anywhere
- flexible paid time off
- flexible working hours (we are async)
- 16 weeks paid parental leave
- mental health support services
- stock options
- learning budget
- home office budget & IT equipment
- budget for local in-person social events or co-working spaces