Site Reliability Engineer II
Job description
As a Site Reliability Engineer, you will be a hands-on engineer who stands up infrastructure, debugs it when it misbehaves, and turns both activities into repeatable, self-service patterns for the engineering teams we support. The ability to provision infrastructure and debug complex distributed systems is our top priority for this role, with CI/CD expertise close behind. Observability knowledge is useful, but it is increasingly centralized through shared platforms.
You will work in a small, focused team and in close partnership with the product engineering squads you enable. Expect to rotate between greenfield infrastructure build-outs, pipeline and tooling improvements, and the occasional deep incident investigation.
What You'll Do:
- Stand up and evolve AWS infrastructure using Terraform, with a strong bias toward reusable modules and paved-road patterns over bespoke solutions.
- Operate and improve Kubernetes-based workloads - deployments, scaling, networking, and the platform glue that makes them boring to run.
- Build and maintain CI/CD pipelines (GitHub Actions preferred) that give engineering teams fast, safe, auditable paths to production.
- Partner with product squads to enable self-service access to their own infrastructure, databases, and pipelines - with the guardrails, auditability, and standards that make that access safe.
- Debug production issues across the stack: networking, DNS, certificates, container orchestration, CI pipelines, and application-level behaviour.
- Instrument services with appropriate logs, metrics, and traces, and help teams adopt the observability standards our platform teams define.
- Contribute to runbooks, automation, and standards that reduce toil and one-off work - if you fix it twice, codify it the third time.
- Use AI tooling pragmatically and at a professional level to accelerate code generation, infrastructure design, debugging, documentation, and review. We expect engineers to be AI-first in how they approach their craft.
Mimecast is an AI-first engineering organization. We expect every engineer to use modern AI tools (coding assistants, reasoning models, agentic tools) as a core part of their daily workflow - for design, implementation, debugging, review, and documentation. For this role specifically, we are looking for:
- Demonstrable experience prompting and working with AI coding and reasoning tools to produce real, shipped work.
- Good judgement about when AI output is trustworthy and when it needs verification - particularly for infrastructure code and production-impacting changes.
- An interest in applying AI tooling to SRE problems: incident triage, runbook generation, log analysis, Terraform authoring, and similar.
Ways of Working
- You collaborate well across teams - you can work with product engineers, platform teams, and security peers without friction.
- You have a bias for action and problem-solving, and you prefer shipping small, frequent improvements over big-bang changes.
- You are comfortable saying "let's standardize this" rather than building a one-off, even when the one-off would be faster today.
- You communicate clearly in writing - our teams are distributed, and async clarity matters.
Our Hybrid Model:
Our hybrid working model gives you the flexibility to live a balanced, healthy life, championing both collaborative teamwork and individual flexibility. Employees are expected to come to the office at least two days per week, because working together in person:
- Fosters a culture of collaboration, communication, performance and learning.
- Drives innovation and creativity within and between teams.
- Introduces employees to priorities outside of their immediate realm.
- Builds important interpersonal relationships and connections with one another and our community.
Due to certain obligations to our customers, an offer of employment will be subject to your successful completion of applicable background checks, conducted in accordance with local law.
Requirements
- Hands-on Kubernetes experience - you can deploy, operate, and debug workloads on Kubernetes. You do not need to be an expert, but you should be comfortable in it.
- Terraform experience - you have written and maintained non-trivial Terraform. Again, expertise is not required; pragmatic competence is.
- Familiarity with setting up CI and deployment automation. Experience with GitHub Actions is strongly preferred; experience with Jenkins, GitLab CI, or similar is transferable.
- A working understanding of observability fundamentals - logs, metrics, and traces - and how they are used during incident response.
- Networking knowledge at a practical level: security groups, TLS certificates, DNS, load balancing, and how traffic actually flows through a cloud environment.
- Experience working with relational databases is helpful - PostgreSQL is ideal but not required.
- Experience with AWS or a similar cloud platform.
Benefits & conditions
The base salary range for this position is $148,000-$222,000 plus benefits. This range represents the minimum and maximum new hire compensation for this role. The position may also be eligible for incentive plans and additional benefits, in accordance with company policy and local regulations. Our salary ranges are determined by role, level, and location, with individual compensation also dependent on factors such as qualifications, experience, and skills. Final offers will reflect these considerations and may vary accordingly.