Site Reliability Engineer II

Mimecast North America, Inc.

Columbus, United States of America

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$148K-$222K

Job location

Columbus, United States of America

Tech stack

Artificial Intelligence

Amazon Web Services (AWS)

Audit Trail

Cloud Computing

Code Generation

Databases

Continuous Integration

Relational Databases

Software Debugging

Distributed Systems

DNS

Github

PostgreSQL

Log Analysis

Reliability Engineering

Working Model 2D

Load Balancing

Computer Network Technologies

Gitlab-ci

Kubernetes

Deployment Automation

Terraform

Jenkins

Job description

As a Site Reliability Engineer, you will be a hands-on engineer who stands up infrastructure, debugs it when it misbehaves, and turns both activities into repeatable, self-service patterns for the engineering teams we support. The ability to provision infrastructure and debug complex distributed systems is our highest priority for this role. CI/CD expertise sits close behind. Observability knowledge is useful, but that work is increasingly centralized via shared platforms.

You will work in a small, focused team and in close partnership with the product engineering squads you enable. Expect to rotate between greenfield infrastructure build-outs, pipeline and tooling improvements, and the occasional deep incident investigation.

What You'll Do:

Stand up and evolve AWS infrastructure using Terraform, with a strong bias toward reusable modules and paved-road patterns over bespoke solutions.
Operate and improve Kubernetes-based workloads - deployments, scaling, networking, and the platform glue that makes them boring to run.
Build and maintain CI/CD pipelines (GitHub Actions preferred) that give engineering teams fast, safe, auditable paths to production.
Partner with product squads to enable self-service access to their own infrastructure, databases, and pipelines - with the guardrails, auditability, and standards that make that access safe.
Debug production issues across the stack: networking, DNS, certificates, container orchestration, CI pipelines, and application-level behaviour.
Instrument services with appropriate logs, metrics, and traces, and help teams adopt the observability standards our platform teams define.
Contribute to runbooks, automation, and standards that reduce toil and one-off work - if you fix it twice, codify it the third time.
Use AI tooling pragmatically and at a professional level: to accelerate code generation, infrastructure design, debugging, documentation, and review. We expect engineers to be AI-first in how they approach their craft.

Mimecast is an AI-first engineering organization. We expect every engineer to use modern AI tools (coding assistants, reasoning models, agentic tools) as a core part of their daily workflow - for design, implementation, debugging, review, and documentation. For this role specifically, we are looking for:

Demonstrable experience prompting and working with AI coding and reasoning tools to produce real, shipped work.
Good judgement about when AI output is trustworthy and when it needs verification - particularly for infrastructure code and production-impacting changes.
An interest in applying AI tooling to SRE problems: incident triage, runbook generation, log analysis, Terraform authoring, and similar.

Ways of Working

You collaborate well across teams - you can work with product engineers, platform teams, and security peers without friction.
You have a bias for action and problem-solving, and you prefer shipping small, frequent improvements over big-bang changes.
You are comfortable saying "let's standardize this" rather than building a one-off, even when the one-off would be faster today.
You communicate clearly in writing - our teams are distributed, and async clarity matters.

Our Hybrid Model:

We give employees the flexibility to live balanced, healthy lives through a hybrid working model that champions both collaborative teamwork and individual flexibility. Employees are expected to come to the office at least two days per week, because working together in person:

Fosters a culture of collaboration, communication, performance and learning.
Drives innovation and creativity within and between teams.
Introduces employees to priorities outside of their immediate realm.
Builds important interpersonal relationships and connections with one another and our community.

Due to certain obligations to our customers, an offer of employment will be subject to your successful completion of applicable background checks, conducted in accordance with local law.

Requirements

Hands-on Kubernetes experience - you can deploy, operate, and debug workloads on Kubernetes. You do not need to be an expert, but you should be comfortable in it.
Terraform experience - you have written and maintained non-trivial Terraform. Again, expertise is not required; pragmatic competence is.
Familiarity with setting up CI and deployment automation. Experience with GitHub Actions is strongly preferred; experience with Jenkins, GitLab CI, or similar is transferable.
A working understanding of observability fundamentals - logs, metrics, and traces - and how they are used during incident response.
Networking knowledge at a practical level: security groups, TLS certificates, DNS, load balancing, and how traffic actually flows through a cloud environment.
Experience working with relational databases is helpful - PostgreSQL is ideal but not required.
Experience with AWS or similar cloud platforms.

Benefits & conditions

The base salary range for this position is $148,000-$222,000 plus benefits. This range represents the minimum and maximum new hire compensation for this role. The position may also be eligible for incentive plans and additional benefits, in accordance with company policy and local regulations. Our salary ranges are determined by role, level, and location with individual compensation also dependent on factors such as qualifications, experience, and skills. Final offers will reflect these considerations and may vary accordingly.

About the company

Relentless protection. Resilient world.

Mimecast was born in 2003 with a focus on delivering relentless protection. Each day, we take on cyber disruption for our tens of thousands of customers around the globe; always putting them first, and never giving up on tackling their biggest security challenges together. We continuously invest to thoughtfully integrate brand protection, security awareness training, web security, compliance, and other essential capabilities. Mimecast is here to help protect large and small organizations from malicious activity, human error, and technology failure - and to lead the movement toward building a more resilient world.

The Team

You will join our Site Reliability Engineering function within the Governance Compliance and Insights (GCI) pillar, a team that sits at the centre of how Mimecast builds, ships, and operates its cloud platform. We do three things, and we do them at scale:

Cloud infrastructure - design, provision, and evolve the AWS foundations our product teams build on.
CI/CD - build and maintain the pipelines that move code safely and frequently from commit to production.
Observability - provide the logs, metrics, and traces that make our platform debuggable and trustworthy.

We hold a few beliefs about how that work should be done:

Continuous deployment. Smaller chunks of work, shipped more frequently, with the safety rails to make that the default - not the exception.
Builders own what they build. We work relentlessly to enable product teams to safely and auditably access their own infrastructure, databases, and pipelines. Our job is to help teams help themselves, not to become a bottleneck.
Process and standardization are paramount. We avoid one-offs and special cases at almost any upfront cost, because repeatability compounds.

AI-First Engineering at Mimecast

Mimecast is an AI-First engineering organization. Our teams actively leverage AI-powered development tools across all facets of engineering, from code development to testing, documentation, and operations. We're looking for leaders who don't just use AI tools but champion their adoption and establish new ways of working. Our AI leadership extends beyond how we build to what we build. Our Mihra AI agent delivers 7x faster threat response for customers, and we're recognized as "Agents of Change" in Human Risk Management. Engineers here work at the intersection of cutting-edge AI tooling and AI-powered security products that protect organizations worldwide.
