Site Reliability Engineer

Cryptio
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote

Tech stack

Amazon Web Services (AWS)
Databases
Data Systems
Software Debugging
DevOps
Distributed Systems
Identity and Access Management
PostgreSQL
Reliability Engineering
Prometheus
TypeScript
OpenAPI
Pulumi
Autoscaling
React
Delivery Pipeline
Grafana
MTTR
GitLab CI
Kubernetes
Cassandra
Route53
NestJS
Docker
Microservices

Job description

We're hiring a Senior Site Reliability Engineer (SRE) to take full ownership of Cryptio's reliability, observability, and incident response. You'll work across our stack, from AWS infrastructure to Rust microservices, TypeScript indexers, and data-heavy backends, to ensure our platform remains fast, stable, and resilient as we scale.

This is a role for a hands-on builder who can see across systems, trace complex issues, and design reliability into everything we ship. You'll collaborate closely with engineering and product teams to define SLAs/SLOs, strengthen monitoring and alerting, improve incident management, and build the processes and tooling that make reliability a shared culture at Cryptio.

Key technologies

AWS (EKS, S3, GuardDuty, Route53, IAM, and more)

Rust, TypeScript (NestJS, React, OpenAPI)

PostgreSQL, Cassandra, ClickHouse

Pulumi, GitLab CI, Docker, Kubernetes

Grafana, Prometheus, Loki, Jaeger

What you'll do

Own reliability end-to-end: design, measure, and improve service availability, latency, and performance across Cryptio's platform

Enhance observability: expand and refine metrics, logs, and traces to provide deep insight into our Rust and TypeScript services

Lead incident management: define playbooks, improve response workflows, and foster a blameless postmortem culture

Strengthen infrastructure: optimise AWS configurations, CI/CD pipelines, autoscaling, and networking for reliability and cost efficiency

Collaborate across teams: work with product and engineering leads to ensure reliability is considered at every design stage

Drive continuous improvement: identify systemic weaknesses, automate recovery where possible, and reduce MTTR across the stack

Champion SRE best practices: guide teams on capacity planning, runbooks, and resilience testing

Requirements

5+ years of experience in Site Reliability, DevOps, or Infrastructure Engineering roles

Deep understanding of distributed systems and debugging at the network, application, and database layers

Hands-on experience with AWS, container orchestration (Kubernetes, ECS), and Infrastructure-as-Code tools (Pulumi or similar)

Comfortable tracing through Rust and TypeScript code to diagnose complex performance or reliability issues

Experience with (or willingness to learn) Cassandra and ClickHouse in production

Strong collaborator with excellent communication skills

Systematic, analytical, and passionate about building reliable systems at scale

Interest in (or curiosity about) crypto, finance, or large-scale data systems

Benefits & conditions

True ownership of reliability and uptime across a critical, fast-growing SaaS platform

Opportunity to shape SRE culture and processes from the ground up

Work with a world-class engineering team at the intersection of crypto, accounting, and data infrastructure

Freedom to experiment and improve observability, alerting, and recovery pipelines end-to-end

100% remote (UK only), with opportunities to visit our Paris or London hubs

Competitive salary and full benefits package

Interview process

Talent Screen (15-30 min): Initial call to discuss your background, Cryptio, and the role

Technical Interview (60 min): Deep dive into reliability, AWS, and debugging scenarios

Team Interview (45 min): Meet an engineer and product manager to explore cross-team collaboration

CTO Interview (45 min): Discussion about technical strategy, ownership, and your vision for reliability at Cryptio
