Site Reliability Engineer (SRE)

Prisma

4 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Remote

Tech stack

Amazon Web Services (AWS)

Systems Engineering

Cloud Engineering

Software Quality

Database Storage Structures

Software Debugging

Distributed Systems

DNS

HP Systems Insight Manager

Python

Network Security

PostgreSQL

Performance Tuning

RabbitMQ

Redis

Site Reliability Engineering Practices

Software Engineering

Management of Software Versions

Software Vulnerability Management

Datadog

Pulumi

MTTR

Reliability of Systems

PySpark

Kubernetes

Infrastructure Automation Frameworks

Cloudflare

Kafka

Machine Learning Operations

Terraform

Microservices

Job description

Hands-on Reliability & System Engineering: Design, build, and operate reliable and scalable systems by defining and monitoring SLOs/SLIs, working directly on production infrastructure, and collaborating closely with software engineers on system design and reliability improvements.

Automation, Operations & Incident Response: Actively develop automation for infrastructure and operational workflows to eliminate toil and reduce MTTR, participate in and lead incident response, and drive blameless post-incident reviews with concrete follow-ups implemented in code and tooling.

Performance, Capacity & Security: Continuously analyze and optimize system performance and cost, provide data, insights, and recommendations to inform capacity planning, and support security best practices through hands-on vulnerability remediation and threat mitigation.

Requirements

SRE & Cloud Engineering: Hands-on experience with SRE practices in production, strong AWS expertise, Kubernetes, networking, DNS, and Infrastructure as Code (Pulumi preferred, Terraform a plus).

Automation & Software Engineering: Strong software engineering fundamentals, including solid Python proficiency, deep knowledge of the Python ecosystem (testing, debugging, packaging), and a consistent focus on writing clean, well-structured, and maintainable code.

Reliability, Data & Operations: Lead incident response and RCAs, improve system reliability, and engage stakeholders to propose solutions, share learnings, and mentor others.

Nice-to-Have

Regulated Environments & Security: Experience operating in highly regulated industries (e.g., Insurance, Banking, Healthcare), managing sensitive data, and supporting secure networking setups, including exposure to security technologies such as Cloudflare.

Distributed Systems & Microservices: Strong understanding of microservices architectures, their principles and trade-offs, with the ability to troubleshoot and maintain distributed systems and supporting technologies (RabbitMQ, Kafka, PostgreSQL, Redis).

Observability, Data and MLOps: Hands-on experience with Datadog for platform and application monitoring and performance optimisation, and solid fundamentals in database structures and operational troubleshooting. Hands-on experience with PySpark and familiarity with MLOps practices, including model registries, versioning, retraining workflows, and deployment lifecycles.

Benefits & conditions

We want to make Prima a happy and empowering place to work. So if you decide to join us, you can expect plenty of perks.

Work Your Way: Enjoy full flexibility: work from home, the office, or a mix of both. Plus, work from anywhere for up to 30 days a year.

Grow with us: We may move fast at Prima, but we move together. Get access to learning resources, mentorship, and a growth plan tailored to you.

Thrive and perform: Your best work begins when you feel your best. Enjoy private healthcare, gym discounts, wellbeing programs, and mental health support.

Think you're a match?

About the company

At Prisma, we are building the data layer for modern applications. If you are fascinated by the leading-edge architecture and technology used in today’s data-intensive, highly scalable software systems, with distributed graph data on a massive scale, but you want the energy, challenges, and freedom that come with working in a small startup, then a job at Prisma might be for you.

With funding from top-tier investors Amplify Partners and Kleiner Perkins, we are a small, distributed team working on making the advanced data infrastructure developed by large tech companies accessible to all application developers around the world. Our hard work is paying off, with adoption and implementation of Prisma by some of the most successful and interesting companies out there today, and the fun is just beginning!
