Senior DevOps / Site Reliability Engineer (full time)
Job description
- Designed to be portable across clouds, it's built on primitives like VMs, object storage, load balancers, and managed DBs (e.g., EC2, S3, RDS/OpenSearch).
- Deployed with Ansible, reinforced with Python and custom tooling.
- External tools include Datadog, Redis, Elasticsearch, Nessus, and more.
- The platform supports any Dockerized app, which means we often get to explore new stacks, languages, and frameworks to help clients.
Enterprise AWS Platform
- This environment is almost the opposite: purely AWS, highly serverless.
- Static frontends built with Gatsby and Storyblok, enhanced with Lambda APIs.
- Infrastructure as Code is written in AWS CDK (TypeScript).
- Other tools in the mix: GitHub Actions, Cloudflare, DynamoDB, API Gateway, etc.
It's a landscape that requires flexibility: jumping between stacks, mindsets, and even languages! If you love variety, you'll thrive here.
What's the job?
As a Site Reliability Engineer (SRE) at Divio, you'll take care of the reliability, performance, security, and cost-effectiveness of both infrastructures. You won't be doing it alone, but you will have autonomy and influence.
Your week-to-week will involve:
- Keeping the lights on: patching, tuning, incident response, and keeping the stack healthy
- Pushing long-term improvements: migrations, internal tooling, security hardening, monitoring revamps
- Shaping the infrastructure: evolving our setup to stay modern, secure, and developer-friendly
- Helping out: supporting our internal support crew with technical questions and the external dev team with their day-to-day work
We aim to balance quick fixes and deep refactors so there's always something meaningful to work on.
Our workflows:
- On the Divio Cloud side, we work in 2-week sprints guided by quarterly OKRs. It's a flexible, engineer-led process where priorities are set together. We keep things lightweight and adaptive, and rotate support/on-call weekly across the team.
- The Enterprise AWS project follows a more structured 3-week Scrum cycle, with regular planning and retrospectives. It involves tighter coordination with the client and an external development team.
Requirements
Do you have experience in TypeScript? Do you have a Bachelor's degree?
If you're someone who:
- Has solid experience in infrastructure and software (Python and TypeScript are our main tools)
- Enjoys switching contexts and solving real-world problems
- Doesn't mind complexity, and even enjoys taming it
- Can work independently but wants to build with a team
- Has ideas and wants a say in how things evolve
Then you'll probably feel at home with us.
We're not looking for someone who knows everything, just someone who's curious, reliable, and ready to grow with us.
Must have:
- Excellent command of written and spoken English
- Solid expertise in Docker, AWS, and cloud-native infrastructure
- Programming experience, ideally in Python and TypeScript (but Go, Java, etc. also welcome)
- Experience with configuration management and Infrastructure as Code, ideally Ansible and AWS CDK
- Strong foundational knowledge (Linux, networking, the TCP/IP stack, load balancing, etc.)
- A reliable and proactive mindset: you take responsibility and can work independently
- Comfort with support tasks and professionalism when communicating with clients
Nice to have:
- Hands-on Linux system administration and tuning experience
- Familiarity with Django or other Python web frameworks
- Experience using AWS CDK, specifically with TypeScript
- Operational knowledge of services like PostgreSQL, Redis, RabbitMQ, Elasticsearch
- Any experience with the tools and technologies mentioned in our stack