Senior Site Reliability Engineer

Restb

Barcelona, Spain

7 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Job location

Remote

Barcelona, Spain

Tech stack

Amazon Web Services (AWS)

Cloud Computing

Cloud Engineering

Continuous Integration

DevOps

Distributed Systems

GitHub

Monitoring of Systems

Python

Software Architecture

Reliability Engineering

Software Engineering

Data Logging

Load Balancing

Delivery Pipeline

Containerization

GitLab CI

CloudWatch

Docker

Jenkins

Job description

Own the AWS Cloud Infrastructure: Design, implement, and manage highly reliable, scalable, and cost-efficient services utilizing core AWS tools (e.g., EC2, ECS/EKS, Lambda, RDS, S3, CloudWatch).
Drive Operational Excellence: Implement and maintain robust CI/CD pipelines, automating infrastructure deployment and configuration.
Enhance System Observability: Establish comprehensive monitoring, logging, and alerting strategies to proactively identify and resolve performance and reliability issues.
SRE/DevOps Collaboration: Work closely with the software development teams to define and enforce Service Level Objectives (SLOs) and improve the entire service lifecycle, from design through deployment.
Platform Leadership: Provide technical vision and mentorship in discussions around software architecture, infrastructure scaling, and the roadmap for future platform development.
Incident Response: Lead and participate in incident response, root cause analysis (RCA), and continuous improvement processes to minimize downtime and prevent recurrence.

Requirements

Do you have experience in software development? We are looking for a Senior Site Reliability Engineer who is passionate about technology and always looking for new ways to tackle complex problems. As our SRE, you would be in charge of our production infrastructure, focusing on reliability, performance, observability, and cost-efficiency.

You'll be a key player in ensuring our state-of-the-art AI solutions are delivered with five-nines reliability, driving a culture of automation and infrastructure-as-code. This role is envisioned as a future leader for a Platform Engineering team, and we expect you to contribute to strategic, long-term technical direction.

The usual day-to-day tasks include designing and deploying cloud architecture, automating deployments (CI/CD), enhancing system monitoring and alerting, and troubleshooting complex production issues.

The role will be conducted entirely in English, and CVs/resumes in other languages will not be considered.

A strong candidate will ideally possess deep expertise in the following areas:

Senior-Level AWS Proficiency: Extensive, hands-on experience designing, deploying, and managing complex, production-grade workloads on AWS.
Expertise in Python: Knowledge of Python for scripting, automation, and building SRE tools and services.
Containerization and Orchestration: Deep understanding and hands-on experience with Docker in a production environment.
CI/CD Pipeline Design: Experience designing, maintaining, and troubleshooting automated delivery pipelines (e.g., GitHub Actions, GitLab CI, Jenkins, AWS CodePipeline).
Monitoring & Observability: Strong experience with monitoring stacks and centralized logging (e.g., OpenSearch).
Networking and Security: Solid understanding of cloud networking (VPC, security groups, load balancers) and security best practices.
Troubleshooting: Expert ability to diagnose and resolve complex issues across distributed systems.

About the company

Are you a Senior Site Reliability Engineer looking for an opportunity to build and scale the critical infrastructure that powers cutting-edge AI solutions? At Restb.ai, we specialize in industry-specific visual recognition, helping businesses unlock powerful insights through AI-driven automation. We don't just recognize objects (that's so 2016); we teach computers to understand intangible visual concepts like "this room has natural light." By joining our team, you will play a critical role in designing, implementing, and maintaining our highly available, scalable, and secure production systems on AWS. If you thrive on solving complex operational challenges and driving automation through code, and you have a passion for operational excellence, we'd love to meet you!

* Make an Impact: Your work will directly shape AI-driven real estate technology.
* Future Leadership Track: This role offers a direct path to leading our future Platform Engineering team.
* Innovation-Driven Culture: Work with cutting-edge AI and platform technologies.
* Be part of an international company that works with some of the largest companies in the world.
* Thrive in an open environment with young, driven, and dynamic team members.
* Free in-office snacks, beverages, hot drinks, and salads.
* Free team lunch every Wednesday.
* Free in-office physical therapy sessions.
* Quarterly team-building event.
* Career development training.
* Hybrid office/remote working.
* Health insurance.
* Barcelona-based, working worldwide!! Do we need to say more?