Site Reliability Engineer (SRE)

Hydrolix

3 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Shift work

Languages

English

Job location

Spain

Tech stack

Amazon Web Services (AWS)

Azure

Cloud Computing

Computer Programming

Databases

Software Debugging

Linux

Distributed Systems

Python

PostgreSQL

Performance Tuning

Reliability Engineering

Prometheus

Software Engineering

SQL Databases

Google Cloud Platform

Cloud Platform System

Grafana

Kubernetes

Kibana

CI Server

Job description

Senior site reliability engineer, Spain. Posted 2026-03-05; listing valid through 2026-07-03.

We are looking for a Site Reliability Engineer (SRE) to join our Services team. In this role, you will contribute to the reliability and scalability of our platform and help deliver solutions tailored to our customers' needs. This is a highly technical, hands-on role that requires deep expertise in system reliability and automation.

Key Responsibilities

Infrastructure Reliability: Deploy and maintain a highly reliable fleet of Kubernetes clusters and Hydrolix deployments across multiple cloud platforms.

Service Optimization: Design, implement, and maintain systems and processes to enhance the reliability, availability, and performance of our services.

CI/CD Management: Build and optimize CI/CD tools and processes to ensure efficient and reliable deployments.

Monitoring and Incident Response: Develop and manage monitoring, alerting, and incident response strategies to minimize downtime and enable rapid recovery.

Root Cause Analysis: Conduct comprehensive root cause analyses for system failures and implement long-term preventive measures.

Automation and Efficiency: Automate repetitive tasks and optimize system performance to improve operational efficiency.

On-Call Support: Participate in on-call coverage of weekday business hours and once-monthly weekend shifts.

Collaboration and Customer Engagement

Cross-Functional Teamwork: Work closely with software engineering, infrastructure, and product teams to integrate reliability practices into every stage of the development lifecycle.

Reliability Advocacy: Champion SRE best practices and foster a culture of operational excellence across the organization.

Global Team Collaboration: Collaborate with a distributed team of engineers worldwide to provide round-the-clock support.

Customer Support: Interface with customers to address and resolve reported incidents, ensuring a seamless user experience.

Requirements

Qualifications and Skills

SRE Expertise: Proven experience as a Site Reliability Engineer or in a similar role, with at least five years supporting complex distributed systems.

Observability Tools: Experience with monitoring and debugging tools such as Prometheus, Vector, Grafana, Superset, or Kibana.

Cloud Platforms: Proficiency in at least one major cloud platform (AWS, GCP, Azure, or Linode).

Database Knowledge: Experience with SQL databases; familiarity with PostgreSQL is a plus but not required.

Programming Skills: Proficiency in programming languages such as Python, Go, or Rust.

Linux Expertise: Strong experience with Linux systems, including performance tuning and system-level troubleshooting.

Communication Skills: Excellent written and verbal communication skills, with the ability to convey technical concepts clearly to diverse audiences, including customers and cross-functional teams.

About the company

At Hydrolix, we are revolutionizing the world of data management and analytics with our innovative cloud data platform, purpose-built for petabyte-scale datasets. Our mission is to help organizations drastically reduce data costs while increasing their data retention.
