Senior Site Reliability Engineer

ICEO - Venture Builder

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

€ 90K

Job location

Remote

Tech stack

Secure Shell (SSH)

Java

Amazon Web Services (AWS)

Apache HTTP Server

Confluence

JIRA

Azure

Bash

C++

Ubuntu (Operating System)

Continuous Integration

Debian Linux

Linux

DevOps

Document Management Systems

DNS

Elasticsearch

Fault Tolerance

HTTP Secure

Java Virtual Machine (JVM)

Python

PostgreSQL

Linux System Administration

Nginx

Node.js

OpenVPN

Redis

Reliability Engineering

Prometheus

Systems Architecture

TCP/IP

Tripwire

Wide Area Networks

Data Logging

Load Balancing

Okta

Fluentd

Istio

Grafana

Reliability of Systems

Firewalls (Computer Science)

Containerization

Kubernetes

Kafka

Bitbucket

Kibana

REST

Terraform

Software Version Control

Google Meet

Docker

Job description

Own and lead the definition and execution of the SRE vision and strategy, ensuring alignment with business objectives and engineering priorities.
Architect, maintain and develop infrastructure within GCP and GKE, focusing on performance, security, availability and reliability.
Develop automated solutions for system reliability, capacity planning and incident response to minimize manual intervention.
Collaborate with engineering and product teams to design and implement highly available, fault-tolerant systems.
Define and own Service Level Indicators (SLIs), Service Level Objectives (SLOs) and error budgets to improve system reliability.
Create and maintain documentation for implemented solutions.
Mentor engineering teams on SRE principles, DevOps culture and best practices.
Stay up to date with industry trends and evaluate new tools and methodologies to improve system reliability.
Balance security, performance and flexibility in all decisions.
Participate in daily stand-ups, planning and other team meetings.

Communication: Slack, Google Meet
Work management: Jira
Documentation: Confluence
Repository: Bitbucket
Automation & IaC: Bash, Python, Go, Terraform
Observability: Prometheus, Grafana, Jaeger, Tempo, Loki
CI/CD: Bitbucket Pipelines, ArgoCD
Containerization & orchestration: Docker, Kubernetes, Helm
Security tooling: SOPS, Okta, tfsec, Trivy, Istio
Stateful: PostgreSQL, TimescaleDB, Redis, Kafka, Elasticsearch
HTTP: Nginx & Ingress-Nginx

Recruitment process

Stage 1 - Screening with Talent Acquisition Partner (45 min).
Stage 2 - Technical interview with Senior Developer (1 h, system architecture focus).
Stage 3 - Interview with Lead of DevOps (1 h, hands-on Docker/Kubernetes/Linux scenarios).
Stage 4 - Final interview with Head of Technology (30 min).
Background check after offer extension.

Requirements

5+ years of experience in a DevOps, SRE or similar role, working on a product that requires long-term platform maintenance.
10+ years of experience in technology.
Experience managing a platform independently, with autonomy in decision-making.
Proficiency in at least one programming language (Python, Go, C++ or Java).
Extensive experience maintaining JVM and Node.js applications.
Advanced Linux administration (Debian/Ubuntu).
Strong networking knowledge (LAN/WAN, firewall, proxy, load balancers, HTTP(S), DNS, SSH, TCP/IP, REST).
Hands-on experience with observability tools (Prometheus, Grafana, OpenTelemetry, etc.).
Knowledge of Kafka, Redis, Nginx and Docker.
Experience with CI/CD and version control systems.
Expertise in Kubernetes, Helm and Helm charts.
Public cloud experience (GCP, AWS or Azure), including redundancy and disaster-recovery design.
Experience designing, implementing and maintaining scalable, high-performance infrastructure (HPA, KEDA, affinity rules).
Proficient in written and spoken English (B2 or higher).