Senior DevOps Engineer - Highload, Cloud & Data-Intensive Systems (EU / Remote)

Alex Staff Agency

Barcelona, Spain

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

€ 8K

Job location

Remote

Barcelona, Spain

Tech stack

PHP

API

Amazon Web Services (AWS)

Data analysis

Bash

Big Data

Ubuntu (Operating System)

Cloud Computing

Databases

Continuous Integration

Linux

DevOps

Virtual Private Networks (VPN)

Python

PostgreSQL

Lua

Logical Volume Manager

MongoDB

Node.js

OpenVPN

Redis

Ansible

Blockchain

Prometheus

Ruby

Grafana

Kubernetes

Rancher

Sentry

Bare Metal

Cloudflare

Terraform

DDoS

Docker

Jenkins

Job description

The team develops and maintains distributed services around analytics, APIs, and transaction monitoring. The systems process very large volumes of data - terabytes of storage, trillions of records, continuously growing load.

Infrastructure:

~100 servers (bare metal + VPS)
Active use of IaC
Kubernetes clusters in production
Focus on stability, observability, and automation

The project is long-term - not a hype startup, but a mature product with real users.

What the work looks like

This is a hands-on role with a clear time allocation:

60% - operations and incidents (including helping teams)
20% - infrastructure automation
20% - prototyping, improvements, technical initiatives

There is on-call responsibility, but after-hours incidents normally happen only 2-3 times a year, not every week.

Responsibilities

Operation of production services and infrastructure (server provisioning/decommissioning, updates, replacements, performance troubleshooting)
Support and development of Infrastructure as Code (Terraform / Ansible: modules, roles, standards, reviews)
Monitoring, alerting, backups, and regular recovery checks
Development of service and infrastructure automation
Development of CI/CD and release procedures
Incident diagnosis and resolution, support for product teams
Traffic analytics, bot and attack protection tools
Responsibility for 24/7 platform stability

Requirements

What's important

4+ years of experience operating Linux/Ubuntu infrastructure and production services
Strong understanding of networking and troubleshooting
Kubernetes (cluster operations), Rancher, Docker / containerd
Hands-on experience with Ansible and Terraform
Monitoring: Prometheus / Thanos / Telegraf / Grafana / Sentry
CI/CD: Jenkins
Automation: Bash, Python
Experience working with LVM

Nice to have

Experience working with blockchain nodes
Diagnosis and tuning of ClickHouse and MongoDB in high-load clusters
Providers: Hetzner / OVHcloud
Cloudflare (edge, DDoS), experience with AWS
Handling abuse tickets with hosting providers

Technology stack

VPN: WireGuard, OpenVPN
Databases: ClickHouse, MongoDB, Redis, PostgreSQL
Applications: Node.js (pm2), php-fpm, Lua, Tarantool
Supporting services: Go (operatorSDK), Ruby, Node.js, PHP