Senior DevOps Engineer - Highload, Cloud & Data-Intensive Systems (EU / Remote)

Alex Staff Agency
Barcelona, Spain
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
€ 8K

Job location

Remote
Barcelona, Spain

Tech stack

PHP
API
Amazon Web Services (AWS)
Data analysis
Bash
Big Data
Ubuntu (Operating System)
Cloud Computing
Databases
Continuous Integration
Linux
DevOps
Virtual Private Networks (VPN)
Python
PostgreSQL
Lua
Logical Volume Manager
MongoDB
Node.js
OpenVPN
Redis
Ansible
Blockchain
Prometheus
Ruby
Grafana
Kubernetes
Rancher
Sentry
Bare Metal
Cloudflare
Terraform
DDoS
Docker
Jenkins

Job description

The team develops and maintains distributed services for analytics, APIs, and transaction monitoring. The systems process very large volumes of data: terabytes of storage, trillions of records, and continuously growing load.

Infrastructure:

~100 servers (bare metal + VPS)
Active use of IaC
Kubernetes clusters in production
Focus on stability, observability, and automation

The project is long-term: not a hype-driven startup, but a mature product with real users.

What the work looks like

This is a hands-on role with a clear time allocation:

60%: operations and incidents (including helping teams)
20%: infrastructure automation
20%: prototyping, improvements, and technical initiatives

The role includes on-call duty, but after-hours incidents normally occur only 2-3 times a year, not every week.

Responsibilities

Operation of production services and infrastructure (server provisioning/decommissioning, updates, replacements, performance troubleshooting)
Support and development of Infrastructure as Code (Terraform / Ansible: modules, roles, standards, reviews)
Monitoring, alerting, backups, and regular recovery checks
Development of service and infrastructure automation
Development of CI/CD and release procedures
Incident diagnosis and resolution, and support for product teams
Traffic analytics, and bot and attack protection tools
Responsibility for 24/7 platform stability

Requirements

What's important

4+ years of experience operating Linux/Ubuntu infrastructure and production services
Strong understanding of networking and troubleshooting
Kubernetes (cluster operations), Rancher, Docker / containerd
Hands-on experience with Ansible and Terraform
Monitoring: Prometheus / Thanos / Telegraf / Grafana / Sentry
CI/CD: Jenkins
Automation: Bash, Python
Experience working with LVM

Nice to have

Experience working with blockchain nodes
Diagnosis and tuning of ClickHouse and MongoDB in high-load clusters
Providers: Hetzner / OVHcloud
Cloudflare (edge, DDoS), experience with AWS
Handling abuse tickets with hosting providers

Technology stack

VPN: WireGuard, OpenVPN
Databases: ClickHouse, MongoDB, Redis, PostgreSQL
Applications: Node.js (pm2), php-fpm, Lua, Tarantool
Supporting services: Go (operatorSDK), Ruby, Node.js, PHP
