Senior DevOps Engineer - Highload, Cloud & Data-Intensive Systems (EU / Remote)
Job description
The team develops and maintains distributed services around analytics, APIs, and transaction monitoring. The systems process very large volumes of data - terabytes of storage, trillions of records, continuously growing load.
Infrastructure:
~100 servers (bare metal + VPS)
Active use of IaC
Kubernetes clusters in production
Focus on stability, observability, and automation
The project is long-term - not a hype startup, but a mature product with real users.
What the work looks like
This is a hands-on role with a clear time allocation:
60% - operations and incidents (including helping teams)
20% - infrastructure automation
20% - prototyping, improvements, technical initiatives
The role includes on-call duty, but after-hours incidents normally happen 2-3 times a year, not every week.
Responsibilities
Operation of production services and infrastructure (server provisioning/decommissioning, updates, replacements, performance troubleshooting)
Support and development of Infrastructure as Code (Terraform / Ansible: modules, roles, standards, reviews)
Monitoring, alerting, backups, and regular recovery checks
Development of service and infrastructure automation
Development of CI/CD and release procedures
Incident diagnosis and resolution, support for product teams
Traffic analytics, bot and attack protection tools
Responsibility for 24/7 platform stability
Requirements
What's important
4+ years of experience operating Linux/Ubuntu infrastructure and production services
Strong understanding of networking and troubleshooting
Kubernetes (cluster operations), Rancher, Docker / containerd
Hands-on experience with Ansible and Terraform
Monitoring: Prometheus / Thanos / Telegraf / Grafana / Sentry
CI/CD: Jenkins
Automation: Bash, Python
Experience working with LVM
Nice to have
Experience working with blockchain nodes
Diagnosis and tuning of ClickHouse and MongoDB in high-load clusters
Providers: Hetzner / OVHcloud
Cloudflare (edge, DDoS), experience with AWS
Handling abuse tickets with hosting providers
Technology stack
VPN: WireGuard, OpenVPN
Databases: ClickHouse, MongoDB, Redis, PostgreSQL
Applications: Node.js (pm2), php-fpm, Lua, Tarantool
Supporting services: Go (operatorSDK), Ruby, Node.js, PHP