Senior Infrastructure Engineer
Job description
At Talos, our Infrastructure Engineers are responsible for the hardware and software systems that underpin the secure, efficient, and reliable operation of our products. We run a bare-metal, global infrastructure footprint with datacenters in Paris, NYC, and Dallas, and partner closely with engineering teams to keep our environments performant, scalable, and resilient.
This is a hands-on role covering datacenter operations, Linux systems, Kubernetes clusters, networking, and CI/CD pipelines. You'll have the opportunity to work across the full stack, from low-level infrastructure to developer tooling, and play a critical role in ensuring our products and engineering teams run at scale.
You'll join a small, highly skilled team with significant responsibility and autonomy, working in an environment where infrastructure excellence directly impacts our clients and product teams.
Responsibilities
- Infrastructure operations: Build, maintain, and troubleshoot global bare-metal infrastructure (50+ servers per datacenter).
- Datacenter support: Collaborate with vendors to manage system stability, tune performance, troubleshoot problems and upgrade capacity.
- Kubernetes administration: Manage cluster upgrades, patching, testing, and security hardening.
- Monitoring & incident response: Operate and improve monitoring/alerting with Prometheus, Grafana, and Loki; provide rapid incident response, root-cause analysis, and remediation.
- CI/CD support: Manage GitLab pipelines, build runners, and related services; step in during incidents impacting developer productivity.
- Collaboration: Partner with engineering teams on capacity planning, scaling projects, and infrastructure tooling.
- Automation: Use infrastructure as code (Ansible, Terraform, bash, Python, Docker) to standardize and automate system management.
- Continuous improvement: Identify gaps and opportunities to improve reliability, efficiency, and tooling.
- Coverage: Provide infrastructure support during EU hours, contributing to 24/7 team capacity.
Agencies, search firms, recruitment firms and similar organizations ("Agencies") must obtain advance written approval from Talos's internal recruiting team to submit resumes, AND must sign a valid fully executed placement agreement with Talos in order to be eligible to receive any Fees from Talos. Talos will not pay a Fee to any Agency that does not have such agreement in place. By submitting a resume without a signed agreement, you acknowledge and accept these terms.
Requirements
- Hands-on experience with bare-metal datacenter design, planning, implementation, and support.
- Strong background in Ubuntu Linux administration, with automation using IPMI, Ansible, and Terraform.
- Solid understanding of IPv4/IPv6 networking, VPNs (WireGuard), DNS, and reverse proxies/load balancers (nginx, tinyproxy, APISIX), plus Cloudflare.
- Skilled in Kubernetes cluster administration, including tools such as Cilium, Helm, ArgoCD, and Kyverno.
- Experience with at least one of: Postgres HA/scale (80+ DBs per DC, largest >100TB), Kafka, MinIO, or blockchain full nodes.
- Familiarity with Prometheus, Loki, and Grafana for monitoring and alerting.
- Exposure to cloud computing environments (AWS, GCP).
- Practical experience designing and supporting CI/CD pipelines and build environments.
- Ability to configure and troubleshoot GitLab CI/CD pipelines and build runners.
- Proficiency in scripting and automation using bash, Python, and Docker.
Benefits & conditions
You will enjoy a comprehensive set of competitive benefits, regardless of your location, within our warm, welcoming, and ambitious company culture. These include a monthly wellness credit for personal use, such as gym memberships, massages, or even a ski pass for your next holiday. We also provide paid lunches in the office, monthly fitness and evening socials to foster connections with colleagues, and annual offsite events to engage with the wider team.