Platform Engineer (Observability), Remote Spain
Job description
- Implement solutions for our product requirements and technology challenges.
- Ensure technical performance through design, right-sizing, scalability, and reliability.
- Ensure the Definition of Done is followed at the technical level.
- Analyze technical trade-offs.
- Determine primary components and subsystems.
- Detect weaknesses and failure points within the Observability solutions and propose improvements.
- Support teams during significant technological challenges.
- Lead the cross-team definition of technical and functional monitoring.
- Collaborate on, and take ownership of, keeping technical documentation up to date.
- Collaborate in the company's Communities of Practice.
- Advocate for new Observability-related technology features.
- Act as a technical reference and evangelist.
Requirements
We are looking for a skilled Observability Engineer with experience in Infrastructure as Code using Terraform and Helm, and with deep knowledge of Observability features (logging, metrics, tracing, APM, …), ideally in hybrid, high-volume distributed platforms handling thousands of transactions per second.
We are looking for someone with proven experience working with de facto market-standard open-source components, including Telegraf, Prometheus, Grafana, Fluent Bit, Logstash, OpenSearch/Elasticsearch, Kibana, OpenTelemetry, Jaeger, or similar.
Our Observability Platform components are deployed on Kubernetes and/or AWS EC2 instances and require support and rollout for all the teams and internal customers that use them. In parallel, the role involves governance, good practices, and automation of ad hoc features.
The successful candidate will join a talented and motivated team of IT professionals focused on implementing solutions for our product requirements and technology challenges on an ongoing basis.
- Experience as an Observability Engineer or in a similar position, defining, implementing, and supporting teams through monitoring, logging, tracing, and APM challenges.
- Experience with Observability solutions such as Telegraf, Prometheus, Grafana, Fluent Bit, Logstash, OpenSearch/Elasticsearch, Kibana, OpenTelemetry, and Jaeger.
- Experience working in cloud environments such as AWS.
- Experience with Infrastructure as Code frameworks such as Terraform and Helm.
- Experience working with container-native applications deployed in Docker & Kubernetes.
- Experience in Linux systems administration.
- Experience working in distributed systems environments.
- Experience working with Agile methodologies such as Scrum or SAFe.
- Strong written and verbal communication skills in both English and Spanish.
- Experience with Python or Go is a plus.
- Strong technical, logical, analytical, and problem-solving skills.