Platform Engineer

Avance Consulting
Municipality of Madrid, Spain
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Municipality of Madrid, Spain

Tech stack

JavaScript
Airflow
Amazon Web Services (AWS)
Jira
Configuration Management
Continuous Integration
Data Governance
Memory Management
GitHub
Identity and Access Management
Python
Key Management
NoSQL
OpenCV
Role-Based Access Control
Prometheus
Runbook
Shell Script
Software Construction
Systems Integration
TypeScript
Policy as Code
Data Logging
Graphics Processing Unit (GPU)
Data Server Interface
Autoscaling
Grafana
Concurrency
GitLab CI
Git Flow
Kubernetes
Infrastructure Automation Frameworks
Information Technology
Machine Learning Operations
CloudWatch
Zendesk
Terraform
Splunk
New Relic (SaaS)
Software Version Control
GxP
Jenkins

Job description

* ... with Terraform; implement repeatable environment provisioning, configuration management, and golden paths for teams.
* Establish CI/CD workflows (GitHub Actions/Jenkins/GitLab CI), build/test standards, and progressive delivery patterns that keep releases fast and low-risk.
* Implement logging, metrics, and tracing (e.g., Prometheus, Grafana, CloudWatch, Splunk/New Relic) with actionable SLOs, alerts, and runbooks; embed security and compliance by design.
* Collaborate closely with product and science teams to remove bottlenecks, eliminate manual steps, and evolve service and data interfaces that make operating image pipelines simple and reliable.
* Contribute to future-state architectures that improve scalability, resiliency, and operational efficiency; lead targeted refactors and platform improvements.
* Manage core automation and tooling, and educate teams on platform capabilities, CI/CD, configuration management, and infrastructure automation best practices.

Requirements

Required (Must-have):
* M.Sc. in Computer Science/Engineering (or equivalent) or comparable industry experience.
* Practical, production experience operating Kubeflow Pipelines for reproducible ML workflows at scale.
* Proven experience deploying and operating workloads on Kubernetes (EKS/GKE/AKS), including upgrades, autoscaling, RBAC, networking, and reliability; strong Unix/Linux fundamentals.
* Hands-on experience with AWS services (EKS, EC2, S3, IAM, CloudWatch; RDS a plus) and the ability to design secure, cost-aware architectures.
* Strong Terraform skills and Git-based workflows for repeatable infrastructure provisioning and configuration management.
* Practical experience with CI/CD platforms (GitHub Actions/Jenkins/GitLab CI), including artifact management, environment promotion, and progressive delivery.
* Solid Python and/or shell scripting for platform automation and toil reduction.
* Experience implementing logging, metrics, and tracing with SLOs, alerts, and runbooks (e.g., Prometheus, Grafana, CloudWatch, Splunk/New Relic) and a security-first mindset.
* Ability to lead technical initiatives, communicate trade-offs clearly, and collaborate effectively with engineering and science teams.

Desirable (Nice to have):
* Experience with MLflow, Feast, Argo, Airflow, Ray, and model versioning/monitoring.
* Familiarity with S3/object storage, artifact registries, and handling large image datasets; basic SQL/NoSQL exposure.
* Experience with digital pathology or large-scale image processing (e.g., whole-slide images) and tools like OpenSlide, scikit-image, or OpenCV.
* Experience tuning high-throughput pipelines, concurrency, memory usage, and integrating GPUs/accelerators.
* Experience with VPC design, ingress/egress, service meshes, secrets management, IAM, and policy as code.
* Experience in regulated environments (e.g., GxP), including data governance, privacy, and building software under regulated processes.
* Experience with Jira/Zendesk and with JavaScript/TypeScript for internal tools or dashboards.
