Platform Team Lead - Remote

Zyte
Barcelona, Spain
8 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate

Job location

Remote
Barcelona, Spain

Tech stack

Java
API
Artificial Intelligence
Airflow
C++
Cloud Computing
Continuous Integration
Linux
Distributed Systems
Python
Machine Learning
Open Source Technology
Mesos
Data Streaming
System Programming
Data Logging
Graphics Processing Unit (GPU)
Concurrency
Containerization
Kubernetes
Kafka
Machine Learning Operations

Job description

Overview

At Zyte, we eat data for breakfast, and you can eat your breakfast anywhere and work for Zyte. Founded in 2010, we are a globally distributed team of over 250 Zytans working from over 28 countries who are on a mission to enable our customers to extract the data they need to continue to innovate and grow their businesses. We believe that all businesses deserve a smooth pathway to data.

Zyte is seeking an experienced Team Lead to manage our Core & MLOps Squad, responsible for building the bedrock infrastructure that powers Zyte at scale. This hands-on technical leadership role requires expertise across MLOps, systems programming, and orchestration to lead a cross-functional team in designing and maintaining the scalable foundation that enables all Zyte teams to build and run their services with confidence.

For more than a decade, Zyte has led the way in building powerful, easy-to-use tools to collect, format, and deliver web data, quickly, dependably, and at scale. Today, the data we extract helps thousands of organizations make smarter business decisions, secure competitive advantage, and drive sustainable growth. Today, over 3,000 companies and 1 million developers rely on our tools and services to get the data they need from the web.

Location: Barcelona, Catalonia, Spain. Remote-friendly company with multiple remote options.

Responsibilities

* Design and evolve the core platform (Kubernetes, Mesos, GPU scheduling/autoscaling, distributed compute).
* Own the model platform: registry, experiment tracking, training orchestration, evaluation, serving, and monitoring.
* Build the Golden Path: reference repos, a scaffold CLI, opinionated CI/CD pipelines, runtime contracts (health/metrics/tracing/SLOs), high-performance clients, circuit breakers and other production-ready defaults.
* Operate a secure, multi-tenant model registry and training platform with standardized experiment/evaluation harnesses.
* Provide turnkey serving patterns (online + batch), drift/quality monitoring, and rollback playbooks.
* Integrate public/open-source AI capabilities as managed platform services with cost and data-governance guardrails.
* Run the squad: roadmap/prioritization, delivery, mentoring, and high engineering standards.
* Partner with product engineering (Zyte API, Scrapy Cloud), Prod Ops, and Security on adoption and rollout plans.
* Mentor the team and foster a platform-thinking mindset.
* Own container orchestration (Kubernetes/Knative), GPU provisioning & autoscaling, environment & secret management.
* Develop operators, sidecars, and internal SDKs/libraries (Go/Rust/Python/Java) that enforce the golden path contract.
* Maintain model platform responsibilities: registry, experiment tracking, training orchestration, evaluation framework, serving infra, model monitoring.
* Establish observability pipelines: logging/metrics/tracing.
* Manage billing pipeline: metering, events, and cost tracking abstractions.
* Develop the Golden Path: Java, Python, ML templates, CI/CD blueprints, docs, and scaffold CLI.
* Improve reliability and security: SRE practices, cost governance, and supply-chain security (SBOM, image signing).

Requirements

Qualifications

Required
* 5+ years experience building distributed systems; 3+ years in MLOps/ML platform engineering (or equivalent impact).
* Knowledge of Linux/OS internals, networking, concurrency, and performance profiling.
* Deep understanding of Kubernetes (bonus: Mesos).
* Proficiency developing high-performance services in Java, Rust, Go or C++ with strong Python skills (bonus: vert.x and Netty).
* Experience with GPU infrastructure (scheduling, containerization, optimization).
* Track record of designing and operating model platforms in production.
* Demonstrated success leading technical teams and implementing organization-wide platform solutions.

Preferred
* Streaming & workflows: Kafka plus Argo/Temporal/Airflow or equivalents.
* eBPF-based observability, perf tooling, or io_uring experience.
* Cost optimization for ML/AI; multi-tenant quotas and fairness.
* Hands-on experience authoring Golden Paths (service chassis/templates, CI/CD blueprints, CLI scaffolds).
* SRE practices (SLIs/SLOs, incident management).

Benefits

* We love fostering and nourishing new ideas and bringing them to market.
* Become part of a self-motivated, progressive, multi-cultural team.
* Have the freedom and flexibility to work from where you do your best work, as we are a completely remote company.
* Get the chance to work with cutting-edge open-source technologies and tools.
