Senior Machine Learning Engineer, Data for Embodied AI

Wayve

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

£ 94K

Job location

Tech stack

Artificial Intelligence

Amazon Web Services (AWS)

Data analysis

Azure

Cloud Computing

Information Engineering

Data Governance

Data Systems

Software Debugging

Distributed Data Store

Python

Machine Learning

Management of Software Versions

Google Cloud Platform

Data Ingestion

PyTorch

Spark

Data Lake

Dask

Machine Learning Operations

Lidar

Data Pipelines

Job description

Design and implement large-scale data acquisition, processing, and curation pipelines, owning the full lifecycle of high-quality datasets used to train advanced robotics and foundation models.
Continuously improve dataset quality and utility through sophisticated data analysis, debugging, and experimentation, developing metrics, tests, and monitoring mechanisms that directly drive model performance improvements.
Develop and scale multimodal data pipelines for ingestion, preprocessing, filtering, annotation, and storage across video, LiDAR, and telemetry modalities.
Run systematic experiments on data ablations and composition to assess their impact on model training dynamics, generalisation, and downstream performance.
Collaborate with ML researchers and platform engineers to ensure datasets are fit for purpose and efficiently integrated into large-scale training workflows.
Build internal tools and workflows for dataset auditing, visualization, and versioning to streamline iteration and reproducibility.
Advance best practices for data governance, reliability, and scalability across the data lifecycle, ensuring data safety, privacy, and long-term maintainability.

Shape the future of embodied AI through data. Your work will directly determine the quality, scale, and impact of the foundation models that drive our autonomy systems.
Tackle data challenges at unprecedented scale. Work with petabytes of multimodal data - video, lidar, and telemetry - and build pipelines that enable training at the frontier of AI.
Collaborate with world-class talent. Partner with leading ML researchers, software engineers, and data scientists who are redefining how AI learns from real-world experience.
Make your mark on real-world autonomy. Your data systems will power models that see, understand, and act in the world.
Work in a high-trust, high-autonomy environment. We value creativity, experimentation, and rigorous thinking. You'll have the freedom to explore bold ideas and the support to make them real.

We understand that everyone has a unique set of skills and experiences and that not everyone will meet all of the requirements listed for this role. If you're passionate about self-driving cars and think you have what it takes to make a positive impact on the world, we encourage you to apply.

Requirements

To set you up for success as a Senior MLE at Wayve, we're looking for the following skills and experience:

5+ years of experience in ML engineering, data engineering, or applied ML roles focused on large-scale data systems.
Proven experience building and maintaining large-scale data pipelines for machine learning, including data ingestion, transformation, and validation.
Strong Python fundamentals and experience with modern ML and data frameworks (e.g. PyTorch, Ray, Dask, Spark, or equivalent).
Solid understanding of multimodal data (video, lidar, sensor telemetry) and its challenges in large-scale training.
Experience defining and tracking data quality metrics, conducting dataset analysis, and driving data-informed improvements in model performance.
Demonstrated ability to work collaboratively with ML researchers, platform engineers, and product teams in a fast-paced, experimental environment.
Strong problem-solving skills, a data-driven mindset, and the ability to translate research needs into reliable data solutions.

Desirable

Exposure to large-scale storage, distributed training systems, or cloud compute environments (Azure, AWS, GCP).
Experience designing high-throughput, distributed data pipelines (e.g. with Spark, Ray, Beam, or similar frameworks).
Familiarity with data versioning, lineage, and governance tools (e.g. LakeFS, DVC, MLflow, Delta Lake).
Experience in AVs, robotics, simulation, or other embodied AI domains.
Familiarity with foundation models, generative models, or simulation-based data pipelines.

About the company

At Wayve we're committed to creating a diverse, fair and respectful culture that is inclusive of everyone based on their unique skills and perspectives, and regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, veteran status, pregnancy or related condition (including breastfeeding) or any other basis as protected by applicable law.

Founded in 2017, Wayve is the leading developer of Embodied AI technology. Our advanced AI software and foundation models enable vehicles to perceive, understand, and navigate any complex environment, enhancing the usability and safety of automated driving systems. Our vision is to create autonomy that propels the world forward. Our intelligent, mapless, and hardware-agnostic AI products are designed for automakers, accelerating the transition from assisted to automated driving.

In our fast-paced environment, big problems ignite us - we embrace uncertainty, leaning into complex challenges to unlock groundbreaking solutions. We aim high and stay humble in our pursuit of excellence, constantly learning and evolving as we pave the way for a smarter, safer future. At Wayve, your contributions matter. We value diversity, embrace new perspectives, and foster an inclusive work environment; we back each other to deliver impact. Make Wayve the experience that defines your career!

Science is the team that is advancing our end-to-end autonomous driving research. The team's mission is to accelerate our journey to AV2.0 and ensure the future success of Wayve by incubating and investing in new ideas that have the potential to become game-changing technological advances for the company.

The goal of this role is to build, scale, and optimise next-generation world model architectures (e.g. GAIA and successors) and bridge them into high-throughput training infrastructure, enabling synthetic data and simulation to dramatically accelerate autonomy development. You'll design systems to acquire, process, and curate multimodal data at scale. You'll turn raw experience into the high-quality datasets that fuel our models. You'll sit at the intersection of machine learning research and data engineering, collaborating closely with scientists and infrastructure teams to ensure our workflows are robust, efficient, and deeply integrated with our model training stack. Your work will directly impact how quickly and effectively we can train, evaluate, and deploy embodied AI systems in the real world.