Data Engineer - ML Training Infrastructure
SpAItial
1 month ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Intermediate
Compensation: £65K
Job location:
Tech stack
Artificial Intelligence
Amazon Web Services (AWS)
Apache HTTP Server
Computer Vision
Azure
Software Quality
Continuous Integration
Information Engineering
Data Infrastructure
Cursor (AI-assisted code editor)
Programming Tools
Distributed Data Store
Python
TensorFlow
Parquet
Data Processing
PyTorch
Spark
Job description
- Architect and manage data infrastructure for large-scale ML training datasets (e.g., Apache Iceberg, Parquet, Spark).
- Build and operate ingestion pipelines for multimodal data (e.g., images, videos, 3D), including metadata generation and quality signals.
- Design data loaders, caching, and serving strategies optimized for ML training.
- Develop tools for dataset inspection, experiment tracking, and evaluation workflows.
- Partner closely with ML researchers to ensure infrastructure scales with training demands.
- Uphold code quality and best practices in testing, CI/CD, and reproducibility.
Requirements
- 3+ years of professional software/data engineering experience with production systems.
- Proven experience in large-scale data processing for ML training (not just analytics/BI).
- Hands-on experience with distributed data frameworks (e.g., Spark, Beam, Cloud SQL) and modern data formats (Parquet, Iceberg).
- Proficiency in cloud platforms (AWS, GCP, or Azure).
- Strong Python development skills, including testing and code quality.
- Experience building and maintaining CI/CD pipelines.
Preferred Qualifications
- Familiarity with ML frameworks (e.g., PyTorch, TensorFlow).
- Experience preparing multimodal datasets (images, video, 3D) for ML pipelines.
- Background in computer vision or 3D reconstruction (e.g., Structure-from-Motion).
- Interest in AI-assisted developer tools (Cursor, Windsurf, etc.).
About the company
SpAItial is pioneering the development of a frontier 3D foundation model, pushing the boundaries of AI, computer vision, and spatial computing. Our mission is to redefine how industries, from robotics and AR/VR to gaming and movies, generate and interact with 3D content.
At SpAItial, we are committed to creating a diverse and inclusive workplace. We welcome applications from people of all backgrounds, experiences, and perspectives. We are an equal opportunity employer and ensure all candidates are treated fairly throughout the recruitment process.