Data Engineer
Job description
As a Data Engineer at MANUS, you will work together with a bright, multidisciplinary team of engineers to build a robust data foundation for the future of robotics and AI. Our goal is to facilitate the collection of high-quality, multimodal datasets to train embodied AI: AI that interacts with and understands the physical world.
In this role, you are the team's subject matter expert on data orchestration and infrastructure. You will work closely with our Embodied AI Engineers to determine what data needs to be collected, ranging from video and audio to various sensor streams, and how it should be structured to be "AI-ready." Your focus will be on the end-to-end data lifecycle: from orchestrating local collection at the source to managing the efficient transfer and storage of that data in the cloud. You have full responsibility for your own projects and are expected to work independently.
Your Responsibilities
· Multimodal Data Orchestration: Acting as the team's authority on capturing diverse data types (video, sensors, etc.) and ensuring they are perfectly structured for machine learning training loops.
· Pipeline Management: Designing and maintaining the pipelines that manage local data collection and the subsequent upload and ingestion into our cloud ecosystem.
· Cloud Architecture: Building and maintaining scalable cloud environments (AWS, Google Cloud, or Azure) optimized for storing and processing massive multimodal datasets.
· Synchronization & Alignment: Solving the complex challenge of time-syncing video frames with high-frequency sensor data to ensure "frame-perfect" training sets.
· Collaboration: Working side-by-side with AI engineers to align data collection strategies with model requirements.
Requirements
· Strong background in Computer Science, Data Engineering, or a related field.
· 5+ years of experience in a data-focused or backend role.
· Advanced proficiency in Python and SQL.
· Experience handling multimodal data, such as video, audio, or time-series sensor data.
· Deep expertise in architecting cloud-based data storage and processing solutions.
· Excellent communication skills: you can bridge the gap between technical data requirements and high-level AI goals.
· Excellent proficiency in both spoken and written English.
A Plus Would Be
· Experience with Robotics (ROS/ROS2) or AI training pipelines.
· Knowledge of local data capture systems and edge-to-cloud synchronization.
· Familiarity with containerization (Docker, Kubernetes) and Terraform.
· Experience with Git, Jira, and Confluence.