Data Engineer
Job description
Data Pipelines: Build and scale pipelines for complex training workflows.
Data Integrity: Ensure high-quality, consistent data across all projects.
Collaboration: Partner with ML researchers on versioning and data security.
Metrics: Develop reporting systems for data quality and performance.
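As a rough illustration of the data-integrity and metrics responsibilities above, here is a minimal, self-contained sketch of a batch quality report (the `quality_report` function and its field names are hypothetical, not part of this role's actual tooling):

```python
from collections import Counter

def quality_report(rows, required_fields):
    """Summarize basic data-quality signals for a batch of records:
    total count, exact-duplicate count, and per-field missing rate."""
    total = len(rows)
    missing = Counter()          # per-field null/empty counts
    seen, duplicates = set(), 0  # duplicate detection on full-record identity
    for row in rows:
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "total": total,
        "duplicates": duplicates,
        "missing_rate": {f: missing[f] / total for f in required_fields},
    }

# Hypothetical batch of labeled records
batch = [
    {"id": 1, "label": "car"},
    {"id": 2, "label": ""},     # missing label
    {"id": 1, "label": "car"},  # exact duplicate
]
report = quality_report(batch, ["id", "label"])
```

At production scale this logic would run inside a distributed framework (e.g. Spark aggregations) rather than in-memory Python, but the reported signals are the same.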
Requirements
Core: Python, SQL, and data processing frameworks.
Experience: Large-scale data management (hundreds of terabytes to petabytes).
Education: Degree in CS, Data Science, or a related field.
Bonus: Background in robotics, CV, or autonomous systems.
Technical Self-Assessment (1-10)
Candidates will be asked to rate their knowledge of:
Languages: Python, Java, Scala
Compute/Distributed: Ray, Spark, Databricks, Hadoop
Orchestration/Cloud: Kafka, Airflow, Prefect, AWS
Storage/Warehouse: ClickHouse, Snowflake, Redshift, Greenplum
Note: Experience at petabyte-to-exabyte scale is a strong signal for this role.