AI Data Engineer
Job description
As a Senior Data Engineer for AI/ML, you will be the architect and builder of the data infrastructure that feeds our intelligent systems. Your responsibilities will include:
- Design and Build Scalable Data Pipelines: Architect, implement, and optimize robust, high-performance real-time and batch ETL pipelines to ingest, process, and transform massive datasets for LLMs and foundational AI models.
- Cloud-Native Innovation: Leverage your deep expertise across AWS, Azure, and/or GCP to build cloud-native data solutions, ensuring efficiency, scalability, and cost-effectiveness.
- Power Generative AI: Develop and manage specialized data flows for generative AI applications, including integrating with vector databases and constructing sophisticated RAG pipelines.
- Champion Data Governance & Ethical AI: Implement best practices for data quality, lineage, privacy, and security, ensuring our AI systems are developed and used responsibly and ethically.
- Tooling the Future: Get hands-on with cutting-edge technologies like Hugging Face, PyTorch, TensorFlow, Apache Spark, Apache Airflow, and other modern data and ML frameworks.
- Collaborate and Lead: Partner closely with ML Engineers, Data Scientists, and Researchers to understand their data needs, provide technical leadership, and translate complex requirements into actionable data strategies.
- Optimize and Operate: Monitor, troubleshoot, and continuously optimize data pipelines and infrastructure for peak performance and reliability in production environments.
Requirements
- Extensive Data Engineering Experience: Proven track record (3+ years) in designing, building, and maintaining large-scale data pipelines and data warehousing solutions.
- Cloud Platform Mastery: Expert-level proficiency with at least one major cloud provider (GCP preferred; AWS or Azure also considered), including its data, compute, and storage services.
- Programming Prowess: Strong programming skills in Python and SQL are essential.
- Big Data Ecosystem Expertise: Hands-on experience with big data technologies like Apache Spark, Kafka, and data orchestration tools such as Apache Airflow or Prefect.
- ML Data Acumen: Solid understanding of data requirements for machine learning models, including feature engineering, data validation, and dataset versioning.
- Vector Database Experience: Practical experience working with vector databases (e.g., Pinecone, Milvus, Chroma) for embedding storage and retrieval.
- Generative AI Familiarity: Understanding of data paradigms for LLMs, RAG architectures, and how data pipelines support fine-tuning or pre-training.
- MLOps Principles: Familiarity with MLOps best practices for deploying and managing ML models in production.
- Data Governance & Ethics: Experience implementing data governance frameworks, ensuring data quality, privacy, and compliance, with an awareness of ethical AI considerations.
Bonus Points If You Have:
- Direct experience with the Hugging Face ecosystem, PyTorch, or TensorFlow for data preparation in an ML context.
- Experience with real-time data streaming architectures.
- Familiarity with containerization (Docker, Kubernetes).
- Master's or Ph.D. in Computer Science, Data Engineering, or a related quantitative field.