Data Engineer - Science
Job description
As Qureight scales its AI-driven imaging platform and advances development of foundation models and disease-specific AI models, we are building the data engineering capability required to support large-scale data preparation for machine learning.
We are looking for a Data Engineer to focus on preparing and managing large imaging datasets (including CT scans and DICOM metadata) for use in machine learning workflows.
This role sits within the Science function and works closely with Machine Learning Scientists as well as other Data Engineers to ensure that data is delivered in a consistent, high-quality, and efficient format ready for model development. It will focus on designing and implementing the next iteration of our data infrastructure to accelerate our integration of machine learning into clinical trials.
You can read more about one of our Senior Software Engineers here.
What you will do
- Collaborate on designing and implementing new data infrastructure and pipelines that prepare data for large-scale ML workflows
- Care about data quality and ensure that the pipelines you build are robust, scalable, and maintainable
- Work with DICOM data to feed into foundation model and disease-specific imaging model development
- Collaborate closely with Machine Learning Scientists, DevOps Engineers, and other Data Engineers to create a tight feedback loop and ensure the end-to-end process is effective and efficient
- Ensure that our data processes have quality and compliance designed in from the start to make reproducibility, lineage tracking, and data quality painless
- Scale pipelines to handle millions of scans: ingesting the imaging data, transforming it, then filtering and structuring it ready for foundation model development
Requirements
- Proven experience as a Data Engineer in complex, data-rich environments
- Strong programming skills in Python
- Experience building and maintaining production ML data pipelines, including orchestration tools such as Dagster and cloud infrastructure on AWS
- Experience with Docker and Kubernetes-based infrastructure
- Experience working with large datasets
- Understanding of data preprocessing and quality control for machine learning
- Strong collaboration skills with machine learning or technical teams
Even better if you have experience of...
- Medical imaging data such as CT, MRI, or DICOM
- Large-scale datasets or foundation model workflows
- Deployment tooling (Helm, and familiarity with GitOps tooling such as Flux and Kustomize)
- Data versioning and reproducibility frameworks
- Database design and data modelling
- Working in regulated environments such as GxP or ISO 13485
- ML experiment tracking or metadata management (MLflow)
Benefits & conditions
- A comprehensive benefits package that includes an annual bonus plan, private medical insurance, life insurance, and a contributory pension scheme
- 25 days annual leave, plus bank holidays and enhanced maternity leave
- A diverse work environment that brings together experts in many fields, including software engineering, DevOps, data science, machine learning, quality assurance, regulatory affairs, and clinical operations