Senior Bioinformatics Data Engineer

Nucleome Therapeutics Ltd

Oxford, United Kingdom

2 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

£65K

Job location

Oxford, United Kingdom

Tech stack

Bioinformatics

Cloud Computing

Computer Programming

Databases

Corona (Software Development Kit)

Information Engineering

Data Files

DevOps

Elasticsearch

Python

PostgreSQL

DataOps

Data Streaming

Data Processing

Containerization

Core Data

Storage Technologies

Non-relational Database

Data Pipelines

Job description

Driving the design, development, and continuous improvement of core data infrastructure that enables automated data flows from unstructured lab data through to structured and highly accessible enriched data. You will work closely with the lead software engineer to make technical and architectural decisions, collaborating on strategy and solution design while taking ownership of implementation and working independently to solve complex data engineering challenges. Together, you will iterate on solutions and ensure best practices are maintained across the platform.
Proactively identifying opportunities to enhance and optimize existing data pipelines that are vital to the business's decision-making processes. The lab team relies on these pipelines to generate the critical data needed to analyse drug target viability, so your work has a direct impact on therapeutic development priorities. You will work closely with the computational team to prioritize improvements, implement iterative enhancements, and ensure pipelines remain robust, scalable, and aligned with evolving scientific and business requirements.
Collaborating closely with lab and computational teams to ensure data is readily available, reliable, and properly integrated across the platform. You will take ownership of data quality, accessibility, and performance, making decisions on data modelling, storage strategies, and processing optimizations while maintaining alignment with the broader engineering team's standards and practices.

Requirements

BSc, MSc or PhD in a mathematical, computational or scientific discipline. Industry experience in biotech or pharma is desirable.

Extensive practical experience defining and developing scalable biological data processing pipelines; knowledge of Nextflow and Seqera is advantageous. Experience with bioinformatics tools, databases, biological datasets, and performant data file formats (Parquet, etc.), alongside statistical analysis.
Experience in data modelling and implementation in relational and non-relational databases such as PostgreSQL and Elasticsearch, as well as cloud-based data processing and storage technologies, infrastructure, and containerization.
Strong programming skills in Python, extensive experience with modern data engineering tooling and orchestration platforms such as Dagster, and experience with the processes and tooling of modern software development, DevOps, and DataOps.
Strong communication, organisational and time-management skills, and the ability to explain complex ideas to technical and non-technical audiences. Ability to work independently and as a member of a multidisciplinary team in a highly dynamic environment.
Ability to be highly detail-orientated in implementation whilst keeping the big picture in view from a planning, timescale and business-value perspective.