Data Curation Developer
Job description
- Lead the development of business requirements for data curation through collaboration with R&D business and data platform teams.
- Maintain strong connections with analytical groups and R&D Data Platform teams to ensure seamless data integration and usage.
- Deliver pre-packaged, curated (e.g. pre-processed, harmonized, wrangled, contextualised and/or anonymised) datasets aligned to business requirements for analytics. This includes documenting data specifications that clearly describe the processing steps required to generate analysis-ready datasets, ensuring that provenance, lineage and privacy requirements are maintained.
- Integrate diverse datasets (e.g., clinical trials, real-world data, omics) into a unified format for consistent analysis.
- Ensure all datasets meet analysis-ready and privacy requirements by performing the necessary data curation activities (e.g. pre-processing, contextualising and/or anonymising).
- Provide coaching and peer review to ensure that the team's work reflects industry best practices for data curation activities, including data privacy and anonymization standards.
- Ensure that datasets are processed to meet the conditions stated in the approved data re-use request (e.g., remove subjects from countries that do not allow re-use).
- Write clean, readable code.
- Ensure that deliverables are appropriately quality controlled and documented and, when required, can be handed over to the R&D Tech team for production pipeline implementation.
Requirements
- BSc/MSc/PhD (or equivalent) in Computer Science, Mathematics, Statistics, or related subject
- Proven experience of handling various modalities of scientific clinical data such as clinical trial data (including biomarkers), real world data (RWD), omics etc.
- Experience in Python, Databricks, Delta Lake, PySpark, Pandas and other data engineering frameworks, and in applying them to produce datasets that comply with industry standards
- Proven ability to handle and process large structured, semi-structured, and unstructured datasets efficiently
- Strong communication skills and expertise to translate business needs into technical data requirements and processes
- Ability to quantify and provide insights into the business impact and value created by data curation activities
- Experience with at least one industry data standard, such as CDISC (ODM: CDASH, SDTM, ADaM), HL7 FHIR, OMOP (CDM), etc.
Preferred Qualifications & Skills:
Please note that the following skills are preferred, not required. If you do not have them, please still apply:
- Experience in R
- Agile mindset with the ability to deliver prototypes quickly and iterate improvements based on stakeholder feedback
- Experience with digital clinical trial protocols and the Unified Study Definitions Model (USDM)
- Experience in data modelling
Benefits & conditions
This role focuses on the technical experience required to curate (e.g. pre-process, harmonize, wrangle and contextualise) data to produce high-quality data assets for R&D analysis. The aim is to support GSK's Disease Area Strategies and other key R&D priority areas by making data analysis-ready, enabling efficient and effective decision-making across various therapeutic areas.
Please note that depending on experience level, candidates may be considered at either the G6 or G7 level.
We create a place where people can grow, be their best, be safe, and feel welcome, valued and included. We offer a competitive salary, an annual bonus based on company performance, healthcare and wellbeing programmes, pension plan membership, and a shares and savings programme.
We embrace modern work practices; our Performance with Choice programme offers a hybrid working model, empowering you to find the optimal balance between remote and in-office work.
Discover more about our company-wide benefits and life at GSK on our webpage, Life at GSK.