Principal Data Scientist - R&D DSDH - Preclinical Sciences & Translational Safety (PSTS)
Job description
The R&D Data Science organization is seeking a Data Scientist to apply advanced machine learning, robust data engineering, and domain expertise to drive impactful decisions and generate actionable insights within the Pharmaceutical Sciences & Translational Safety (PSTS) organization. In this role, you will work closely with multidisciplinary teams, including toxicologists, PK/PD specialists, in vivo researchers, and safety professionals, to create AI-ready datasets, develop predictive models, and deliver analytical solutions that improve safety evaluations and facilitate translational research.
Machine Learning & Modeling
- Develop and deploy ML/AI models to support safety signal detection, dose selection, PK/PD modeling, toxicology insights, and translational interpretation.
- Implement representation learning, predictive modeling, and multivariate analytics for datasets spanning in vivo studies, in vitro assays, exposure-response data, and pathology information.
- Partner with scientific SMEs to design modeling strategies aligned with PSTS decision points.
- Apply model governance, versioning, and validation standards consistent with R&D AI practices.
Data Engineering & Pipeline Development
- Build and maintain scalable data pipelines that integrate PSTS-relevant data sources (e.g., toxicology studies, PK/PD datasets, biomarker readouts, animal study repositories).
- Transform raw experimental outputs into standardized, analysis-ready, AI-ready datasets using Python, R, and cloud-native services.
- Contribute to harmonized scientific data models in collaboration with data engineering and ontology teams.
Scientific Domain Integration
- Work directly with toxicology, DMPK, and safety stakeholders to interpret scientific context and translate study designs into computational requirements.
- Apply understanding of mechanism-based toxicology, exposure-response concepts, and in vivo study structures to guide data transformations and modeling strategies.
- Enhance cross-study comparability via standardized terminologies, metadata practices, and quality checks.
- Collaborate with PSTS functional experts, R&D Data Science teams, and platform architects to ensure high-quality, scalable data solutions.
Requirements
The successful candidate has hands-on experience in machine learning and data engineering, complemented by a solid understanding of toxicology, pharmacokinetics/pharmacodynamics (PK/PD), in vivo experimentation, and translational science. The role also requires strong communication and problem-solving skills, a passion for innovation, and the ability to adapt to evolving scientific challenges in pharmaceutical R&D.
- Advanced degree (MS or PhD) in Data Science, Computational Biology, Toxicology, Pharmacology, Biomedical Engineering, Computer Science, or a related field.
- 3+ years of experience applying machine learning and/or data engineering to scientific or biomedical datasets.
- Proficiency with Python and/or R, SQL, and modern data engineering tooling (cloud computing, workflow orchestration, version control).
- Experience with ML model development, evaluation, and deployment pipelines.
- Experience working with biological, toxicology, PK/PD, or in vivo datasets.
Preferred
- Experience in safety sciences, ADME/DMPK, toxicogenomics, or biomarker analytics.
- Familiarity with scientific data formats (e.g., assay outputs, histopathology data, PK time-course datasets).
- Exposure to ontologies, semantic technologies, or knowledge graph integration for scientific domains.
- Experience with cloud-based data architectures (AWS S3, Snowflake, Redshift).
- Understanding of regulatory data standards (e.g., SEND, CDISC).