Principal Data Scientist

Thermo Fisher Scientific
Wilmington, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$215K

Job location

Wilmington, United States of America

Tech stack

API
Artificial Intelligence
Amazon Web Services (AWS)
Data analysis
Azure
Bioinformatics
Health Informatics
Cloud Engineering
Code Review
Computer Programming
Continuous Integration
Decision Support Systems
R
Monitoring of Systems
Python
Machine Learning
Natural Language Processing
TensorFlow
Search Technologies
SQL Databases
Web Platforms
Feature Engineering
PyTorch
Large Language Models
Snowflake
Spark
Deep Learning
Model Validation
Generative AI
Data Strategy
GIT
Data Lake
Scikit Learn
Information Technology
XGBoost
Data Management
Machine Learning Operations
Virtual Agents
Text Analysis
Software Version Control
Databricks

Job description

At Thermo Fisher's PPD clinical research business, we're using digital innovation, data science, and AI to reimagine how life-changing therapies reach patients. Our teams combine deep scientific expertise with advanced analytics, automation, and digital platforms to make research smarter, faster, and more connected.

We know that innovation happens when diverse minds meet. Our Digital Science, Data, and AI professionals collaborate closely with scientists, clinicians, and operational experts to solve real-world challenges in clinical research. Alongside our partnership with OpenAI, you can be part of the collaboration that will help to improve the speed and success of drug development, enabling customers to get medicines to patients faster and more cost-effectively.

You'll join a culture that values experimentation, learning, and collaboration - where your ideas can help shape how we deliver life-saving solutions. Whether you're a data engineer, product manager, software developer, or AI scientist, you'll find opportunities here to apply your skills to work that truly matters - improving global health outcomes.

Principal Data Scientist - Patient Analytical Services Division (PASD)

The Principal Data Scientist is a senior individual contributor and the deepest technical voice on the PASD data science team, focused on applying machine learning, advanced analytics, and modern AI to patient-level healthcare data. This role partners closely with epidemiologists, statisticians, RWE scientists, data engineers, and consulting teams to build scalable analytical and AI solutions that power evidence generation and decision support for biopharmaceutical, biotech, and medical device clients.

The role balances three areas: rigorous ML and advanced analytics on complex patient data (claims, EHR, registries, linked datasets), responsible adoption of generative AI and agentic solutions for analytics productivity and client-facing workflows, and fluent collaboration within RWE and patient analytics contexts. This is a hands-on technical leadership role; influence is exercised through technical depth, mentorship, and setting engineering and modeling standards, not through people or portfolio management.

Technical Leadership (Individual Contributor)

  • Serve as a senior technical expert across the full analytics lifecycle, including problem framing, data strategy, model development, validation, deployment, and monitoring.
  • Set and uphold high standards for modeling rigor, reproducibility, and engineering quality across the data science team.
  • Mentor data scientists and engineers, review code and modeling approaches, and raise the technical bar on projects without owning delivery management.
  • Evaluate emerging methods, tools, and frameworks, and guide adoption where they add measurable value.

Machine Learning & Advanced Analytics for Patient Data

  • Build predictive and descriptive models on patient-level healthcare data to support use cases such as patient stratification, risk prediction, text analytics, workflow prioritization, and decision support.
  • Apply appropriate methods across classical statistical modeling, machine learning, and deep learning, including survival analysis, causal inference, propensity scoring, and longitudinal modeling where relevant.
  • Design feature engineering, evaluation, and validation approaches suited to the complexities of real-world healthcare data, including missingness, censoring, bias, and longitudinal structure.
  • Develop reproducible, well-tested pipelines using modern data science tooling, experiment tracking, and scalable compute.

Generative AI & Agentic Solutions

  • Identify and implement high-value applications of generative AI to improve analytics productivity, scientific review, knowledge retrieval, and internal and client-facing workflows.
  • Design and evaluate LLM-powered assistants, retrieval workflows, and agentic applications with appropriate human oversight, traceability, and quality controls.
  • Partner with platform and engineering teams to operationalize AI applications using enterprise tooling for experimentation, tracing, evaluation, and monitoring, ensuring responsible deployment in regulated and client-facing environments.

Cross-Functional Partnership

  • Partner with RWE scientists, epidemiologists, statisticians, data engineers, product owners, and consulting teams to translate scientific and business questions into sound analytical approaches.
  • Communicate methods, assumptions, limitations, and findings clearly to both technical and non-technical audiences, including client-facing contexts.
  • Translate technical outputs into scientific and business value for internal teams and client stakeholders.

Key Technologies

Languages and Analytics: Python, SQL, R

ML / AI: scikit-learn, XGBoost, PyTorch, TensorFlow, NLP libraries, LLM APIs

Statistics for RWD: survival analysis, causal inference, propensity scoring, longitudinal modeling

Data Platforms: Databricks, Spark, Delta Lake, Snowflake, AWS, Azure

LLMOps / Agentic AI: MLflow, prompt and version tracking, tracing, evaluation frameworks, RAG architectures, vector search, agent orchestration frameworks

Engineering and Delivery: Git, CI/CD, notebooks, APIs

Requirements

  • Bachelor's degree in data science, computer science, statistics, biostatistics, epidemiology, mathematics, bioinformatics, or a related quantitative field, or
  • Master's degree with significant progressive experience in data science, machine learning, or healthcare analytics (preferred).
  • Previous experience in data science that provides the knowledge, skills, and abilities to perform the job (comparable to 8-10 years' experience).
  • Hands-on experience applying ML and advanced analytics to real-world healthcare data such as claims, EHR, registries, or other patient-level longitudinal datasets.
  • Strong programming skills in Python and SQL; working proficiency in R.
  • Solid grounding in statistical modeling, machine learning, and model evaluation.
  • Experience working in modern cloud and data platforms such as Databricks, Spark, AWS, Azure, or Snowflake.
  • Strong software engineering fundamentals, including version control, modular code, testing, documentation, and reproducibility.
  • Strong written and verbal communication skills, with the ability to present methods and findings clearly to diverse audiences.

  • Experience applying ML and advanced analytics within RWE, HEOR, epidemiology, pharmacoepidemiology, or patient analytics at a pharma, biotech, CRO, medical device, or healthcare analytics organization.
  • Domain familiarity with oncology, immunology, rare disease, or therapeutic-area-specific patient analytics.
  • Experience with survival analysis, causal inference, propensity scoring, and longitudinal modeling applied to real-world data.
  • Experience with NLP, unstructured clinical text, knowledge retrieval, LLM applications, prompt evaluation, and agentic workflows.
  • Practical experience with MLOps / LLMOps capabilities such as experiment tracking, tracing, evaluation frameworks, model monitoring, and deployment governance.
  • Experience mentoring data scientists and contributing to technical standards in a matrixed environment.

At Thermo Fisher Scientific, we are committed to fostering a healthy and harmonious workplace for our employees. We understand the importance of creating an environment that allows individuals to excel. Please see below for the required qualifications for this position, which also includes the possibility of equivalent experience:

  • Able to communicate, receive, and understand information and ideas with diverse groups of people in a comprehensible and reasonable manner.
  • Able to work upright and stationary for typical working hours.
  • Ability to use and learn standard office equipment and technology with proficiency.
  • Able to perform successfully under pressure while prioritizing and handling multiple projects or activities.
  • May require as-needed travel (0-20%).

*Location: Remote US (East Coast preferred). Relocation assistance is NOT provided.

*Must be legally authorized to work in the United States without sponsorship.

*Must be able to pass a comprehensive background check, which includes a drug screening.

The annual salary range estimated for this position in North Carolina is $185,000-$215,000 USD. This position may also be eligible to receive a variable annual bonus based on company, team, and/or individual performance results in accordance with company policy.
