Principal Scientist, Data Science - R&D DSDH - Therapeutics Development & Supply (TDS)
Job description
The R&D Data Science organization is seeking a Data Scientist - Data Engineer to design, build, and optimize data capture, processing, and storage solutions that enable advanced analytics, digital process transformation, and AI/ML applications across the development-to-supply continuum for Therapeutics Development & Supply (TDS).
You will be a hands-on technical contributor working across Process Development, Manufacturing, Supply Chain, Quality, and Digital/Data Science teams to deliver high-quality, AI-ready data pipelines and data products. This role involves creating robust, future-proof data systems, engineering workflows, and high-value data repositories that support scientific, technical, and operational decision-making.
Data Engineering & Pipeline Development
- Design, build, and maintain scalable data pipelines for acquiring, integrating, and managing TDS data from diverse data generation sources and systems (e.g., lab systems, MES, clinical supply, quality systems, external partners).
- Create and optimize data flows for structured and unstructured data using Python, R, SQL, cloud services, and other modern engineering tools.
- Develop and maintain TDS-specific data repositories, implementing enterprise-level data models and creating new models as needed.
- Enable AI/ML readiness by ensuring data is well-structured, versioned, traceable, and semantically aligned with enterprise data standards.
Data Product & Architecture Partnership
- Partner with data scientists, TDS domain experts, and digital technology teams to translate business needs into high-quality data products and engineering requirements.
- Work closely with ontology/knowledge graph teams to implement semantic models and future-proof data architectures.
Quality, Compliance & Performance
- Implement data quality and performance standards; define KPIs to measure accuracy, completeness, and consistency across TDS data assets.
- Apply data versioning and lineage tracking for compliance, traceability, and audit readiness.
- Follow software development best practices including code versioning, DevOps integration, and documentation.
Cross-Functional Collaboration
- Engage with scientific, technical, and operations stakeholders to understand requirements, design data solutions, and drive adoption.
- Support multiple concurrent projects, managing priorities and delivering maximum business value across the TDS network.
Requirements
- Degree in Engineering, Data Science, Life Sciences, Computer Science, or a related field; advanced degree preferred.
- 3+ years of experience in data engineering, including data modeling and database design, preferably in a scientific, manufacturing, or healthcare environment.
- Proficiency with Python, R, SQL, and cloud-based architectures (e.g., AWS services, Snowflake, Redshift).
- Experience with NoSQL and graph databases.
- Strong analytical, problem-solving, and stakeholder-management skills, with the ability to translate discussions into actionable requirements.
- Ability to drive multiple exciting projects simultaneously, with strong organizational skills and adaptability.
Preferred
- Experience with regulated or standards-driven data environments, such as CDISC, HL7, FHIR, OMOP, DICOM, or manufacturing/quality data standards.
- Familiarity with high-dimensional data (e.g., imaging, sensor data).
- Experience with the principles of connecting data pipelines to, or feeding data into, MLOps and model deployment workflows.
- Knowledge of manufacturing systems (MES), laboratory information systems, or industrial data systems.
- Exposure to knowledge graph or ontology-driven architectures.