Senior Data Engineer
Role details
Job location
Tech stack
Job description
The Senior Data Engineer will design, build, and deliver a new enterprise data product supporting the client's generative drug design and computational chemistry platforms. This role focuses on creating scalable, well-structured data architecture from the ground up, with long-term expansion and downstream AI/ML integration in mind. The ideal candidate combines strong data engineering expertise with an understanding of drug design, chemistry, and scientific data workflows.
-Design and implement a new enterprise data product, initially scoped as a standalone deliverable with future integration into broader AI-driven drug discovery platforms.
-Build scalable data pipelines, schemas, and storage models capable of supporting large, complex scientific and chemistry-derived datasets.
-Develop data solutions primarily on GCP / BigQuery, adhering to enterprise data engineering templates and standards.
-Implement data transformations and pipelines using Python, with a focus on data quality, traceability, and performance.
-Ensure the data architecture supports future expansion, additional datasets, and evolving analytical and computational needs.
-Collaborate closely with computational chemists, data scientists, and ML engineers to ensure data models align with generative design, molecular representations, and ML outputs.
-Apply an understanding of drug design and chemistry concepts (e.g., molecular properties, structure-activity data, experimental outputs) to inform data modeling and integration decisions.
-Provide technical guidance on data structure, scalability, and long-term maintainability in an enterprise environment.
Requirements
-Strong experience in data engineering, including database, schema, and data product design.
-Hands-on experience with GCP and BigQuery (Postgres familiarity a plus).
-Proficiency in Python for building and maintaining data pipelines.
-Onyx background
-Experience working with large, complex datasets at scale, ideally in scientific or R&D contexts.
-Experience supporting downstream analytics, ML pipelines, or AI-driven platforms, particularly in R&D or discovery environments.
-Background in life sciences, pharma, or scientific data platforms.
-Working knowledge or hands-on exposure to drug design, chemistry, or computational chemistry data.