Scientific Knowledge Engineer, Ontology & Data Modeling
Tech stack: JSON Schema, Neo4j
Job description
This role is responsible for maximizing the value of our data assets over their lifetime. The engineer brings purpose to data by translating highly technical information from domain experts into an appropriate data model, complete with a substantial ontology and vocabulary, that can be used to structure and index the data effectively. Specifically, the engineer works with Product managers and R&D subject matter experts to define the language of science (data models, ontologies, standards, etc.) and build it into data products, acting as the voice of the "Knowledge base" and of the interoperability and value of the asset.
Key Responsibilities
- Define the schemas, ontologies, and data models of scientific information required to create value-adding data products.
- Be accountable for the quality control (through validation and verification) of mapping specifications (e.g., models, schemas, controlled vocabularies) to be industrialized by data engineering and maintained in platform-provisioned tooling.
- Work with Product managers and engineers to convert business needs into defined, deliverable business requirements that enable the integration of large-scale biology data to predict, model, and stabilize therapeutically relevant protein complex and antigen conformations for drug and vaccine discovery.
- Collaborate with external groups to align data standards with industry and academic ontologies, ensuring that data standards are defined with usage and analytics in mind.
- Provide bespoke subject-matter expertise for R&D data to translate deep science into data for actionable insights.
- Contribute to and maintain documentation of data standards, ontology decisions, and mapping rationale to support organizational knowledge transfer and auditability.
Requirements
- Master's degree in Bioinformatics, Biomedical Science, Biomedical Engineering, Molecular Biology, or Computer Science (with a life science application focus).
- 6+ years of relevant work experience.
- Specific experience contributing to Knowledge Graph development efforts, including entity modeling, relationship design, and schema governance.
- Hands-on experience with open-source ontology tools and languages: Protégé, SPARQL, OWL, SKOS, SHACL, RML, RDF/Turtle.
- Working knowledge of major life sciences ontologies: Gene Ontology (GO), OBO Foundry ontologies (CL, UBERON, HPO, MONDO, ChEBI, EFO, CLO), MeSH, SNOMED CT, UMLS.
- Familiarity with linked data principles and semantic web technologies.
- Experience with industry-standard tools for building data serialization protocols (e.g., JSON Schema, LinkML).
- Proficiency in at least one programming language - preferably Python - for scripting vocabulary mappings, building data models, automating QC, and prototyping pipelines.
- Experience with data governance and data quality tooling (e.g., Ataccama, Informatica, Talend, OpenRefine, Great Expectations, dbt).
- Experience supporting LLM integration or AI-readiness workflows - including metadata enrichment, entity linking, embedding pipelines, or retrieval-augmented generation (RAG) architectures.
- Understanding of vector databases and their role in semantic search and knowledge retrieval (e.g., Weaviate, Chroma).
- Familiarity with cloud data platforms and infrastructure relevant to large-scale biological data (e.g., AWS, GCP, Azure).
- Familiarity with graph database technologies (e.g., Neo4j, Amazon Neptune, Stardog, GraphDB, TigerGraph).