Advisor - Data Architect, Data Foundry

The Lilly Company
Boston, United States of America

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Intermediate
Compensation
$151.5K - $222.2K

Job location

Boston, United States of America

Tech stack

API
Artificial Intelligence
Amazon Web Services (AWS)
Azure
Bioinformatics
C++
Cloud Computing
Encodings
Computational Biology
Databases
Data Architecture
Information Engineering
Data Governance
Data Infrastructure
Data Integration
ETL
Data Structures
Data Systems
Relational Databases
Database Queries
Software Design Patterns
Graph Database
Information Sciences
Laboratory Information Management Systems
Metadata Standards
MongoDB
Neo4j
Web Ontology Language
Query Optimization
Semantic Web
Software Engineering
SPARQL
Data Streaming
Data Processing
Data Storage Technologies
Snowflake
Spark
Event Driven Architecture
Build Management
Data Lake
Information Technology
Apache Flink
Integration Frameworks
Real Time Data
Kafka
Data Management
Data Lakehouse
Data Pipelines
Databricks

Job description

We are seeking Data Architects at multiple levels to design and build the data infrastructure that makes AI-native drug discovery possible. You will create the schemas, ontologies, data models, knowledge graphs, and platform architectures that transform raw scientific data into machine-actionable, FAIR-compliant, insight-ready assets that serve both discovery scientists and autonomous AI agents.

This role is the foundation of Architecture4Insight. Everything the software engineering team builds (pipelines, APIs, prototypes) depends on the data models and platform architecture this team designs. You will apply deep knowledge of scientific data (chemical, biological, HTE, automation-generated) to create custom-fit solutions, then partner with Tech@Lilly to scale and maintain them. Depending on expertise, the role spans three focus areas: data modeling & ontologies, data platform & lakehouse architecture, and knowledge graph & specialized data systems. You will independently design schemas, select technologies, and make build-vs-buy recommendations for your domain.

Responsibilities

Data Modeling & Ontologies

  • Design and implement data models, schemas, and ontologies for chemical, biological, and automation-generated data that serve discovery workflows across the portfolio.

  • Define and maintain controlled vocabularies, metadata standards, and FAIR-compliant data frameworks in partnership with Preparedness4Insight.

  • Implement semantic data standards (RDF, OWL, SPARQL) and ontology engineering practices to create interoperable, machine-readable scientific data.

Data Platform & Lakehouse Architecture

  • Design and implement data lakehouse architecture using modern platforms (Databricks, Snowflake, or equivalent), including data storage patterns, partitioning strategies, and query optimization.

  • Build and optimize ETL/ELT pipelines using Spark, dbt, or similar tools to transform raw scientific data into analytical and ML-ready formats.

  • Implement real-time and streaming data integration (Kafka, Kinesis, event-driven patterns) connecting LIMS, instruments, and lab automation systems to the data infrastructure.

Knowledge Graph & Specialized Data Systems

  • Design and implement knowledge graphs (Neo4j, Amazon Neptune, TigerGraph) that capture molecular, target, pathway, and experimental relationships across the discovery landscape.

  • Architect specialized data solutions: array databases (TileDB) for genomics/imaging, document stores (MongoDB) for experimental records, and vector databases for embedding-based retrieval supporting ML and RAG workflows.

  • Build query and traversal patterns that enable scientists and AI agents to ask relational questions across the entire data landscape.

Cross-Functional Partnership

  • Partner with scientific software engineers to ensure data architectures are implementable, performant, and well-documented.

  • Collaborate with Methods4Insight to design data structures that support analytical model training, deployment, and evaluation.

  • Work with Tech@Lilly to define scaling strategies, ensure enterprise compliance, and transition data architectures to production-grade management.

  • Contribute to build-versus-buy-versus-adopt decisions by evaluating commercial and open-source data platforms against Data Foundry requirements.

Requirements

  • M.S. or PhD in Computer Science, Data Science, Bioinformatics, Computational Biology, Information Science, or related STEM field

  • M.S. with 6+ years, or PhD with 2+ years, of data architecture, data engineering, or scientific informatics experience.

  • Deep expertise in at least one of the focus areas: relational databases, data modeling and ontology engineering, data platform and lakehouse architecture (Databricks, Snowflake, Spark), or knowledge graph and specialized database systems (Neo4j, Neptune, MongoDB, TileDB)

Preferred Qualifications

  • Working familiarity with multiple database paradigms (relational, graph, document, columnar, key-value) and strong SQL skills.

  • Understanding of scientific data types and experimental workflows in life sciences or pharma (chemical, biological, HTE data).

  • Strong communication skills, with the ability to translate data architecture concepts for both technical and scientific audiences.

  • Familiarity with cloud platforms (AWS, Azure, or GCP) and modern data integration patterns.

  • Pharmaceutical or biotech research industry experience, particularly in discovery data management or research informatics.

  • Experience with semantic web technologies: RDF, OWL, SPARQL, Protégé, or equivalent ontology engineering tools.

  • Hands-on experience with graph databases (Neo4j, Neptune, TigerGraph) and knowledge graph design patterns for scientific data.

  • Data lakehouse architecture experience: Databricks (Delta Lake, Unity Catalog), Snowflake, or equivalent; ETL/ELT with Spark, dbt.

  • Experience with streaming/real-time data platforms (Kafka, Kinesis, Flink) and event-driven architectures.

  • Familiarity with LIMS, ELN systems (e.g., Benchling), and laboratory instrument data integration.

  • Experience with vector databases (Pinecone, Weaviate, pgvector) and embedding-based retrieval for ML/RAG applications.

  • Array database experience (TileDB, Zarr) for genomics, imaging, or high-dimensional scientific data.

  • Experience implementing FAIR data principles and Data Readiness Level frameworks.

  • Familiarity with scientific data standards and controlled vocabularies in chemistry (InChI, SMILES) or biology (Gene Ontology, UniProt).

  • Experience with C, C++, or Rust for performance-critical data processing; familiarity with HPC data I/O patterns for large-scale scientific computations.

Benefits & conditions

Actual compensation will depend on a candidate's education, experience, skills, and geographic location. The anticipated wage for this position is $151,500 - $222,200.

Full-time equivalent employees also will be eligible for a company bonus (depending, in part, on company and individual performance). In addition, Lilly offers a comprehensive benefit program to eligible employees, including eligibility to participate in a company-sponsored 401(k); pension; vacation benefits; eligibility for medical, dental, vision and prescription drug benefits; flexible benefits (e.g., healthcare and/or dependent day care flexible spending accounts); life insurance and death benefits; certain time off and leave of absence benefits; and well-being benefits (e.g., employee assistance program, fitness benefits, and employee clubs and activities). Lilly reserves the right to amend, modify, or terminate its compensation and benefit programs in its sole discretion, and Lilly's compensation practices and guidelines will apply regarding the details of any promotion or transfer of Lilly employees.

#WeAreLilly

About the company

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We're looking for people who are determined to make life better for people around the world.
