Vector Data Engineer
Johnson & Johnson, S.a.
Municipality of Cornellà de Llobregat, Spain
8 days ago
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Intermediate
Job location: Municipality of Cornellà de Llobregat, Spain
Tech stack
Artificial Intelligence
Data analysis
ARM
Clinical Data Repository
Computational Biology
Databases
Information Engineering
Data Integration
Data Transformation
Data Stores
Data Visualization
Python
Machine Learning
Metadata
Snowflake
Indexer
Information Technology
Job description
The Vector Data Engineer designs and implements the embedding and semantic-search infrastructure that connects discovery, translational, and clinical data into AI-ready knowledge representations. This role bridges multi-omics data engineering and machine-learning infrastructure, enabling scientists and agentic tools to discover biological insights through vector-based search and reasoning.
Key Responsibilities
- Develop scalable pipelines that convert multi-omics and clinical data (e.g., proteomics, transcriptomics, spatial omics, biomarkers) into vectorized embeddings for AI and semantic retrieval.
- Build and maintain vector databases and hybrid data stores using technologies such as TileDB, Weaviate, or Snowflake Cortex.
- Collaborate with the Data Transformation Engineers to design standardized data formats suitable for embedding generation and cross-modality mapping.
- Integrate metadata, ontology terms, and provenance into vector representations to ensure traceability and governance compliance.
- Partner with the AI/ML Team to deploy embeddings that support agentic reasoning, semantic similarity, and cross-dataset queries.
- Optimize indexing, retrieval, and inference performance across large-scale multi-omics data collections.
- Evaluate and incorporate emerging representation-learning and knowledge-graph techniques to improve data discoverability and model interoperability.
Requirements
- MS/PhD in Computer Science, Computational Biology, Data Science, or a related field.
- 3+ years of experience building or maintaining vector or semantic-retrieval infrastructure.
- Hands-on experience with multi-omics or biomedical data integration (e.g., RNA-seq, proteomics, clinical endpoints).
- Proficiency in Python and frameworks such as LangChain, Transformers, or sentence-embedding models.
- Familiarity with TileDB, Snowflake, Weaviate, FAISS, or other vector/array database systems.
- Understanding of metadata modeling, ontologies (e.g., OBO, UMLS), and FAIR data practices.
- Strong ability to collaborate across solution architecture, data science, and AI/ML teams.
Strategic Impact
- Multi-omics and clinical data assets transformed into interoperable, vectorized embeddings supporting scientific AI applications.
- AI applications and agentic tools can run semantic queries and reason over governed datasets.
- Vector database infrastructure scales efficiently and complies with governance and lineage standards.
- Accelerated insight generation across discovery, translational, and clinical domains.
Preferred Skills
- Advanced Analytics
- Business Intelligence (BI)
- Coaching
- Collaborating
- Critical Thinking
- Data Analysis
- Database Management
- Data Privacy Standards
- Data Reporting
- Data Savvy
- Data Science
- Data Visualization
- Econometric Models
- Process Improvements
- Technical Credibility
- Technologically Savvy
- Workflow Analysis
About the company
At Johnson & Johnson, we believe health is everything. Our strength in healthcare innovation empowers us to build a world where complex diseases are prevented, treated, and cured, where treatments are smarter and less invasive, and solutions are personal. Through our expertise in Innovative Medicine and MedTech we are uniquely positioned to innovate across the full spectrum of healthcare solutions today to deliver the breakthroughs of tomorrow, and profoundly impact health for humanity. Learn more at https://www.jnj.com