Lead Data Manager
Job description
The Laboratory for Intelligent Global Health and Humanitarian Technologies (LiGHT) at EPFL is seeking a Lead Data Manager to build and oversee all data management activities in the lab. This is a unique opportunity to shape the lab's data architecture, pipelines, and governance from the ground up, supporting major AI-for-health initiatives and complex multinational clinical trials.
Mission
- Lead and oversee all Data Management (DM) activities across LiGHT's research portfolio, including complex multicountry clinical trials in Africa and AI-based health innovation projects.
- Develop, implement, and continuously refine the LiGHT Data Management and Data Governance framework, integrated within the EPFL ecosystem and compliant with Good Clinical Practice (GCP), GDPR, DPDP (India), and other relevant data protection regulations.
- Act as the focal point and accountable lead for all data management and compliance-related matters in LiGHT's role as Sponsor or Coordinating Centre within international consortia.
- Develop and maintain all SOPs, work instructions, templates, and forms related to data management, biostatistics, and data governance.
Data Architecture & Infrastructure
- Design, implement, and maintain secure, scalable data architectures for clinical and AI research projects, including EDC systems, data lakes, and integration pipelines.
- Oversee the design and integration of eCDS solutions and ensure interoperability across databases, AI model repositories, and analytics environments.
- Liaise with EPFL IT and research data offices to ensure alignment with institutional standards and data security policies.
Data Science & AI Integration
- Lead the development of data pipelines for AI model training, validation, and deployment, including data curation, annotation, quality assurance, and metadata governance.
- Develop data-driven dashboards, monitoring systems, and automated pipelines for data analysis and visualization.
- Support the development of AI-ready datasets, ensuring proper data provenance, consent tracking, and harmonization across sites.
- Work closely with data scientists and engineers on machine learning model governance, versioning, and reproducibility.
Operational Oversight & Mentorship
- Coordinate and supervise contractual data managers, data scientists, and external CROs or IT partners, ensuring timely and high-quality deliverables.
- Develop and maintain contractual and quality oversight mechanisms, including vendor assessment and performance tracking.
- Mentor students, junior researchers, and collaborators in data management, reproducible research, and ethical data practices.
- Support scientific reporting, donor communications, and publications based on LiGHT datasets.
Partnerships & Innovation
- Explore and establish strategic collaborations with external partners in academia, industry, and global health to strengthen LiGHT's data science ecosystem.
- Stay abreast of emerging trends in AI, data standards (e.g., CDISC), FAIR data principles, and federated learning for global health research.
Requirements
- PhD (preferred) or Master's degree in data science, computer science, biomedical informatics, or a related quantitative field.
- Minimum 5 years of experience in data management, data engineering, or applied data science, including leadership in global health or clinical research.
- Demonstrated experience managing distributed data workflows in low- and middle-income country (LMIC) and multi-institutional research environments.
- Proven proficiency in all aspects of clinical data management, including database setup, CRF design, edit check programming, data validation, query management, SAE reconciliation, coding, data review, and database lock.
- Experience establishing and maintaining Data Quality Management Systems (QMS) for research environments.
- Proven experience managing data management subcontracts and oversight of external vendors or CROs.
- Strong understanding of data versioning, lineage tracking, and reproducibility standards, including Git-based workflows and CI/CD integration.
- Proficiency in Python.
- Familiarity with CDISC and GxP standards and with international research data compliance, together with a strong foundation in data security and data protection, including GDPR and emerging frameworks such as India's DPDP Act (2023).
- Demonstrated contribution to open-source or FAIR-compliant data infrastructure projects.
Preferred
- Familiarity with medical LLM applications or clinical decision support systems.
- Experience with safety-critical evaluation protocols (e.g., benchmark leakage detection, hallucination profiling, audit trails).
- Deep interest in equity-centered deployment and global health implications of LLMs.
- Experience with federated learning, synthetic data generation, or privacy-preserving AI.
- Familiarity with cloud environments (AWS, GCP, Azure) and HPC orchestration.
- Exposure to real-world evidence generation or digital health analytics in LMIC settings.
Benefits & conditions
- Mission-driven work at the intersection of technology, health, and global equity.
- Opportunity to shape open-source platforms used by WHO, ICRC, and other partners.
- A creative, ambitious, and collaborative team across continents.
- Competitive salary and benefits, aligned with experience and location.