Senior Data Scientist

InterVenn Biosciences

South San Francisco, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

$196K

Job location

South San Francisco, United States of America

Tech stack

Artificial Intelligence

Artificial Neural Networks

Bioinformatics

Computational Biology

Python

Machine Learning

TensorFlow

Feature Engineering

PyTorch

Deep Learning

Scikit Learn

Information Technology

XGBoost

Job description

We are seeking a creative, methodologically rigorous Senior Data Scientist to push the frontier of how we research and build classifiers from glycoproteomic data. This is a research-forward individual contributor role for someone who reaches across the full breadth of modern statistical and AI methods - classical ML, deep learning, foundation models for biology, generative approaches, and whatever the literature surfaces next - and is energized by open problems: new quantification and normalization schemes, novel feature engineering, multimodal model architectures, and the biological interpretation of model outputs.

Design, prototype, and rigorously evaluate novel classifier architectures for clinical diagnostics across oncology indications

Lead exploratory research into new quantification, normalization, and feature engineering methods for high-dimensional glycoproteomic data
Bring a diverse modeling toolkit - classical statistical methods, tree-based ensembles, deep learning, probabilistic and Bayesian approaches, foundation models, graph neural networks, and generative AI - and choose the right tool for the problem based on evidence rather than habit or hype
Develop cross-validation, calibration, and uncertainty-quantification strategies that hold up to the realities of small clinical cohorts and high feature counts
Investigate and mitigate batch, cohort, and site effects so that models generalize from discovery to bridging to locked panels
Drive cross-indication synthesis - separate shared disease biology from indication-conditioned signal, and from nonspecific inflammatory or acute-phase axes
Build multimodal models that combine glycan/motif information, proteomic grounding, and clinical covariates rather than relying on protein-quantity signal alone
Translate emerging techniques from the ML, AI, and computational-biology literature into production-ready methods
Mentor junior data scientists and raise the methodological bar across the team

Requirements

Do you have experience with scientific publications?

Ph.D. in Statistics, Computer Science, Computational Biology, Bioinformatics, or a related quantitative field, plus 6+ years of experience building predictive models on biological data in industry or academia; alternatively, an MS in a similar field with 8+ years of relevant experience

Demonstrated track record of methodological innovation - first-author publications, novel methods deployed in production, open-source contributions, or comparable evidence of original work
Deep proficiency in Python and/or R, including the modern ML stack (scikit-learn, PyTorch or TensorFlow, XGBoost/LightGBM, and similar)
Methodological breadth across paradigms - comfortable moving between classical statistics, tree-based ML, deep learning, and modern AI (transformers, graph neural networks, foundation models, generative methods) - and the judgment to argue rigorously for one approach over another
Strong statistical foundation: cross-validation strategy, regularization, calibration, uncertainty quantification, and handling of confounders and class imbalance
Hands-on experience building and validating classifiers on high-dimensional, low-sample-size biological data (proteomics, glycoproteomics, transcriptomics, or genomics)
Experience with batch-effect correction and normalization techniques, and a healthy skepticism about how those choices propagate into downstream performance estimates
Preference will be given to candidates with experience in multimodal modeling, interpretability methods, or foundation/representation-learning approaches for biological data
Familiarity with clinical diagnostic development - analytical and clinical validation, locking classifiers, and bridging studies - is a strong plus
Excellent written and verbal communication: able to explain novel methods clearly to wet-lab scientists, clinicians, and fellow statisticians alike
A genuine desire to impact patient lives and contribute to the broader scientific community
