Senior Data Scientist

InterVenn Biosciences
South San Francisco, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
$196K

Job location

South San Francisco, United States of America

Tech stack

Artificial Intelligence
Artificial Neural Networks
Bioinformatics
Computational Biology
Python
Machine Learning
TensorFlow
Feature Engineering
PyTorch
Deep Learning
Scikit Learn
Information Technology
XGBoost

Job description

We are seeking a creative, methodologically rigorous Senior Data Scientist to push the frontier of how we research and build classifiers from glycoproteomic data. This is a research-forward individual contributor role for someone who reaches across the full breadth of modern statistical and AI methods - classical ML, deep learning, foundation models for biology, generative approaches, and whatever the literature surfaces next - and is energized by open problems: new quantification and normalization schemes, novel feature engineering, multimodal model architectures, and the biological interpretation of model outputs.

  • Design, prototype, and rigorously evaluate novel classifier architectures for clinical diagnostics across oncology indications
  • Lead exploratory research into new quantification, normalization, and feature engineering methods for high-dimensional glycoproteomic data
  • Bring a diverse modeling toolkit - classical statistical methods, tree-based ensembles, deep learning, probabilistic and Bayesian approaches, foundation models, graph neural networks, and generative AI - and choose the right tool for the problem based on evidence rather than habit or hype
  • Develop cross-validation, calibration, and uncertainty-quantification strategies that hold up to the realities of small clinical cohorts and high feature counts
  • Investigate and mitigate batch, cohort, and site effects so that models generalize from discovery to bridging to locked panels
  • Drive cross-indication synthesis - separate shared disease biology from indication-conditioned signal, and from nonspecific inflammatory or acute-phase axes
  • Build multimodal models that combine glycan/motif information, proteomic grounding, and clinical covariates rather than relying on protein-quantity signal alone
  • Translate emerging techniques from the ML, AI, and computational-biology literature into production-ready methods
  • Mentor junior data scientists and raise the methodological bar across the team

Requirements

Do you have experience in scientific publications?

  • Ph.D. in Statistics, Computer Science, Computational Biology, Bioinformatics, or a related quantitative field, plus 6+ years of experience building predictive models on biological data in industry or academia; alternatively, an MS in a similar field with 8+ years of relevant experience
  • Demonstrated track record of methodological innovation - first-author publications, novel methods deployed in production, open-source contributions, or comparable evidence of original work
  • Deep proficiency in Python and/or R, including the modern ML stack (scikit-learn, PyTorch or TensorFlow, XGBoost/LightGBM, and similar)
  • Methodological breadth across paradigms - comfortable moving between classical statistics, tree-based ML, deep learning, and modern AI (transformers, graph neural networks, foundation models, generative methods) - and the judgment to argue rigorously for one approach over another
  • Strong statistical foundation: cross-validation strategy, regularization, calibration, uncertainty quantification, and handling of confounders and class imbalance
  • Hands-on experience building and validating classifiers on high-dimensional, low-sample-size biological data (proteomics, glycoproteomics, transcriptomics, or genomics)
  • Experience with batch-effect correction and normalization techniques, and a healthy skepticism about how those choices propagate into downstream performance estimates
  • Preference will be given to candidates with experience in multimodal modeling, interpretability methods, or foundation/representation-learning approaches for biological data
  • Familiarity with clinical diagnostic development - analytical and clinical validation, locking classifiers, and bridging studies - is a strong plus
  • Excellent written and verbal communication: able to explain novel methods clearly to wet-lab scientists, clinicians, and fellow statisticians alike
  • A genuine desire to impact patient lives and contribute to the broader scientific community
