Machine Learning Infrastructure Engineer - #4694
Job description
This is a hybrid role based in Menlo Park, CA (moving to Sunnyvale, CA in Fall 2026). Our current hybrid policy requires on-site presence at least 40% of the time, including key in-person collaboration days. At our Menlo Park campus, Tuesdays and Thursdays are the key days where we encourage on-site presence to engage in events and on-site activities.
Responsibilities
- Partner with research teams to identify computational pain points or limitations in performing computational experiments and analyses.
- Design, build, and evolve software which usefully extends research capabilities, including infrastructure for distributed ML training and evaluation on large controlled genomic datasets.
- Develop tools and processes that ensure GxP-compliant testing, patchability, and inference reproducibility for classifiers that are promoted to production use.
- Develop and maintain the research team's software environment, including tools to assess the health, performance, and cost of the system.
These summarize the role's primary responsibilities and are not an exhaustive list. They may change at the company's discretion.
Requirements
The ideal candidate will bring a passion for reliable software infrastructure, distributed computing, reproducible research, and general problem-solving. Due to the highly connected nature of this position, the candidate should be a strong communicator with experience working with multidisciplinary teams.
- 5+ years of experience developing software supporting machine learning, scientific computing, or large-scale data processing systems
- Strong programming skills in Python and a systems-level language such as Golang (preferred), Java, C#, or C++
- Experience working with modern machine learning frameworks such as PyTorch or TensorFlow
- Experience with distributed computing frameworks (Spark, Ray, Flink, Beam, etc.)
- A commitment to high-quality professionally engineered software
- Strong communication skills with the ability to help developers from a wide range of software development backgrounds
- BS in Computer Science, Engineering, Bioinformatics, or a related field, or equivalent practical experience
Preferred Qualifications
- Good understanding of container orchestration through Docker and cloud technologies
- Experience with scientific computing tools: NumPy, Jupyter, R Notebook, etc.
- Experience with techniques used in modern AI (including LLM) training
- Experience with whole genome sequencing, whole exome sequencing, bisulfite sequencing, and/or whole transcriptome sequencing data
- Practical experience setting up continuous integration systems, along with expertise in at least one build tool (e.g. Bazel (preferred), Buck, Maven, Gradle)
- Familiarity with AWS services, best practices, and security
- Advanced degree (MS or PhD) in computer science, engineering, bioinformatics or a related discipline
Benefits & conditions
The expected, full-time, annual base pay scale for this position is $190k-$255k.
This role may be eligible for other forms of compensation, including an annual bonus and/or incentives, subject to the terms of the applicable plans and Company discretion. This range reflects a good-faith estimate of the range that the Company reasonably expects to pay for the position upon hire; the actual compensation offered may vary depending on factors such as the candidate's qualifications. Employees in this role are also eligible for GRAIL's comprehensive and competitive benefits package, offered in accordance with our applicable plans and policies. This package currently includes flexible time-off or vacation; a 401(k) retirement plan with employer match; medical, dental, and vision coverage; and carefully selected mindfulness programs.