Senior Machine Learning Engineer
Job description
As a Senior Machine Learning Engineer, you will build a platform that accelerates drug discovery with machine learning across a wide variety of use cases spanning data modalities, model architectures, and model sizes. By leveraging your experience in machine learning, software engineering, infrastructure, and data, you will:
- Build our machine learning platform from a mixture of off-the-shelf and custom tooling as needed to enable the full model lifecycle from development through deployment
- Design and implement scalable LLM serving infrastructure to support high-throughput inference and real-time model serving capabilities
- Architect infrastructure for agentic systems, including multi-agent coordination, workflow orchestration, and autonomous decision-making pipelines
- Partner with ML teams actively developing models to onboard them to platform capabilities
- Continuously optimize the platform to support our emerging machine learning research capabilities and evolving use case needs
- Build and maintain orchestration frameworks for complex ML workflows, including agent-based systems and multi-modal model pipelines
- Inspire your team and stakeholders alike to find the best outcome by facilitating constructive dialogue and reconciling perspectives into a unified view
- Create tools and experiences that support the development of models; however, you won't be responsible for building machine learning models
- A member of the ML Infra team will be your trail guide to ensure smooth onboarding
- You will be encouraged to attend one conference of your choice per year, paid for by Recursion, in support of our value of 'We Learn'
- You will be provided opportunities to share your work and teach others new skills through weekly initiatives like Wednesday Wins and Tech Talks
- You will have the option to participate in coaching programs such as BetterUp for leadership skills and development
- Our team will get together in person once a year (usually in Salt Lake) to build alignment and motivation as well as spend quality time with one another
Requirements
- Experience designing and building large distributed systems with elegant interfaces that can scale easily
- Proven experience with LLM serving infrastructure, including optimization techniques for large-scale model inference, batching strategies, and resource management
- Expertise in building infrastructure for agentic systems, including multi-agent architectures, coordination protocols, and autonomous workflow management
- Strong background in orchestration frameworks and workflow management systems for complex ML pipelines and agent-based workflows
- Proficiency with the full lifecycle of ML and software development in production, including:
  - Data preparation, model training, evaluation, deployment, and monitoring
  - Releasing and maintaining mature ML products
  - Authoring well-tested, scalable, documented code, with peer reviews and CI/CD
- Experience with model optimization, quantization, and serving frameworks for efficient LLM deployment
- Knowledge of distributed agent coordination, task scheduling, and inter-agent communication patterns
- Ability to own projects in a collaborative open environment
- Ability to be a mentor, coach, and sponsor to peers and colleagues
Nice to haves: PyTorch, GCP, CUDA, Docker, Kubernetes, BigQuery, large-scale distributed systems, vLLM, Ray, Prefect, LangChain, model quantization techniques, distributed inference frameworks, agent orchestration platforms, workflow management systems
Benefits & conditions
At Recursion, we believe that every employee should be compensated fairly. Based on the skill and level of experience required, the estimated current annual base range for this role is £75,900 to £101,900. You will also be eligible for an annual bonus and equity compensation, as well as a comprehensive benefits package.