Senior Engineering Manager (ML Infrastructure), London
Job description
Reporting directly to the Head of Engineering, you will be responsible for the strategic direction, execution, and operational excellence of our Platform teams, which enable the training and serving of the largest foundation models in biotech. These teams include Technical Infrastructure, Machine Learning Platform, Site Reliability Engineering, and Developer Experience for Research. This is a highly influential role that requires a blend of deep technical expertise, strong leadership capabilities, and a passion for fostering a high-performance engineering culture. Your work has the potential to accelerate cutting-edge research in the drug design space.
What you will do
- Define and execute the long-term vision and strategy for our foundational infrastructure, aligning with company goals and anticipating future needs.
- Lead, mentor, and inspire a diverse team of engineering managers and individual contributors across multiple disciplines (ML Platform, Tech Infrastructure, SRE). Foster a culture of innovation, collaboration, continuous learning, and accountability.
- Oversee the development and evolution of our core platform, ensuring it provides robust, scalable, and developer-friendly services for all engineering teams. This includes aspects like service mesh, container orchestration, CI/CD pipelines, and internal tooling.
- Lead the teams responsible for building and maintaining the underlying systems that power our user-facing applications, focusing on performance, reliability, and seamless user experiences.
- Guide the development and operation of the infrastructure supporting our prediction models, ensuring high availability, low latency, and efficient data processing for machine learning initiatives.
- Drive the strategy and execution for our core technical infrastructure, including networking, compute, storage, and data centers (on-prem and/or cloud). Optimize for cost, performance, and security.
- Champion and embed SRE principles across the organization. Oversee the SRE team responsible for ensuring the reliability, scalability, and performance of our critical systems through proactive monitoring, incident management, and automation.
- Establish and enforce best practices for infrastructure operations, including monitoring, alerting, capacity planning, disaster recovery, and security. Drive continuous improvement in system stability and uptime.
- Partner closely with product engineering teams, security, data science, and other stakeholders to understand their needs and deliver foundational solutions that accelerate product development and innovation.
- Evaluate and manage relationships with key internal partners to ensure optimal value and performance.
- Manage the budget for the Foundations Engineering organization, optimizing resource allocation and identifying cost-saving opportunities.
Requirements
- Experience in software engineering, with a significant portion focused on ML infrastructure, platform, or site reliability engineering.
- Demonstrated experience in a leadership role, managing multiple engineering teams and managers.
- Deep understanding of distributed systems, cloud architectures (AWS, Azure, GCP), and modern infrastructure technologies.
- Proven experience running horizontal Platform Engineering teams.
- Strong background in Site Reliability Engineering (SRE) principles and practices, including incident management, observability, performance optimization, and automation.
- Familiarity with data infrastructure and systems supporting machine learning/prediction models.
- Excellent communication, interpersonal, and presentation skills, with the ability to articulate complex technical concepts to both technical and non-technical audiences.
- Demonstrated ability to attract, hire, retain, and develop top engineering talent.
- Strategic thinker with a proven ability to define and execute complex technical roadmaps.
- Strong problem-solving skills and a data-driven approach to decision-making.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Nice to have:
- Experience in a high-growth, fast-paced environment.
- Experience building and scaling infrastructure for biotech or life sciences.
Culture and values