Data Scientist & Machine Learning Researcher
Job description
Modern machine learning can predict vocal tract shapes from audio recordings of the voice with remarkable accuracy, but most of these models are black boxes. This Royal Society-funded project aims to crack open the black box and solve one of the most compelling challenges in speech science: understanding the mapping between vocal tract movements and the acoustic speech signal. Using state-of-the-art MRI recordings of the vocal tract during speech, we aim to develop machine learning approaches that don't just predict acoustic output from articulatory configurations, but reveal why and how these mappings work. We need approaches that combine predictive power with scientific insight: models whose internal representations align with phonetic and physical knowledge. This requires hybrid machine learning (ML) approaches that integrate domain knowledge with data-driven learning, as well as explainable AI (xAI) techniques that make model behaviour transparent and scientifically meaningful.
You will apply these approaches to a large database of real-time MRI and acoustic recordings of the vocal tract. Solving this problem will help to drive fundamental progress on critical applications, such as articulatory biofeedback for language learning and speech therapy.
Your Role
Working with Dr Sam Kirkham (Lancaster, Speech Science), Dr Anton Ragni (Sheffield, Computer Science) and Professor Aneta Stefanovska (Lancaster, Physics), you'll develop and validate interpretable ML approaches for modelling acoustic-articulatory relations using MRI vocal tract data. The position is available for 18 months from 1 July 2026 (start date negotiable).
Key Objectives
Develop hybrid ML architectures that incorporate phonetic and physical constraints.
Apply and extend explainable AI techniques for speech production modelling.
Validate model interpretations against established knowledge.
Lead and/or contribute to publications at the intersection of speech science and machine learning.
This is a methodologically creative role with genuine intellectual ownership. You'll have access to rich MRI datasets and Lancaster's high-performance computing facilities.
Requirements
PhD in Speech Processing, Computational Linguistics, Machine Learning, Computer Science, or a related field (PhD must have been submitted by the start date).
Strong experience with machine learning for time-series data.
Excellent Python skills with PyTorch.
Ability to work independently on complex, open-ended problems.
Effective communication skills for interdisciplinary collaboration.
Demonstrated interest in interpretable ML, explainable AI, or hybrid approaches.
Desirable
Experience with speech/audio processing or articulatory data.
Knowledge of physics-informed neural networks, neural ODEs, or other hybrid architectures.
Knowledge of speech production, speech science, or biomechanics.
Publications in ML, speech technology, or computational linguistics.