R&D Machine Learning Engineer (Speech and Voice) (0-3 Years Experience)
Neweasy
Charing Cross, United Kingdom
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Junior
Compensation: £75K
Job location: Charing Cross, United Kingdom
Tech stack
Artificial Intelligence
Python
Machine Learning
TensorFlow
Speech Recognition
Chatbots
PyTorch
Large Language Models
Prompt Engineering
Deep Learning
HuggingFace
Speech Synthesis
Feature Extraction
Data Pipelines
Job description
We're an early-stage AI startup building next-generation speech and voice technologies - from intelligent voice agents and conversational systems to adaptive audio-driven AI products. Our work sits at the intersection of machine learning research and real-world deployment, and we're looking for curious, ambitious engineers to help push the boundaries.
What You'll Be Doing
- Researching, training, and fine-tuning speech and audio models, including ASR, TTS, speaker recognition, and voice interaction systems
- Building and optimising speech-to-text, text-to-speech, and conversational AI pipelines, integrating LLMs where appropriate
- Designing and maintaining audio data pipelines for collection, preprocessing, augmentation, and evaluation
- Experimenting with multimodal models that combine speech, text, and contextual signals
- Exploring prompt engineering, RAG, and memory architectures for voice-driven AI systems
- Collaborating with engineers to deploy models into low-latency, production environments
- Developing internal tools for model monitoring, evaluation, and continuous improvement
- Staying close to current research in speech, audio, and conversational AI, with time and support to explore and publish
Requirements
London / Hybrid | 0-3 Years Experience
Are you fascinated by human speech, voice, and how machines understand, generate, and interact through sound?
- 0-3 years of experience in Machine Learning, AI, Speech Processing, or Applied Research
- Strong Python skills and hands-on ML experience
- Experience with PyTorch, TensorFlow, or Hugging Face
- Solid understanding of deep learning fundamentals, particularly for sequence and audio models
- Familiarity with, or strong interest in:
  - Automatic Speech Recognition (ASR)
  - Text-to-Speech (TTS)
  - Speaker diarisation and speaker identification
  - Audio feature extraction (MFCCs, spectrograms, embeddings)
  - Transformers, sequence models, and multimodal architectures
- Curiosity, strong problem-solving skills, and comfort working in a fast-moving startup environment
What We Offer
* Hands-on mentorship from senior ML engineers, AI researchers, and founders
* Freedom to experiment with state-of-the-art speech and voice models
* A modern ML stack including Python, PyTorch, Hugging Face, OpenAI APIs, vector databases, and cloud infrastructure
* Flexible working with a hybrid model and regular in-person collaboration
* Accelerated career growth through ownership of real R&D and production systems
* A culture that values learning, technical depth, and impact over bureaucracy
Perfect For
* Graduates or junior engineers with a strong interest in speech, voice, or audio ML
* Researchers wanting to see their work deployed in real products
* Engineers excited by applied R&D, real-time systems, and human-AI interaction