R&D Machine Learning Engineer (Speech and Voice) (0-3 Years Experience)
Neweasy
Charing Cross, United Kingdom
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Junior
Compensation: £75K
Job location: Charing Cross, United Kingdom
Tech stack
Artificial Intelligence
Python
Machine Learning
TensorFlow
Speech Recognition
Chatbots
PyTorch
Large Language Models
Prompt Engineering
Deep Learning
HuggingFace
Speech Synthesis
Feature Extraction
Data Pipelines
Job description
We're an early-stage AI startup building next-generation speech and voice technologies - from intelligent voice agents and conversational systems to adaptive audio-driven AI products. Our work sits at the intersection of machine learning research and real-world deployment, and we're looking for curious, ambitious engineers to help push the boundaries.
What You'll Be Doing
- Researching, training, and fine-tuning speech and audio models, including ASR, TTS, speaker recognition, and voice interaction systems
- Building and optimising speech-to-text, text-to-speech, and conversational AI pipelines, integrating LLMs where appropriate
- Designing and maintaining audio data pipelines for collection, preprocessing, augmentation, and evaluation
- Experimenting with multimodal models that combine speech, text, and contextual signals
- Exploring prompt engineering, RAG, and memory architectures for voice-driven AI systems
- Collaborating with engineers to deploy models into low-latency, production environments
- Developing internal tools for model monitoring, evaluation, and continuous improvement
- Staying close to current research in speech, audio, and conversational AI, with time and support to explore and publish
Requirements
London / Hybrid | 0-3 Years Experience
Are you fascinated by human speech, voice, and how machines understand, generate, and interact through sound?
- 0-3 years of experience in Machine Learning, AI, Speech Processing, or Applied Research
- Strong Python skills and hands-on ML experience
- Experience with PyTorch, TensorFlow, or Hugging Face
- Solid understanding of deep learning fundamentals, particularly for sequence and audio models
- Familiarity with, or strong interest in:
  - Automatic Speech Recognition (ASR)
  - Text-to-Speech (TTS)
  - Speaker diarisation and speaker identification
  - Audio feature extraction (MFCCs, spectrograms, embeddings)
  - Transformers, sequence models, and multimodal architectures
- Curiosity, strong problem-solving skills, and comfort working in a fast-moving startup environment
What We Offer
* Hands-on mentorship from senior ML engineers, AI researchers, and founders
* Freedom to experiment with state-of-the-art speech and voice models
* A modern ML stack including Python, PyTorch, Hugging Face, OpenAI APIs, vector databases, and cloud infrastructure
* Flexible working with a hybrid model and regular in-person collaboration
* Accelerated career growth through ownership of real R&D and production systems
* A culture that values learning, technical depth, and impact over bureaucracy
Perfect For
* Graduates or junior engineers with a strong interest in speech, voice, or audio ML
* Researchers wanting to see their work deployed in real products
* Engineers excited by applied R&D, real-time systems, and human-AI interaction