R&D Machine Learning Engineer (Speech and Voice) (0-3 Years Experience)

Neweasy
Charing Cross, United Kingdom
15 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Junior
Compensation
£75K

Job location

Charing Cross, United Kingdom

Tech stack

Artificial Intelligence
Python
Machine Learning
TensorFlow
Speech Recognition
Chatbots
PyTorch
Large Language Models
Prompt Engineering
Deep Learning
HuggingFace
Speech Synthesis
Feature Extraction
Data Pipelines

Job description

We're an early-stage AI startup building next-generation speech and voice technologies - from intelligent voice agents and conversational systems to adaptive audio-driven AI products. Our work sits at the intersection of machine learning research and real-world deployment, and we're looking for curious, ambitious engineers to help push the boundaries.

What You'll Be Doing

  • Researching, training, and fine-tuning speech and audio models, including ASR, TTS, speaker recognition, and voice interaction systems
  • Building and optimising speech-to-text, text-to-speech, and conversational AI pipelines, integrating LLMs where appropriate
  • Designing and maintaining audio data pipelines for collection, preprocessing, augmentation, and evaluation
  • Experimenting with multimodal models that combine speech, text, and contextual signals
  • Exploring prompt engineering, RAG, and memory architectures for voice-driven AI systems
  • Collaborating with engineers to deploy models into low-latency production environments
  • Developing internal tools for model monitoring, evaluation, and continuous improvement
  • Staying close to current research in speech, audio, and conversational AI, with time and support to explore and publish

Requirements

London / Hybrid | 0-3 Years Experience

Are you fascinated by human speech, voice, and how machines understand, generate, and interact through sound?

  • 0-3 years of experience in Machine Learning, AI, Speech Processing, or Applied Research
  • Strong Python skills and hands-on ML experience
  • Experience with PyTorch, TensorFlow, or Hugging Face
  • Solid understanding of deep learning fundamentals, particularly for sequence and audio models
  • Familiarity with, or strong interest in:
      • Automatic Speech Recognition (ASR)
      • Text-to-Speech (TTS)
      • Speaker diarisation and speaker identification
      • Audio feature extraction (MFCCs, spectrograms, embeddings)
      • Transformers, sequence models, and multimodal architectures
  • Curiosity, strong problem-solving skills, and comfort working in a fast-moving startup environment

About the company

  • Hands-on mentorship from senior ML engineers, AI researchers, and founders
  • Freedom to experiment with state-of-the-art speech and voice models
  • A modern ML stack including Python, PyTorch, Hugging Face, OpenAI APIs, vector databases, and cloud infrastructure
  • Flexible working with a hybrid model and regular in-person collaboration
  • Accelerated career growth through ownership of real R&D and production systems
  • A culture that values learning, technical depth, and impact over bureaucracy

Perfect For

  • Graduates or junior engineers with a strong interest in speech, voice, or audio ML
  • Researchers wanting to see their work deployed in real products
  • Engineers excited by applied R&D, real-time systems, and human-AI interaction

Machine Learning Engineer, Speech Recognition, Voice AI, Audio ML, Automatic Speech Recognition, ASR, Text-to-Speech, TTS, Speaker Recognition, Conversational AI, Multimodal AI, Deep Learning, Transformers, NLP, LLMs, Generative AI, Python, PyTorch, TensorFlow, Hugging Face, RAG, MLOps, Data Pipelines, Model Deployment, AI Research, Applied AI, AI Systems, AI R&D, Speech Technology, Voice Technology, AI Startup, Early-Stage Startup.
