Machine Learning and AI Engineer

Narwhal Labs
Bristol, United Kingdom
3 days ago

Role details

Contract type
Internship / Graduate position
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior
Compensation
£ 100K

Job location

Bristol, United Kingdom

Tech stack

API
Artificial Intelligence
Code Review
Data Cleansing
Data Files
Data Mining
Information Extraction
Python
Machine Learning
Natural Language Processing
Open Source Technology
Speech Recognition
Jupyter Notebook
Feature Engineering
Large Language Models
Prompt Engineering
Git
AI Platforms
Production Code
Speech Synthesis
API Design
Document Classification

Job description

Build and maintain ML/AI features

  • Develop and improve the components that make AI agents intelligent: prompt engineering, classifier pipelines, goal evaluation logic, and post-call analysis
  • Work with the LangChain/LangGraph agent framework to build, test, and refine conversation flows that handle real-world customer interactions
  • Implement and evaluate data extraction pipelines - turning unstructured conversation transcripts into structured fields (names, dates, postcodes, appointment preferences) reliably

Integrate and evaluate models

  • Integrate LLM providers (OpenAI, Anthropic, Groq, Google) into the platform's agent orchestration layer, including prompt construction, response parsing, and error handling
  • Run model evaluations - comparing output quality, latency, and cost across providers and model versions to inform which models the platform uses in production
  • Work with the existing LangSmith tracing infrastructure to monitor model performance and identify regressions

Support the voice and classification pipeline

  • Contribute to the STT (speech-to-text) and TTS (text-to-speech) integration layer - understanding how audio becomes text, how text becomes an agent response, and how that response becomes audio again
  • Help build and extend the classification system that determines conversation outcomes (was the call successful? did the customer want a callback? was it a voicemail?) - including writing evaluation prompts, defining ground truth datasets, and measuring accuracy
  • Assist with data preparation, feature engineering, and dataset curation for evaluation and fine-tuning tasks

Write production-quality code

  • Write clean, tested Python that runs in a production FastAPI application - not throwaway scripts
  • Participate in code reviews, both giving and receiving - learning from the senior developer's feedback and contributing your own perspective
  • Contribute to documentation that helps the rest of the engineering team understand how AI components work and how to use them correctly

  • You've worked with LangChain, LangGraph, or similar agent frameworks - even in a personal project or hackathon
  • You've built something with the OpenAI or Anthropic API that went beyond "hello world" - a chatbot, a classifier, a data extraction pipeline, an evaluation harness
  • You understand the basics of how voice AI works: STT → LLM → TTS - even if you've only read about it rather than built it
  • You've worked with structured evaluation of LLM outputs - comparing model responses against expected answers, not just eyeballing whether it "looks right"
  • You have opinions about prompt engineering - you've iterated on prompts and observed how small changes affect output quality

What You Won't Be Doing

  • Working in isolation on research problems - this is a product engineering role embedded in a delivery team
  • Training large models from scratch - the platform uses hosted LLM APIs; your job is integration, evaluation, and orchestration, not pretraining
  • Waiting to be told what to do - you'll have guidance and mentorship from the senior developer, but you're expected to take ownership of your tasks and ask questions when you're stuck

Requirements

Do you have experience in Python?

The platform runs on a practical AI stack: LangChain and LangGraph for agent orchestration, OpenAI and Anthropic for LLMs, Deepgram for speech-to-text, ElevenLabs for text-to-speech, and LiveKit for real-time voice infrastructure. You don't need to know all of these coming in, but you do need to be comfortable working with APIs, understanding model behaviour, and writing Python that runs in production - not just in notebooks.

You've finished your degree or equivalent, and you've spent some time - whether through jobs, internships, or serious personal projects - working with ML or AI in a way that went beyond coursework.

  • 1-2 years of experience working with ML/AI (including internships, placement years, or substantial personal/open-source projects)
  • Solid Python skills - you can write functions, classes, and tests confidently, not just Jupyter notebooks
  • Familiarity with at least some of: LLMs and prompt engineering, NLP, text classification, or information extraction - you don't need depth in all of them, but you need to have worked with at least one area hands-on
  • Basic understanding of how ML models are evaluated - you know what precision, recall, and F1 mean and why they matter; you've compared model outputs against ground truth at least once
  • Comfortable working with APIs and reading documentation - a significant part of this role involves integrating and configuring third-party AI services, not building models from scratch
  • Familiar with Git and working in a team codebase - you've committed code that other people have reviewed, and you've reviewed other people's code

Benefits & conditions

We're building something global at Narwhal, and we mean that in every sense. The work we do requires different ways of thinking - and different ways of thinking come from different people.

At Narwhal, we're committed to building a diverse and inclusive team. We welcome applications from people of all backgrounds, identities, and experiences, and we actively work to ensure our hiring process is fair and accessible for everyone. Reasonable adjustments are available at every stage; just reach out and we'll make it happen.

Pay: £75,000.00-£100,000.00 per year

About the company

Narwhal Labs is the company behind DeepBlue OS - an autonomous revenue infrastructure platform that enables any business to answer every call, follow up every lead, and log every interaction across Voice, SMS, Email and WhatsApp. As an NVIDIA Inception Program Member and Google Partner, we are a 38-person team with our platform launching in May 2026. We build the infrastructure layer for serious businesses that want enterprise-grade revenue operations at a fraction of traditional cost.
