Machine Learning and AI Engineer

Narwhal Labs

Bristol, United Kingdom

1 month ago

Role details

Contract type

Internship / Graduate position

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Compensation

£ 100K

Job location

Bristol, United Kingdom

Tech stack

API

Artificial Intelligence

Code Review

Data Cleansing

Data Files

Data Mining

Information Extraction

Python

Machine Learning

Natural Language Processing

Open Source Technology

Speech Recognition

Jupyter Notebook

Feature Engineering

Large Language Models

Prompt Engineering

Git

AI Platforms

Production Code

Speech Synthesis

API Design

Document Classification

Job description

Build and maintain ML/AI features

Develop and improve the components that make AI agents intelligent: prompt engineering, classifier pipelines, goal evaluation logic, and post-call analysis
Work with the LangChain/LangGraph agent framework to build, test, and refine conversation flows that handle real-world customer interactions
Implement and evaluate data extraction pipelines - turning unstructured conversation transcripts into structured fields (names, dates, postcodes, appointment preferences) reliably
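As a rough illustration of the transcript-to-structured-fields task described above, here is a minimal sketch. It assumes a simplified UK postcode pattern and a weekday-only notion of "appointment preference"; the function name `extract_fields` and the field names are hypothetical, and a production pipeline would more likely combine an LLM with validation rather than rely on regular expressions alone.

```python
import re
from typing import Dict, Optional

# Simplified UK postcode pattern (outward code + inward code).
# Assumption for illustration only: this is not a complete postcode validator.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)

# Hypothetical stand-in for "appointment preference": the first weekday mentioned.
WEEKDAY_RE = re.compile(
    r"\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b",
    re.IGNORECASE,
)


def extract_fields(transcript: str) -> Dict[str, Optional[str]]:
    """Pull a postcode and a preferred weekday out of free-form transcript text."""
    postcode_match = POSTCODE_RE.search(transcript)
    weekday_match = WEEKDAY_RE.search(transcript)

    postcode = None
    if postcode_match:
        # Normalise to upper case with a single space between the two halves.
        postcode = f"{postcode_match.group(1).upper()} {postcode_match.group(2).upper()}"

    preferred_day = weekday_match.group(1).capitalize() if weekday_match else None
    return {"postcode": postcode, "preferred_day": preferred_day}


fields = extract_fields("Sure, my postcode is bs1 4dj and Tuesday works best.")
```

Returning `None` for a missing field, rather than guessing, keeps extraction failures visible to whatever evaluation step comes next.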

Integrate and evaluate models

Integrate LLM providers (OpenAI, Anthropic, Groq, Google) into the platform's agent orchestration layer, including prompt construction, response parsing, and error handling
Run model evaluations - comparing output quality, latency, and cost across providers and model versions to inform which models the platform uses in production
Work with the existing LangSmith tracing infrastructure to monitor model performance and identify regressions
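The latency side of a provider comparison like the one described above can be sketched as a small timing wrapper. Everything here is an assumption for illustration: `measure_latency` and `fake_provider` are hypothetical names, and the stub stands in for a real provider client so the example does not depend on any specific SDK.

```python
import time
from typing import Callable, Dict


def measure_latency(call_model: Callable[[str], str], prompt: str) -> Dict[str, object]:
    """Call a model function once and record how long the call took."""
    start = time.perf_counter()
    output = call_model(prompt)
    elapsed = time.perf_counter() - start
    return {"output": output, "latency_s": elapsed}


def fake_provider(prompt: str) -> str:
    # Hypothetical stub; a real integration would call a hosted LLM API here.
    return prompt.upper()


result = measure_latency(fake_provider, "hi")
```

In practice a single call says little; repeating the measurement over a fixed prompt set and recording cost and output quality alongside latency gives a fairer comparison.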

Support the voice and classification pipeline

Contribute to the STT (speech-to-text) and TTS (text-to-speech) integration layer - understanding how audio becomes text, how text becomes an agent response, and how that response becomes audio again
Help build and extend the classification system that determines conversation outcomes (was the call successful? did the customer want a callback? was it a voicemail?) - including writing evaluation prompts, defining ground truth datasets, and measuring accuracy
Assist with data preparation, feature engineering, and dataset curation for evaluation and fine-tuning tasks

Write production-quality code

Write clean, tested Python that runs in a production FastAPI application - not throwaway scripts
Participate in code reviews, both giving and receiving - learning from the senior developer's feedback and contributing your own perspective
Contribute to documentation that helps the rest of the engineering team understand how AI components work and how to use them correctly

You've worked with LangChain, LangGraph, or similar agent frameworks - even in a personal project or hackathon
You've built something with the OpenAI or Anthropic API that went beyond "hello world" - a chatbot, a classifier, a data extraction pipeline, an evaluation harness
You understand the basics of how voice AI works: STT -> LLM -> TTS - even if you've only read about it rather than built it
You've worked with structured evaluation of LLM outputs - comparing model responses against expected answers, not just eyeballing whether it "looks right"
You have opinions about prompt engineering - you've iterated on prompts and observed how small changes affect output quality

What You Won't Be Doing

Working in isolation on research problems - this is a product engineering role embedded in a delivery team
Training large models from scratch - the platform uses hosted LLM APIs; your job is integration, evaluation, and orchestration, not pretraining
Waiting to be told what to do - you'll have guidance and mentorship from the senior developer, but you're expected to take ownership of your tasks and ask questions when you're stuck

Requirements

Do you have experience in Python?

The platform runs on a practical AI stack: LangChain and LangGraph for agent orchestration, OpenAI and Anthropic for LLMs, Deepgram for speech-to-text, ElevenLabs for text-to-speech, and LiveKit for real-time voice infrastructure. You don't need to know all of these coming in, but you do need to be comfortable working with APIs, understanding model behaviour, and writing Python that runs in production - not just in notebooks.

You've finished your degree or equivalent, and you've spent some time - whether through jobs, internships, or serious personal projects - working with ML or AI in a way that went beyond coursework.

1-2 years of experience working with ML/AI (including internships, placement years, or substantial personal/open-source projects)
Solid Python skills - you can write functions, classes, and tests confidently, not just Jupyter notebooks
Familiarity with at least some of: LLMs and prompt engineering, NLP, text classification, or information extraction - you don't need depth in all of them, but you need to have worked with at least one area hands-on
Basic understanding of how ML models are evaluated - you know what precision, recall, and F1 mean and why they matter; you've compared model outputs against ground truth at least once
Comfortable working with APIs and reading documentation - a significant part of this role involves integrating and configuring third-party AI services, not building models from scratch
Familiar with Git and working in a team codebase - you've committed code that other people have reviewed, and you've reviewed other people's code
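For candidates gauging the evaluation requirement above, comparing classifier outputs against ground truth can be as small as the sketch below. The outcome labels ("voicemail", "callback", "success") and the sample data are hypothetical, chosen only to echo the conversation outcomes mentioned in the job description.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one label treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a label never appears.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions for four calls.
truth = ["voicemail", "callback", "voicemail", "success"]
predicted = ["voicemail", "voicemail", "success", "success"]

scores = precision_recall_f1(truth, predicted, positive="voicemail")
```

Here one voicemail is caught, one is missed, and one callback is wrongly labelled as voicemail, so precision, recall, and F1 for "voicemail" all come out at 0.5.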

Benefits & conditions

We're building something global at Narwhal, and we mean that in every sense. The work we do requires different ways of thinking - and different ways of thinking come from different people.

At Narwhal, we're committed to building a diverse and inclusive team. We welcome applications from people of all backgrounds, identities, and experiences, and we actively work to ensure our hiring process is fair and accessible for everyone. Reasonable adjustments are available at every stage; just reach out and we'll make it happen.

Pay: £75,000.00-£100,000.00 per year

About the company

Narwhal Labs is the company behind DeepBlue OS - an autonomous revenue infrastructure platform that enables any business to answer every call, follow up every lead, and log every interaction across Voice, SMS, Email and WhatsApp. As an NVIDIA Inception Program Member and Google Partner, we are a 38-person team with our platform launching in May 2026. We build the infrastructure layer for serious businesses that want enterprise-grade revenue operations at a fraction of traditional cost.
