AI/ML Engineer

Improbable

Charing Cross, United Kingdom

1 month ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Charing Cross, United Kingdom

Tech stack

Training Data

API

Artificial Intelligence

Amazon Web Services (AWS)

Systems Engineering

Profiling

Software Quality

Code Review

Continuous Integration

Software Debugging

Software Design Patterns

Distributed Computing Environment

Fault Tolerance

Graph Database

Python

Machine Learning

Neo4j

Operational Data Store

Operational Databases

Performance Tuning

Software Deployment

Data Streaming

Systems Integration

Data Logging

Data Processing

Graphics Processing Unit (GPU)

Data Ingestion

PyTorch

Delivery Pipeline

Large Language Models

Multi-Agent Systems

Prompt Engineering

Backend

FastAPI

Kubernetes

Machine Learning Operations

GPT

Data Pipelines

Docker

Job description

You'll work across our entire AI stack: building FastAPI services that serve models, creating training pipelines that process production data, deploying inference endpoints with proper monitoring, and integrating all of this into our existing Python backend. The ML is important, but the engineering discipline is what makes it production-ready.

Build production AI systems: Design and implement the full stack, from FastAPI endpoints that handle requests, to training pipelines that process data, to inference services that serve predictions. You'll own the architecture, not just the model weights.
Train and deploy our DSLM: Fine-tune models using Unsloth/Axolotl, but more importantly, build the robust infrastructure around it - data pipelines that feed training, evaluation frameworks that catch regressions, deployment systems that handle failover. Make it production-grade.
Integrate ML into our backend: We use FastAPI, PydanticAI, FastMCP, Memgraph. You'll extend these systems with ML capabilities, not as a separate "ML service" but as a natural part of our backend architecture. Clean abstractions, proper error handling, observability.
Own inference performance: Get models running fast, whether that's vLLM deployment, quantization strategies, batching optimizations, or caching. Hit our <200ms latency targets through engineering, not just throwing bigger GPUs at it.
Shape Project Genome's foundation: Work with our Principal Engineer to architect how we ingest, process, and learn from global supply chain data. This is systems design as much as ML: data pipelines, graph databases, and incremental learning strategies are just as important.
Mentor through code review and pairing: Raise the bar on code quality, testing, and production practices across the team. Teach mid-level and junior engineers how to build ML systems that don't fall over.

You're a strong production Python engineer: You write clean, maintainable, tested code. You understand async/await, know when to use generators vs lists, and can profile performance bottlenecks. You've built FastAPI services (or similar) that handle production traffic. Your code passes review without drama.
You've built with LLMs in production: You've integrated GPT-4/Claude into real applications, handled streaming responses, dealt with rate limits and retries, cached intelligently. You know the practical challenges: prompt engineering, context management, error handling, cost control.
You've trained or fine-tuned models: Whether it's fine-tuning LLMs, training classifiers, or running experiments, you understand the workflow. You've dealt with training data quality, evaluation metrics, and overfitting. You can debug why a model isn't learning what you expected.
You think like a systems engineer: You design for failure, add instrumentation, consider edge cases. You know that "the model works on my laptop" isn't shipping. You care about monitoring, logging, alerting, and graceful degradation.
You can navigate the ML landscape pragmatically: You know enough about transformers, attention mechanisms, and training dynamics to make informed decisions. But you're not precious about it. If a simple heuristic beats a complex model, you ship the heuristic.
You balance velocity with quality: You ship incrementally and iterate based on production data, but you don't accumulate tech debt. You refactor proactively, write tests that matter, and leave the codebase better than you found it.
You communicate trade-offs clearly: You can explain to the team why we're choosing LoRA over full fine-tuning, why we're deploying on Fireworks instead of self-hosting, or why a 7B model might beat a 70B model. You help everyone make informed decisions.

ML Stack: PyTorch, Unsloth/Axolotl for training, vLLM for inference, Weights & Biases
Models: Qwen 2.5, Llama 3.1, GPT-4, Claude (for now)
Infrastructure: AWS (flexible), Docker, Kubernetes, GPUs when needed
Team: Principal Engineer (your partner on architecture), Mid Data/ML Engineer (your data pipeline partner), Junior AI Engineer (your mentee)

Example projects you'll own

Build a FastAPI service that handles streaming LLM responses with correct error handling and retry logic
Create a training pipeline that processes production logs, validates data quality, and triggers fine-tuning runs
Deploy a fine-tuned 7B model with vLLM that beats GPT-4 latency while maintaining quality on our domain
Design the data ingestion architecture for Project Genome: how we process papers, documentation, and operational data at scale
Implement evaluation frameworks that catch model regressions before they reach production

Requirements

Do you have experience in supply chain?

Must have:

5+ years building production Python systems (backend services, APIs, data processing)
Strong software engineering fundamentals: design patterns, testing, debugging, profiling
Experience integrating LLMs into applications (OpenAI/Anthropic APIs, prompt engineering, streaming, PydanticAI)
Understanding of ML training workflows (you don't need to be an expert, but you need to know enough to build the infrastructure)
Docker, CI/CD, production deployment experience
Can read and understand PyTorch code (you don't need to write novel architectures)

Nice to have:

Fine-tuning experience (LoRA, full fine-tuning, QLoRA)
Distributed training basics (DeepSpeed, FSDP)
Graph databases (Memgraph, Neo4j)
Supply chain or logistics domain knowledge
Experience with agent frameworks (LangChain, PydanticAI, etc.)

About the company

At Kallikor, we're building the future of supply chain intelligence through AI-powered simulation digital twins. We create living digital representations of real-world operations (warehouses, distribution networks, global logistics) that help organisations make better decisions faster. We're at an inflection point: moving from AI-assisted tools to domain-specific AI that understands supply chains as deeply as our best engineers do. You'll be instrumental in building our first domain-specific language model (DSLM) and the foundation for Project Genome, an ambitious initiative to capture and synthesise the world's supply chain knowledge into actionable intelligence.

Kallikor is determined to foster an environment where people can do their best work and feel like they belong. We believe a healthy culture, strong values and contribution from a diverse range of individuals will help us to achieve success.