Senior Software Engineer - AI Platform Engineering
Job description
As an AI Platform Engineer, you'll be part of the team designing and running the core AI platform. You'll work on APIs, pipelines, observability, security, and orchestration, ensuring our AI solutions can move from experiment to production smoothly. This role is about building the foundations of AI adoption - if you enjoy combining distributed systems, cloud engineering, and AI tooling into something bigger, this is it.
We are looking for someone who takes pride and ownership in what they build: someone able to voice their opinions and share their expertise, but also to listen and help the team reach the right decision. They shouldn't be afraid to celebrate their successes, admit their mistakes, and turn to others for help, maintaining a sense of honesty and humility. They should look to improve each day, caring about the tech, the architecture, and the people they work with, and understanding both the small details and the big picture in everything we do. Finally, they should be able to coach, mentor, and inspire those around them, embedding excellence, a sense of safety, and a desire to succeed in their teams, and ensuring these values are upheld at all levels.
What you will do…
- Design and build the AI platform that powers LLMs, agents, and other AI solutions across Dojo.
- Develop APIs, SDKs, and tooling that let product teams consume AI capabilities at scale with a great developer experience.
- Implement orchestration for multi-model and multi-service workflows using agentic frameworks (e.g., LangGraph, CrewAI, Google Agent Development Kit).
- Build and manage vector search and retrieval systems to support RAG and knowledge integration.
- Build robust monitoring, logging, and guardrails to keep AI systems safe, observable, and compliant, using tools such as LangSmith, Opik, Prometheus, and Grafana.
- Automate infrastructure and model deployment with Kubernetes, Terraform, and CI/CD pipelines.
- Partner with security, compliance, and product to ensure safe use of AI in production.
- Stay on top of AI platform trends, open-source tools, and emerging patterns - bringing best practices into our stack.
Requirements
- Strong software or platform engineering background (Python, Go, or Java; .NET a bonus).
- Solid experience with distributed systems, microservices, and cloud-native architecture (GCP preferred).
- Hands-on experience with Kubernetes, service mesh, and event-driven systems.
- Familiarity with LLM orchestration frameworks (LangChain, LangGraph, CrewAI, GCP ADK or similar).
- Experience with vector databases (FAISS, Pinecone, Weaviate, Vertex Vector Search) and RAG pipelines.
- Knowledge of MLOps/AI infrastructure tools (MLflow, Vertex AI, Ollama, OpenRouter, etc.).
- Strong CI/CD and infrastructure-as-code skills (Terraform, Helm, etc.).
- Good understanding of AI governance, monitoring, and responsible AI practices.
- Comfort balancing speed (PoCs) with robustness (production-ready systems).