Senior AI Engineer

Veridox

7 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Remote

Tech stack

Artificial Intelligence

Amazon Web Services (AWS)

Azure

DevOps

Fraud Prevention and Detection

Human-Computer Interaction

Python

Regression Testing

Data Streaming

TypeScript

Large Language Models

Bitbucket

Job description

We're looking for a hands-on, delivery-first engineer to lead the development and optimisation of our LLM and RAG pipelines. This isn't a research role. You'll be responsible for building, benchmarking, and deploying high-performance, cost-efficient AI features that work, and improve, in production.

We're not looking for 100-page white papers. We're looking for someone who can ship features, track performance, and find novel solutions to customers' problems.

What You'll Do:

Build and optimise RAG pipelines using AWS Bedrock, OpenSearch, and vector stores
Own our "Golden Dataset", curating the truth-set we use to evaluate model output
Automate evaluation using tools like RAGAS, DeepEval, or custom "LLM-as-a-judge" logic
Track drift, hallucination, and cost using observability tooling (Arize, Phoenix, etc.)
Design self-improving systems where user interaction data flows back into future retrieval and ranking
Balance cost and performance by selecting the right model for the right task (Claude, SLMs, or whatever gets the job done)
Write clean and fast Python and ship infrastructure as code

Tech Stack:

If your experience covers a mix of the platforms and technologies below, we'd like to hear from you.

Languages: Python, TypeScript, HCL
Vector & Search: OpenSearch, AWS S3 Vectors
Observability & Evaluation: Arize, Phoenix, RAGAS, DeepEval
Infrastructure: AWS Step Functions, Azure Function Apps
DevOps: CI/CD pipelines (Bitbucket)

You will work on a system where evaluation is central to the product. You'll have the autonomy to define standards for building, measuring, and improving complex AI systems. If you care about rigour, impact, and building things that matter, we'd love to hear from you.

Requirements

Proven experience building LLM/RAG pipelines in production
Confidence in statistical evaluation (sample sizes, regression testing)
Ability to define evaluation metrics and continuously improve model outputs
Strong understanding of unit economics in LLM systems (token cost, latency, accuracy trade-offs)
Clear communicator who can flag blockers early and ship fast

Nice-to-Have:
Experience with AWS S3 Vectors or a similar vector store
Familiarity with AI-driven fraud detection, legal tech, or investigative tools
Prior work with small language models (7B-8B) for cost-effective inference