Senior AI Engineer

Veridox
7 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Remote

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Azure
DevOps
Fraud Prevention and Detection
Human-Computer Interaction
Python
Regression Testing
Data Streaming
TypeScript
Large Language Models
Bitbucket

Job description

We're looking for a hands-on, delivery-first engineer to lead the development and optimisation of our LLM and RAG pipelines. This isn't a research role. You'll be responsible for building, benchmarking, and deploying high-performance, cost-efficient AI features that work, and improve, in production.

We're not looking for 100-page white papers. We're looking for someone who can ship features, track performance, and find novel solutions to customers' problems.

What You'll Do:

  • Build and optimise RAG pipelines using AWS Bedrock, OpenSearch, and vector stores
  • Own our "Golden Dataset", curating the truth-set we use to evaluate model output
  • Automate evaluation using tools like RAGAS, DeepEval, or custom "LLM-as-a-judge" logic
  • Track drift, hallucination, and cost using observability tooling (Arize, Phoenix, etc.)
  • Design self-improving systems where user interaction data feeds back into future retrieval and ranking
  • Balance cost and performance by selecting the right model for the right task (Claude, SLMs, or whatever gets the job done)
  • Write clean and fast Python and ship infrastructure as code

Tech Stack:

If your experience covers a mix of the platforms and technologies below, we'd like to hear from you.

  • Languages: Python, TypeScript, HCL
  • Vector & Search: OpenSearch, AWS S3 Vectors
  • Observability & Evaluation: Arize, Phoenix, RAGAS, DeepEval
  • Infrastructure: AWS Step Functions, Azure Function Apps
  • DevOps: CI/CD pipelines (Bitbucket)

You will work on a system where evaluation is central to the product. You'll have the autonomy to define standards for building, measuring, and improving complex AI systems. If you care about rigour, impact, and building things that matter, we'd love to hear from you.

Requirements

  • Proven experience building LLM / RAG pipelines in production

  • Confidence in statistical evaluation (sample sizes, regression testing)

  • Ability to define evaluation metrics and continuously improve model outputs

  • Strong understanding of unit economics in LLM systems (token cost, latency, accuracy trade-offs)

  • Clear communicator who can flag blockers early and ship fast

Nice-to-Have:

  • Experience with AWS S3 Vectors or a similar vector store

  • Familiarity with AI-driven fraud detection, legal tech, or investigative tools

  • Prior work with small language models (7B-8B) for cost-effective inference
