AI Software/QA Test Engineer
Job description
Job Title: AI Test Engineer / AI QA Engineer (LLM, RAG & Agentic AI)
We are seeking a skilled and detail-oriented AI Test Engineer to test, validate, and assure the quality of advanced AI-powered applications built on modern LLM, RAG, and Agentic AI architectures. The ideal candidate will have strong hands-on experience in software testing, Python-based test automation, and API testing, as well as practical exposure to AI systems built with frameworks such as LangChain, LangGraph, and the OpenAI ecosystem.
You will work closely with AI Engineers and Developers to ensure AI applications are reliable, accurate, secure, and production-ready. The role will focus on validating AI behaviours, testing Retrieval-Augmented Generation (RAG) pipelines, evaluating AI agents, verifying embeddings and search quality, and ensuring end-to-end quality across AI workflows and integrations.
Key Responsibilities
AI Application Testing
- Test AI applications built using Large Language Models (LLMs).
- Validate the behaviour, reliability, and output quality of AI-powered applications and APIs.
- Design and execute functional, regression, integration, and end-to-end test cases for AI systems.
- Work closely with AI Engineers to identify defects, edge cases, and system weaknesses early in the development lifecycle.
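By way of illustration, a basic functional check against an AI-powered API might look like the sketch below. The /chat endpoint, payload shape, and response fields are assumptions made purely for illustration, not a fixed contract:

```python
# Minimal functional-check sketch for an AI-powered API.
# The /chat endpoint, payload shape, and response fields are assumptions
# made for illustration only.
import requests

BASE_URL = "http://localhost:8000"  # assumed local or staging deployment

def test_chat_endpoint_returns_well_formed_answer():
    payload = {"question": "What is our refund policy?", "session_id": "qa-001"}
    response = requests.post(f"{BASE_URL}/chat", json=payload, timeout=30)

    # Functional checks: the service responds and honours its contract.
    assert response.status_code == 200
    body = response.json()
    assert isinstance(body.get("answer"), str)
    assert body["answer"].strip(), "answer should not be empty"
```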
LLM Testing & Validation
- Test LLM-based features for response quality, consistency, accuracy, and robustness.
- Validate prompt behaviour and test prompt variations across different business scenarios.
- Perform negative testing, boundary testing, and exception testing for AI outputs.
- Support evaluation of hallucination risk, response relevance, and output stability.
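For example, prompt-variation testing can often be expressed as a parametrised pytest suite. The sketch below assumes the OpenAI Python SDK, an illustrative model name, and a simple keyword and length check standing in for richer evaluation:

```python
# Prompt-variation sketch: the same business question phrased several ways,
# with a simple keyword/length check standing in for richer evaluation.
# The model name and expected keyword are illustrative assumptions.
import pytest
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_VARIANTS = [
    "Summarise our refund policy in one sentence.",
    "In a single sentence, what is the refund policy?",
    "Briefly state the refund policy.",
]

@pytest.mark.parametrize("prompt", PROMPT_VARIANTS)
def test_prompt_variants_stay_on_topic(prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; substitute the model under test
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = response.choices[0].message.content.lower()
    # Basic relevance and robustness checks; real suites would add semantic
    # similarity, rubric scoring, or LLM-as-judge evaluation.
    assert "refund" in answer
    assert len(answer.split()) < 60
```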
RAG & Knowledge System Testing
- Test Retrieval-Augmented Generation (RAG) pipelines end to end.
- Validate document ingestion, chunking, embedding generation, retrieval quality, and final answer relevance.
- Verify semantic search behaviour across vector databases and knowledge sources.
- Work with AI Engineers to identify retrieval failures, context mismatch issues, and ranking problems.
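As a rough illustration, a retrieval-quality smoke test might seed an in-memory vector store and assert that the semantically closest chunk is returned. The sketch below uses Chroma with its default embedding function; the documents, ids, and query are illustrative only:

```python
# Retrieval-quality sketch: seed an in-memory Chroma collection and check
# that the semantically closest chunk wins. Documents, ids, and the query
# are illustrative; a real suite would exercise the pipeline's own
# ingestion, chunking, and embedding code.
import chromadb

def test_retriever_returns_relevant_chunk():
    client = chromadb.Client()  # in-memory client using the default embedding function
    collection = client.create_collection("qa-smoke")

    collection.add(
        ids=["refund-1", "shipping-1"],
        documents=[
            "Customers may request a refund within 30 days of purchase.",
            "Standard shipping takes 3 to 5 business days.",
        ],
    )

    results = collection.query(
        query_texts=["How long do I have to get my money back?"],
        n_results=1,
    )
    # The top hit should be the refund policy chunk, not the shipping one.
    assert results["ids"][0][0] == "refund-1"
```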
Agentic AI & Workflow Testing
- Test AI Agents built using LangChain and LangGraph.
- Validate agent workflows, tool usage, state transitions, memory behaviour, and multi-step execution paths.
- Test decision points, fallback handling, retry logic, and workflow orchestration behaviour.
- Ensure agentic systems behave safely and predictably under expected and unexpected conditions.
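To give a flavour of agent workflow testing, the sketch below builds a tiny LangGraph graph with a simulated flaky tool and asserts that the retry routing converges. The state shape and node logic are illustrative stand-ins, not a representation of any particular production agent:

```python
# Agent-workflow sketch: a tiny LangGraph graph with a simulated flaky tool,
# used to assert that retry/fallback routing converges. The state shape and
# node logic are illustrative stand-ins, not a real production agent.
from typing import TypedDict

from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    attempts: int
    answer: str

def flaky_tool(state: AgentState) -> dict:
    # Simulate a tool that fails on the first attempt and succeeds on retry.
    attempts = state["attempts"] + 1
    return {"attempts": attempts, "answer": "" if attempts < 2 else "tool result"}

def route(state: AgentState) -> str:
    if state["answer"] or state["attempts"] >= 3:
        return "done"  # finished, or retry budget exhausted
    return "retry"

def build_graph():
    graph = StateGraph(AgentState)
    graph.add_node("tool", flaky_tool)
    graph.set_entry_point("tool")
    graph.add_conditional_edges("tool", route, {"retry": "tool", "done": END})
    return graph.compile()

def test_agent_retries_then_succeeds():
    final = build_graph().invoke({"attempts": 0, "answer": ""})
    assert final["answer"] == "tool result"
    assert final["attempts"] == 2  # exactly one retry was needed
```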
API, Integration & System Testing
- Test AI-driven microservices, APIs, and enterprise integrations.
- Validate data flow between AI components, external tools, vector databases, and business applications.
- Perform integration testing across AI services, backend components, and user-facing workflows.
- Verify system behaviour in production-like environments.
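An integration check on a FastAPI-based AI service might be written with FastAPI's TestClient, as in the sketch below. The import path and the /health and /search routes are hypothetical placeholders for the service under test:

```python
# Integration-check sketch for a FastAPI-based AI service using TestClient.
# The import path and the /health and /search routes are hypothetical
# placeholders for the service under test.
from fastapi.testclient import TestClient

from my_ai_service.main import app  # hypothetical import path

client = TestClient(app)

def test_health_and_search_round_trip():
    assert client.get("/health").status_code == 200

    # Exercise the path from API request through embedding and vector search
    # back to a ranked response, without mocking internal components.
    response = client.post("/search", json={"query": "refund policy", "top_k": 3})
    assert response.status_code == 200
    hits = response.json()["results"]
    assert 0 < len(hits) <= 3
    assert all("score" in hit and "text" in hit for hit in hits)
```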
Quality Assurance & Automation
- Develop automated test cases and reusable test suites for AI applications.
- Build test data, test scenarios, and validation frameworks for structured and unstructured AI use cases.
- Support continuous testing within development and release cycles.
- Track, document, and communicate defects clearly to engineering and product teams.
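A reusable validation helper for structured AI output is one common building block here. The sketch below uses a Pydantic schema whose fields are purely illustrative, with a canned JSON string standing in for a live response from the system under test:

```python
# Reusable-validation sketch: every structured-output test case funnels
# through one schema check. The TicketTriage fields are illustrative, and
# the canned JSON string stands in for a live response from the model.
import json

import pytest
from pydantic import BaseModel, ValidationError

class TicketTriage(BaseModel):
    category: str
    priority: int
    summary: str

def validate_structured_output(raw: str) -> TicketTriage:
    """Shared helper: parse and schema-check a model's JSON response."""
    try:
        return TicketTriage(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError) as exc:
        pytest.fail(f"output is not valid structured data: {exc}")

def test_triage_output_matches_schema():
    # In a real suite this string would come from the system under test.
    raw = '{"category": "hardware", "priority": 2, "summary": "Printer jam on floor 3"}'
    triage = validate_structured_output(raw)
    assert 1 <= triage.priority <= 5
```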
AI Safety, Performance & Observability
- Test AI systems for reliability, latency, failure handling, and performance under load.
- Validate safety controls, content filtering, output guardrails, and error-handling behaviour.
- Support monitoring, evaluation, and observability of AI systems in development and production.
- Contribute to quality benchmarks, acceptance criteria, and release readiness reviews for AI features.
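As an illustration of a lightweight latency and reliability check, the sketch below measures response times over a small batch of requests against an assumed endpoint and asserts a p95 budget; dedicated tooling such as Locust or k6 would be used for real load testing:

```python
# Latency/reliability smoke-check sketch against an assumed AI endpoint.
# Request count, latency budget, and the /chat route are illustrative;
# proper load testing would use dedicated tooling such as Locust or k6.
import statistics
import time

import requests

BASE_URL = "http://localhost:8000"  # assumed production-like environment

def test_chat_latency_and_error_rate():
    latencies, failures = [], 0
    for _ in range(20):
        start = time.perf_counter()
        resp = requests.post(f"{BASE_URL}/chat", json={"question": "ping"}, timeout=60)
        latencies.append(time.perf_counter() - start)
        if resp.status_code != 200:
            failures += 1

    p95 = statistics.quantiles(latencies, n=20)[18]  # approximate 95th percentile
    assert failures == 0, f"{failures} of 20 requests failed"
    assert p95 < 5.0, f"p95 latency {p95:.2f}s exceeds the 5s budget"
```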
Engineering Collaboration
- Collaborate closely with AI Developers, Product teams, DevOps, and Data teams.
- Participate in requirement reviews, solution discussions, and test planning sessions.
- Ensure testing is aligned with business requirements and technical design.
- Promote high testing standards and a quality-first engineering approach across AI delivery.
Required Technical Skills (High-Level)
- Software Testing / QA Engineering
- Python
- API Testing
- Test Automation
- Large Language Models (LLMs)
- Retrieval-Augmented Generation (RAG)
- LangChain
- LangGraph
- AI Agent Testing
- OpenAI API Integration
- Embeddings & Vector Databases
- Prompt Testing / AI Output Validation
- Integration Testing
- Regression Testing
- Specification-Driven Testing
Requirements
- Experience with FAISS / Chroma / Pinecone / Weaviate
- Experience testing FastAPI / Flask based services
- Understanding of Transformer-based AI systems
- Experience with Azure / AWS / GCP
- Knowledge of CI/CD pipelines for automated testing
- Experience with LLM evaluation frameworks
- Exposure to AI safety testing, red-teaming, or guardrail validation
- Understanding of performance, load, and reliability testing for AI systems