AI Software/QA Test Engineer
Job description
Job Title: AI Test Engineer / AI QA Engineer (LLM, RAG & Agentic AI)
We are seeking a skilled and detail-oriented AI Test Engineer to test, validate, and assure the quality of advanced AI-powered applications built on modern LLM, RAG, and Agentic AI architectures. The ideal candidate will have strong hands-on experience in software testing, Python-based test automation, and API testing, as well as practical exposure to AI systems built with frameworks such as LangChain, LangGraph, and the OpenAI ecosystem.
You will work closely with AI Engineers and Developers to ensure AI applications are reliable, accurate, secure, and production-ready. The role will focus on validating AI behaviours, testing Retrieval-Augmented Generation (RAG) pipelines, evaluating AI agents, verifying embeddings and search quality, and ensuring end-to-end quality across AI workflows and integrations.
Key Responsibilities
AI Application Testing
- Test AI applications built using Large Language Models (LLMs).
- Validate the behaviour, reliability, and output quality of AI-powered applications and APIs.
- Design and execute functional, regression, integration, and end-to-end test cases for AI systems.
- Work closely with AI Engineers to identify defects, edge cases, and system weaknesses early in the development lifecycle.
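By way of illustration, a basic functional check against an AI-powered API might look like the sketch below. The /chat endpoint, payload shape, and response fields are assumptions made purely for illustration, not a fixed contract:

```python
# Minimal functional-check sketch for an AI-powered API.
# The /chat endpoint, payload shape, and response fields are assumptions
# made for illustration only.
import requests

BASE_URL = "http://localhost:8000"  # assumed local or staging deployment

def test_chat_endpoint_returns_well_formed_answer():
    payload = {"question": "What is our refund policy?", "session_id": "qa-001"}
    response = requests.post(f"{BASE_URL}/chat", json=payload, timeout=30)

    # Functional checks: the service responds and honours its contract.
    assert response.status_code == 200
    body = response.json()
    assert isinstance(body.get("answer"), str)
    assert body["answer"].strip(), "answer should not be empty"
```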
LLM Testing & Validation
- Test LLM-based features for response quality, consistency, accuracy, and robustness.
- Validate prompt behaviour and test prompt variations across different business scenarios.
- Perform negative testing, boundary testing, and exception testing for AI outputs.
- Support evaluation of hallucination risk, response relevance, and output stability.
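For example, prompt-variation testing can often be expressed as a parametrised pytest suite. The sketch below assumes the OpenAI Python SDK, an illustrative model name, and a simple keyword and length check standing in for richer evaluation:

```python
# Prompt-variation sketch: the same business question phrased several ways,
# with a simple keyword/length check standing in for richer evaluation.
# The model name and expected keyword are illustrative assumptions.
import pytest
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_VARIANTS = [
    "Summarise our refund policy in one sentence.",
    "In a single sentence, what is the refund policy?",
    "Briefly state the refund policy.",
]

@pytest.mark.parametrize("prompt", PROMPT_VARIANTS)
def test_prompt_variants_stay_on_topic(prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; substitute the model under test
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = response.choices[0].message.content.lower()
    # Basic relevance and robustness checks; real suites would add semantic
    # similarity, rubric scoring, or LLM-as-judge evaluation.
    assert "refund" in answer
    assert len(answer.split()) < 60
```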
RAG & Knowledge System Testing
- Test Retrieval-Augmented Generation (RAG) pipelines end to end.
- Validate document ingestion, chunking, embedding generation, retrieval quality, and final answer relevance.
- Verify semantic search behaviour across vector databases and knowledge sources.
- Work with AI Engineers to identify retrieval failures, context mismatch issues, and ranking problems.
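As a rough illustration, a retrieval-quality smoke test might seed an in-memory vector store and assert that the semantically closest chunk is returned. The sketch below uses Chroma with its default embedding function; the documents, ids, and query are illustrative only:

```python
# Retrieval-quality sketch: seed an in-memory Chroma collection and check
# that the semantically closest chunk wins. Documents, ids, and the query
# are illustrative; a real suite would exercise the pipeline's own
# ingestion, chunking, and embedding code.
import chromadb

def test_retriever_returns_relevant_chunk():
    client = chromadb.Client()  # in-memory client using the default embedding function
    collection = client.create_collection("qa-smoke")

    collection.add(
        ids=["refund-1", "shipping-1"],
        documents=[
            "Customers may request a refund within 30 days of purchase.",
            "Standard shipping takes 3 to 5 business days.",
        ],
    )

    results = collection.query(
        query_texts=["How long do I have to get my money back?"],
        n_results=1,
    )
    # The top hit should be the refund policy chunk, not the shipping one.
    assert results["ids"][0][0] == "refund-1"
```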
Agentic AI & Workflow Testing
- Test AI Agents built using LangChain and LangGraph.
- Validate agent workflows, tool usage, state transitions, memory behaviour, and multi-step execution paths.
- Test decision points, fallback handling, retry logic, and workflow orchestration behaviour.
- Ensure agentic systems behave safely and predictably under expected and unexpected conditions.
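To give a flavour of agent workflow testing, the sketch below builds a tiny LangGraph graph with a simulated flaky tool and asserts that the retry routing converges. The state shape and node logic are illustrative stand-ins, not a representation of any particular production agent:

```python
# Agent-workflow sketch: a tiny LangGraph graph with a simulated flaky tool,
# used to assert that retry/fallback routing converges. The state shape and
# node logic are illustrative stand-ins, not a real production agent.
from typing import TypedDict

from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    attempts: int
    answer: str

def flaky_tool(state: AgentState) -> dict:
    # Simulate a tool that fails on the first attempt and succeeds on retry.
    attempts = state["attempts"] + 1
    return {"attempts": attempts, "answer": "" if attempts < 2 else "tool result"}

def route(state: AgentState) -> str:
    if state["answer"] or state["attempts"] >= 3:
        return "done"  # finished, or retry budget exhausted
    return "retry"

def build_graph():
    graph = StateGraph(AgentState)
    graph.add_node("tool", flaky_tool)
    graph.set_entry_point("tool")
    graph.add_conditional_edges("tool", route, {"retry": "tool", "done": END})
    return graph.compile()

def test_agent_retries_then_succeeds():
    final = build_graph().invoke({"attempts": 0, "answer": ""})
    assert final["answer"] == "tool result"
    assert final["attempts"] == 2  # exactly one retry was needed
```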
API, Integration & System Testing
- Test AI-driven microservices, APIs, and enterprise integrations.
- Validate data flow between AI components, external tools, vector databases, and business applications.
- Perform integration testing across AI services, backend components, and user-facing workflows.
- Verify system behaviour in production-like environments.
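An integration check on a FastAPI-based AI service might be written with FastAPI's TestClient, as in the sketch below. The import path and the /health and /search routes are hypothetical placeholders for the service under test:

```python
# Integration-check sketch for a FastAPI-based AI service using TestClient.
# The import path and the /health and /search routes are hypothetical
# placeholders for the service under test.
from fastapi.testclient import TestClient

from my_ai_service.main import app  # hypothetical import path

client = TestClient(app)

def test_health_and_search_round_trip():
    assert client.get("/health").status_code == 200

    # Exercise the path from API request through embedding and vector search
    # back to a ranked response, without mocking internal components.
    response = client.post("/search", json={"query": "refund policy", "top_k": 3})
    assert response.status_code == 200
    hits = response.json()["results"]
    assert 0 < len(hits) <= 3
    assert all("score" in hit and "text" in hit for hit in hits)
```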
Quality Assurance & Automation
- Develop automated test cases and reusable test suites for AI applications.
- Build test data, test scenarios, and validation frameworks for structured and unstructured AI use cases.
- Support continuous testing within development and release cycles.
- Track, document, and communicate defects clearly to engineering and product teams.
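A reusable validation helper for structured AI output is one common building block here. The sketch below uses a Pydantic schema whose fields are purely illustrative, with a canned JSON string standing in for a live response from the system under test:

```python
# Reusable-validation sketch: every structured-output test case funnels
# through one schema check. The TicketTriage fields are illustrative, and
# the canned JSON string stands in for a live response from the model.
import json

import pytest
from pydantic import BaseModel, ValidationError

class TicketTriage(BaseModel):
    category: str
    priority: int
    summary: str

def validate_structured_output(raw: str) -> TicketTriage:
    """Shared helper: parse and schema-check a model's JSON response."""
    try:
        return TicketTriage(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError) as exc:
        pytest.fail(f"output is not valid structured data: {exc}")

def test_triage_output_matches_schema():
    # In a real suite this string would come from the system under test.
    raw = '{"category": "hardware", "priority": 2, "summary": "Printer jam on floor 3"}'
    triage = validate_structured_output(raw)
    assert 1 <= triage.priority <= 5
```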
AI Safety, Performance & Observability
- Test AI systems for reliability, latency, failure handling, and performance under load.
- Validate safety controls, content filtering, output guardrails, and error-handling behaviour.
- Support monitoring, evaluation, and observability of AI systems in development and production.
- Contribute to quality benchmarks, acceptance criteria, and release readiness reviews for AI features.
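As an illustration of a lightweight latency and reliability check, the sketch below measures response times over a small batch of requests against an assumed endpoint and asserts a p95 budget; dedicated tooling such as Locust or k6 would be used for real load testing:

```python
# Latency/reliability smoke-check sketch against an assumed AI endpoint.
# Request count, latency budget, and the /chat route are illustrative;
# proper load testing would use dedicated tooling such as Locust or k6.
import statistics
import time

import requests

BASE_URL = "http://localhost:8000"  # assumed production-like environment

def test_chat_latency_and_error_rate():
    latencies, failures = [], 0
    for _ in range(20):
        start = time.perf_counter()
        resp = requests.post(f"{BASE_URL}/chat", json={"question": "ping"}, timeout=60)
        latencies.append(time.perf_counter() - start)
        if resp.status_code != 200:
            failures += 1

    p95 = statistics.quantiles(latencies, n=20)[18]  # approximate 95th percentile
    assert failures == 0, f"{failures} of 20 requests failed"
    assert p95 < 5.0, f"p95 latency {p95:.2f}s exceeds the 5s budget"
```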
Engineering Collaboration
- Collaborate closely with AI Developers, Product teams, DevOps, and Data teams.
- Participate in requirement reviews, solution discussions, and test planning sessions.
- Ensure testing is aligned with business requirements and technical design.
- Promote high testing standards and a quality-first engineering approach across AI delivery.
Required Technical Skills (High-Level)
- Software Testing / QA Engineering
- Python
- API Testing
- Test Automation
- Large Language Models (LLMs)
- Retrieval-Augmented Generation (RAG)
- LangChain
- LangGraph
- AI Agent Testing
- OpenAI API Integration
- Embeddings & Vector Databases
- Prompt Testing / AI Output Validation
- Integration Testing
- Regression Testing
- Specification-Driven Testing
Requirements
- Experience with FAISS / Chroma / Pinecone / Weaviate
- Experience testing FastAPI / Flask based services
- Understanding of Transformer-based AI systems
- Experience with Azure / AWS / GCP
- Knowledge of CI/CD pipelines for automated testing
- Experience with LLM evaluation frameworks
- Exposure to AI safety testing, red-teaming, or guardrail validation
- Understanding of performance, load, and reliability testing for AI systems