Automation QA Engineer

Ciklum
7 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Remote

Tech stack

Artificial Intelligence
Azure
Databases
Continuous Integration
Information Engineering
Data Files
Python
PostgreSQL
Machine Learning
Performance Tuning
Regression Testing
Systems Integration
Scripting (Bash/Python/Go/Ruby)
Large Language Models
Deployment Automation
Data Analytics

Job description

As an Automation QA Engineer, become part of a cross-functional development team engineering the experiences of tomorrow.

  • Own the evaluation lifecycle, offline acceptance testing, and KPI measurement for the AS client's RAG pipeline
  • Lead the co-creation and management of the project's "golden dataset" to consistently benchmark AI performance
  • Implement and manage the RAGAS evaluation harness and automated CI/CD regression testing
  • Track, classify, and build root-cause taxonomies for LLM hallucinations, with a specialized focus on code-generation correctness
  • Golden Dataset & Baselines: Collaborate with client domain experts and technical leads to build a robust synthetic test set (~90+ queries across multiple categories) and establish baseline metrics for Faithfulness, Context Precision, and Answer Relevance
  • Evaluation Harness: Build and automate evaluation pipelines using RAGAS and custom Python scripts, enabling A/B comparisons between the baseline, MVP, and full implementation
  • Regression & CI/CD Guardrails: Implement automated CI/CD regression checks within Azure DevOps, ensuring that a >5% drop in core metrics automatically blocks pipeline deployments
  • Hallucination Tracking: Develop a root-cause taxonomy for hallucinations and track code-generation queries separately to ensure the AI generates functionally correct and compilable output
  • Performance Benchmarking: Measure and monitor pipeline latency, rigorously validating P95 latency targets (sub-4.5s) under representative concurrent load

  • Strong community: Work alongside top professionals in a friendly, open-door environment
  • Growth focus: Take on large-scale projects with a global impact and expand your expertise
  • Tailored learning: Boost your skills with internal events (meetups, conferences, workshops), Udemy access, language courses, and company-paid certifications
  • Endless opportunities: Explore diverse domains through internal mobility, finding the best fit to gain hands-on experience with cutting-edge technologies
  • Flexibility: Enjoy the option of fully remote work
  • Care: We've got you covered with company-paid medical insurance, mental health support, and financial & legal consultations

Requirements

Do you have experience in Python?
Do you have a Master's degree?

  • Background: Mid-to-Senior level experience in Data Science, Machine Learning Evaluation, AI Quality Assurance, or Data Engineering
  • Evaluation Frameworks: Deep, hands-on experience with LLM evaluation frameworks (e.g., RAGAS, DeepEval, TruLens) and establishing human-anchored or synthetic benchmarks
  • Technical Stack: Strong proficiency in Python. Solid experience with CI/CD tools (especially Azure DevOps) and integrating complex test suites into automated deployment pipelines
  • Data & Observability: Experience working with databases (PostgreSQL) and integrating custom telemetry or observability data (e.g., Azure App Insights) into evaluation reports
  • Analytical Mindset: Strong attention to detail with the ability to perform rigorous error analysis, build structured taxonomies for failures, and identify embedding drift

Personal skills:

  • Highly collaborative and data-driven; comfortable working directly with client SMEs to validate queries and presenting evaluation scorecards to guide engineering decisions

About the company

We are a custom product engineering company that supports both multinational organizations and scaling startups in solving their most complex business challenges. With a global team of over 4,000 highly skilled developers, consultants, analysts and product owners, we engineer technology that redefines industries and shapes the way people live.

At Ciklum, we are always exploring innovations, empowering each other to achieve more, and engineering solutions that matter. With us, you'll work with cutting-edge technologies, contribute to impactful projects, and be part of a One Team culture that values collaboration and progress. With delivery centers in Wrocław and Gdańsk, our 300+ professionals in Poland drive forward-thinking solutions for global clients. Join a community where collaboration sparks innovation, and your impact reaches millions.
