AI Evaluation Engineer
Job description
- Designing evaluation frameworks for agentic systems, ensuring quality, coverage, safety, efficiency, and regulatory compliance
- Building LLM-as-judge pipelines: prompt design, calibration, consistency validation
- Integrating evaluation into CI/CD: automated regression detection when context (prompts, data sources, etc.), models, or services change
- Running evaluation experiments when there are relevant changes or additions
- Constructing and maintaining evaluation datasets (inputs, ground truth, etc.)
- Working with domain experts to translate requirements into measurable criteria
- Developing shared evaluation tooling and patterns that teams across DNB can adopt
Requirements
- Relevant background in evaluating AI/ML systems
- Proficiency in Python for experiments and evaluation
- Knowledge of observability, logging, and dataset curation
- Experience running structured experiments spanning many roles
- Familiarity with AI tooling for developer productivity (Claude Code, Copilot, etc.)
- Experience with agentic evaluation techniques (using code, LLMs, and humans)
- Experience with agentic evaluation solutions (AgentCore, Foundry, MLflow, etc.)
Example tech stack: Python, Strands, AWS AgentCore, AWS Bedrock, MCP, MLflow, OTel, Docker, GitHub Actions
What you bring
You bring an evidence-driven mindset and back your claims and decisions with data and numbers. You act as a role model for rigorous and responsible AI testing, setting a high bar for quality, safety, and trustworthiness in everything you build. You communicate and collaborate effectively across roles and teams, and you take true ownership of the team's goals.
Benefits & conditions
You'll work on challenging and meaningful tasks in a strong engineering culture with solid opportunities for professional growth and career development. We offer attractive pension and insurance schemes, as well as employee benefits on DNB products. You'll also have access to company cabins across Norway; sports, cultural, and social activities; and a wide range of employee discounts. We support flexibility in everyday work through flexible working hours, a hybrid way of working, extra days off, and reduced summer working hours from May to August.