AI Evaluation Engineer

DNB
2 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Continuous Integration
GitHub
Python
Systems Integration
Data Logging
Large Language Models
Machine Learning Operations
Docker

Job description

  • Designing evaluation frameworks for agentic systems that ensure quality, coverage, safety, efficiency, and regulatory compliance
  • Building LLM-as-judge pipelines: prompt design, calibration, consistency validation
  • Integrating evaluation into CI/CD: automated regression detection when context (prompts, data sources, etc.), models, or services change
  • Running evaluation experiments when there are relevant changes or additions
  • Constructing and maintaining evaluation datasets (inputs, ground truth, etc.)
  • Working with domain experts to translate requirements into measurable criteria
  • Developing shared evaluation tooling and patterns that teams across DNB can adopt

Requirements

  • Relevant background in evaluating AI/ML systems
  • Proficiency in Python for experiments and evaluation
  • Knowledge of observability, logging, and dataset curation
  • Experience running structured experiments spanning many roles
  • Familiarity with AI tooling for developer productivity (Claude Code, Copilot, etc.)
  • Experience with agentic evaluation techniques (using code, LLMs and humans)
  • Experience with agentic evaluation solutions (AgentCore, Foundry, MLflow, etc.)

Tech stack example: Python, Strands, AWS AgentCore, AWS Bedrock, MCP, MLflow, OTel, Docker, GitHub Actions

What you bring

You bring an evidence-driven mindset and back your claims and decisions with data and numbers. You act as a role model for rigorous and responsible AI testing, setting a high bar for quality, safety, and trustworthiness in everything you build. You communicate and collaborate effectively across roles and teams, and you take true ownership of the team's goals.

Benefits & conditions

You'll work on challenging and meaningful tasks in a strong engineering culture with solid opportunities for professional growth and career development. We offer attractive pension and insurance schemes, as well as employee benefits on DNB's products. You'll also have access to company cabins across Norway, sports, cultural and social activities, and a wide range of employee discounts. We support flexibility in everyday work through flexible working hours, a hybrid way of working, extra days off, and reduced working hours from May to August (summertime).

About the company

People are the very DNA of DNB. Since 1822, bright minds have worked together to find the best solutions for our customers. Today, DNB is much more than Norway's largest bank: we are a technology-driven financial institution that continuously connects people and ideas to knowledge and capital in new ways. Diversity is part of who we are, and inclusion is something we actively choose every single day. We promise to do our best to make you feel at home. A job at Norway's largest financial group offers professional challenges in an exciting work environment with many opportunities for development.

AI Tech is DNB's new division within Technology & Services, created to accelerate our shift from AI experimentation to real, measurable impact. We bring together deep technical expertise, modern AI platforms, and hands-on delivery to scale agentic AI across the group. We move fast, learn fast, and deliver real outcomes. We expect every member of AI Tech to be a role model in the everyday use of AI: using AI proactively in coding, automating tasks, improving documentation, and accelerating problem-solving. You help raise the overall AI maturity across DNB by demonstrating what AI-first engineering looks like.
