PhD Studentship: Learning to Align Generative Artificial Intelligence Agents with Human Goals

University of East Anglia

Role details

Contract type

Temporary contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Tech stack

Artificial Intelligence

Data analysis

Computer Vision

Game Theory

Python

Machine Learning

Natural Language Processing

TensorFlow

PyTorch

Large Language Models

Deep Learning

Generative AI

Information Technology

Formal Methods

Job description

Primary supervisor - Dr Farhana Ferdousi Liza

Generative Artificial Intelligence (AI) agents are goal-driven systems powered by the sequential reasoning of Large Language Models (LLMs). The models underlying such agents are fundamentally optimised to predict the next word, image, or token in a sequence. Despite their impressive capabilities, these systems often produce outputs that are unintentionally harmful, biased, or misaligned with human intentions and social norms.

This PhD project aims to develop a novel learning and alignment framework for mitigating such misalignments and ensuring that AI agents act in accordance with human goals, values, and reasoning processes. The research will integrate insights from studies of human decision-making to inform the technical design of new alignment strategies, shaping both the core architecture and the evaluation framework. Additional data will be collected to further identify and close the alignment gap between machine-generated and human-preferred behaviours.

The PhD student will pursue three key objectives:

Dataset and Evaluation Development: Create novel datasets, evaluation metrics, and evaluation protocols to detect, quantify, and analyse misalignments between AI agents' behaviour and human expectations.

Algorithm and Model Design: Develop innovative algorithms, including new objective functions, training paradigms, and reasoning strategies, to guide AI agents and language models toward human-aligned outputs. This includes, but is not limited to, algorithms and models based on multi-agent learning, imitation learning, model distillation, and adaptation, all aimed at preserving both alignment and performance.

Theoretical Foundations: Develop and extend theoretical foundations to achieve a formal understanding of alignment dynamics, providing insights into how AI agents can reason about and align with human goals.

Essential selection criteria:

At least an upper second class degree (preferably MSc) in Computer Science or a Science or Technology discipline.

Good working knowledge of machine learning and deep learning.

Hands-on knowledge of Python and/or PyTorch and/or TensorFlow for implementing machine learning and/or deep learning algorithms.

Capability to work both independently and as part of a team.

Excellent written and oral communication and organisational skills. Proficiency in written English is required.

A genuine passion for and commitment to research.

Desirable criteria:

Knowledge of a variety of deep learning architectures and methods in natural language processing and/or computer vision.

Knowledge or past work on alignment strategies that shape core architectures in AI.

A publication record in relevant fields: AI, machine learning, natural language processing, computer vision, etc.

A successfully completed project on a relevant topic.

Mode of study

Full-time

Start date

1 October 2026

Funding

This PhD project is in competition for a Faculty of Science-funded studentship. Funding is available to UK applicants and comprises 'home' tuition fees and an annual stipend for 3 years.

Closing Date

10 December 2025
