- Peter Dickinson, University of Warwick
- Dr Deirdre Hughes, University of Warwick
- Dr Chris Percy, University of Warwick
Project overview
This project will develop and pilot a framework for evaluating AI-powered careers advice tools against human advisers.
Why this project is important
Careers advice is a critical component of UK education and work transitions, particularly for disadvantaged groups who face uneven support. AI has the potential to enhance access, quality, and sustainability, but only if tools are quality assured and recommended only in appropriate circumstances.
Policymakers, service managers, and practitioners currently have no high-quality evidence on the performance of these tools. However, research indicates that students already use large language models (LLMs) for personalised career guidance. The careers sector is not equipped to manage this trend, with low awareness and mixed technical expertise. Without evidence-based assessments, there is a risk of AI tools being adopted on the basis of cost or convenience rather than efficacy.
What it will involve
The research team aim to address this gap by creating an evidence base to guide responsible use of AI tools, ensuring they enhance accessibility, quality, and outcomes in career advice.
The following questions will be answered:
- What form of evaluation framework would the careers sector accept for assessing careers advice powered by LLMs?
- Within the framework, what is the appropriate balance between evaluation methods: panel review, case study comparisons, and a randomised field trial? What thresholds are appropriate to inform practical guidelines and specific policies on tool usage?
- Based on pilot studies of a specific set of AI-powered tools, selected according to criteria such as relevance, appropriateness, tone accuracy, and actionability, what are the operational requirements for the evaluations?
- Based on the pilot studies, what hypotheses might be formed about the size of quantitative performance differences between interventions, key performance drivers, and subgroup differentials?
The research will be completed in four phases:
- Form an advisory group of 15-25 stakeholders to co-design the framework and pilot evaluation plan.
- Co-develop the evaluation framework and evaluation materials, including structured procedures for panel reviews, case studies, and field trials.
- Implement a pilot with 120 students aged 16-19 who have career questions, comparing AI tools with human advisers and hybrid approaches.
- Refine the framework, publish findings, and share outputs through open-access platforms, stakeholder surveys, and academic publications.
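The pilot's three-arm comparison implies a randomisation step before any field trial can run. A minimal sketch in Python of balanced random assignment is shown below; the arm names, group sizes, and fixed seed are purely illustrative and not part of the project's published design:

```python
import random

def assign_arms(participant_ids, arms, seed=42):
    """Assign participants to evaluation arms in equal-sized groups.

    Balanced (complete) randomisation: shuffle the full list once,
    then split it into len(arms) equal blocks.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    block = len(shuffled) // len(arms)
    return {arm: shuffled[i * block:(i + 1) * block]
            for i, arm in enumerate(arms)}

# Illustrative: 120 pilot participants split across three comparison arms.
assignment = assign_arms(range(120), ["ai_tool", "human_adviser", "hybrid"])
for arm, group in assignment.items():
    print(arm, len(group))  # each arm receives 40 participants
```

Fixing the seed keeps the allocation auditable, which matters if the framework is to be reused by sector bodies; a live trial would of course draw the seed once and record it rather than hard-code it.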
How it will make a difference
Key stakeholders include policymakers, careers advisers, sector bodies, technology providers, educators, and students. Outputs will support systemic change to improve careers advice and adoption of the assessment protocol by sector bodies to improve AI tool standards.
