- Professor John Jerrim, UCL Institute of Education
- Dave Thomson, FFT Education Datalab
- Natasha Plaister, FFT Education Datalab
Project overview
This project investigated the feasibility of constructing separate sub-domain scores for each National Curriculum area of England’s Key Stage 2 mathematics test.
Why is this important?
Key Stage 2 SATs are used to monitor primary school performance, provide information about individual achievement and gaps, and understand wider national and sub-national achievement changes over time. Research and policy analysis have typically focused on overall scores. However, schools and teachers have increasingly sought more detailed feedback to help identify strengths and weaknesses across different areas of the mathematics curriculum.
At the time of the project, sub-domain information was provided to schools as raw scores through the Department for Education’s Analyse School Performance tool, without any indication of reliability or uncertainty. The project therefore set out to assess whether a more psychometrically principled approach could generate sub-domain scores that were meaningful, reliable and genuinely useful to schools.
What did it involve?
The analysis drew on question-level data from Key Stage 2 mathematics tests taken between 2018 and 2023 by more than 76,000 pupils in 500 schools, with findings replicated in a larger sample. The project combined classical test theory analyses, factor analysis and multi-dimensional Item Response Theory with latent regression to estimate scores across eight National Curriculum domains. It also assessed the stability of school-level scores over time, examined correlations across domains and explored differences in attainment between demographic groups.
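As an illustration of the classical test theory element of this approach, the sketch below computes Cronbach's alpha separately for each curriculum domain from question-level responses. The data layout, column names and domain mapping are assumptions made for illustration, not the project's actual data or code.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a pupil-by-question matrix of item scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def domain_reliabilities(responses: pd.DataFrame, domain_map: dict[str, str]) -> pd.Series:
    """Alpha per curriculum domain.

    responses: one row per pupil, one column per test question (0/1 or partial credit).
    domain_map: hypothetical mapping from question column to domain label,
    e.g. {"q01": "Fractions", "q02": "Geometry", ...}.
    """
    results = {}
    for domain in set(domain_map.values()):
        cols = [q for q, d in domain_map.items() if d == domain]
        if len(cols) >= 2:  # alpha is undefined for a single question
            results[domain] = cronbach_alpha(responses[cols])
    return pd.Series(results).sort_values()
```

In this kind of check, domains covered by only a handful of questions would typically show much lower alpha values, which is the pattern the findings below describe.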
What did it find?
Raw sub-domain scores were found to be problematic. The Key Stage 2 mathematics test appeared to be largely unidimensional, and test questions did not cluster clearly by curriculum domain. Reliability varied substantially across domains and was particularly weak for domains assessed by only a small number of questions. Although more advanced modelling techniques were able to produce statistically reliable sub-domain scores for schools, these scores were very highly correlated with one another and with overall mathematics performance. As a result, they added little distinctive information and did not allow schools to identify meaningful relative strengths or weaknesses across different curriculum areas. Differences in attainment by gender, disadvantage and special educational needs were also found to be very similar across domains.
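A minimal sketch of how this kind of overlap can be checked is given below: correlating estimated school-level domain scores with one another and with overall attainment. The variable names are hypothetical and this is not the project's analysis code.

```python
import pandas as pd

def domain_overlap(school_scores: pd.DataFrame) -> pd.DataFrame:
    """Correlation matrix of school-level scores.

    school_scores: one row per school, one column per estimated domain score,
    plus an assumed "overall" column holding the school's overall maths score.
    """
    corr = school_scores.corr()
    # Correlations near 1 between domains, and between each domain and "overall",
    # indicate the sub-domain scores carry little information beyond overall attainment.
    return corr.round(2)
```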
The project concluded that:
- The reporting of raw sub-domain scores back to schools should be discontinued.
- The Department for Education should encourage schools to focus on fewer, higher-quality indicators of school performance, including multi-year averages to reduce the volatility seen in a single year's data.
- If sub-domain reporting continues due to user demand, it should be based on robust methods that clearly communicate uncertainty.
- More fundamentally, if diagnostic information is required, then the Key Stage 2 tests need to be redesigned.

