Value-added is of little value
Stephen Gorard
Department of Educational Studies
University of York
YO10 5DD
sg25@york.ac.uk
Paper presented at the British Educational Research Association Annual Conference, University of Glamorgan, 14-17 September 2005
Published indicators of school ‘performance’, such as those shown annually in league tables in England, have been controversial since their inception. Raw-score figures for school outcomes are heavily dependent on the prior attainment and family background of the students. Policy-makers in Wales have reacted to this fundamental flaw by withdrawing the publication of school results. In England, on the other hand, they have reacted by asking for more information to be added to tables, in the form of student context such as the percentage with a special educational need, and ‘value-added’ figures. In 2004, the DfES value-added figures were based on student progress from Key Stage 2 to GCSE. In 2005, at time of writing, they plan to use context information in their model as well. This paper re-analyses the 2004 value-added figures and shows that they contain the same flaw as the original raw-score tables. The purported value-added scores turn out to be a proxy for the overall level of attainment in the school, and almost entirely independent of the progress made by the students. The paper concludes by considering the implications of these findings, if accepted, for policies based on identifying schools that are clearly more or less effective, and for the field of school effectiveness and improvement research.
Introduction to value-added comparisons
This paper reminds readers of some of the flaws in previous attempts to measure the performance of schools, and of how such measurement is currently being attempted by the Department for Education and Skills (DfES) for secondary schools in England. The paper then re-analyses the figures from 2004, and shows that league tables still suffer from what was identified some time ago as ‘the politician’s error’ (Gorard 1999). This is where differences between two sets of figures are considered in isolation from the scale of the original figures. The enormity of the problem, once accepted, for policy-making, the local reputation of schools, and for studies of school effectiveness would be difficult to overemphasise.
‘League’ tables of school examination outcomes have been controversial since their introduction in England and Wales in the early 1990s. In their simplest form, the tables list the schools in each local education authority (LEA) area, the number of students in a relevant age cohort, and the percentage of those students attaining a relevant qualification or its equivalent. In the printed press these tables have been presented in descending order of percentage attainment by schools. This information was intended to help parents make judgements about the quality of their local schools, especially in an era of increased school choice (actually school preference) after the 1988 Education Reform Act.
However, these lists were labelled school ‘performance’ tables incorrectly, because there is a very high correlation between the nature of the student intake to any school and their subsequent public test scores (Gorard and Smith 2004). This correlation can be expressed in several ways, including in terms of prior attainment or indicators of socioeconomic disadvantage. For example, the figures for all secondary schools in England in 2004 are shown in Figure 1. The correlation is +0.87 between each school’s intake in KS2 points and its subsequent outcomes in percentage obtaining five or more GCSEs at grades A*-C. This means that the majority of the difference between school outcomes (76%) is directly attributable to the prior attainment of their intake, and thence to the socioeconomic characteristics of the intake rather than to the work of each school itself (Gorard 2000). There are no schools with high KS2 intake scores that have low GCSE outcome scores, or vice versa. There are threshold effects at or near 100%, but otherwise there is, as expected, a very clear linear relationship between input and output. It is, therefore, not at all clear how effective each school had been in producing the outcomes other than in attracting high attaining students in the first place. In theory, it is possible for some low-scoring schools to be more effective at dealing with equivalent students than high-scoring ones, but for this not to be reflected in the raw-score tables.
Figure 1 – The link between KS2 points and GCSE benchmark, secondary schools in England, 2004
Note: figures provided by DfES. The figures exclude special schools, schools with no GCSE entries, no GCSE ‘passes’ or no Key Stage 2 results, independent schools opting out of the value-added scheme, those whose results were suppressed by the DfES on grounds that individuals might be identifiable, and those with fewer than 30 cases in the cohort.
Note: the perceived ‘width’ of the scatter depends upon the scales used. That is why the correlation coefficients quoted in this paper are a better guide to ‘effect’ sizes.
In any study of school effects, typically between 0 and 25% of the variation between school outcomes remains to be explained, as here, and this residual includes a very important error component (Gorard 2006). This major finding of work in the school effectiveness genre has been quite consistent over time. The larger the study, the more variables available for each student, the more reliable the measures are, and the better conducted the study, the stronger is this link between school intake and outcomes. This strong link makes the work of school improvers difficult, since the most obvious way for any school to produce higher outcome scores is to improve the intake scores, or exclude students with low intake scores before the age of 15+. This would be ‘sleight-of-hand’ school improvement and, of course, a zero-sum process by correspondingly reducing the intake scores to neighbouring schools. However, SESI advocates try to guard against being misled by not using raw-score outcomes, and using value-added models instead. These models are intended to take the prior attainment of each student into account, and so to produce scores that are ‘a measure of the progress students make between different stages of education’ (DfES 2005a).
In response to such problems, general information about school performance is no longer made publicly available in Wales (or Scotland), largely because of its potential to mislead. In England, the alternative response has been to try to maintain the freedom of this information while remedying its defects. One such remedy has been termed ‘value-added’ analysis. In this, the prior attainment of each student is taken into account, such that the published performance figures reflect not the intake to the school but the average progress made by students while in the school. The DfES value-added scores for the average student progress from Key Stage 2 (KS2, the prior attainment of the student at primary school) to Key Stage 4 (examinations at the end of compulsory schooling) in each secondary school are calculated as follows (full details available at DfES 2005a). For the 2004 figures, all students in a school were included who were aged 15 or more on 31st August 2003, still on the school roll in January 2004, and with at least one matched KS2 score. The KS2 levels achieved by each student for each core subject were converted to point scores, and then averaged across the three (or fewer) subjects. KS4 points were calculated as a sum of the scores for the best eight GCSE results (or their ‘equivalent’). Nationally, it is possible to calculate a median value for progress from any KS2 score to KS4, such that half of the students make less progress and half make more. The median value for 21 points at KS2 (‘equivalent’ to an average of level three) was 202 KS4 points (roughly ‘equivalent’ to five GCSEs at grade C), for example. The value-added score for each student is the difference between their actual KS4 score and the median calculated from their prior KS2 score. The value-added score for each school is the average of the value-added scores for all students meeting the definition above (with 1000 added to this average to eliminate negative values).
The results generally range from 900 to 1100, but are not uniformly distributed.
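The calculation just described can be illustrated with a minimal sketch. The pupil records below are invented for illustration and are not DfES data, and the function names are hypothetical. The toy data set stands in for the national cohort when the medians are computed, which is an assumption of the sketch and not a description of the DfES software.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical pupil records: (school, mean KS2 points, KS4 points from best eight results).
# All values are invented for illustration; they are not DfES figures.
pupils = [
    ("School A", 21, 180),
    ("School A", 27, 330),
    ("School A", 27, 300),
    ("School B", 21, 202),
    ("School B", 21, 230),
    ("School B", 27, 310),
]


def national_medians(records):
    """Median KS4 points for each KS2 starting score, across all pupils in the data."""
    by_ks2 = defaultdict(list)
    for _school, ks2, ks4 in records:
        by_ks2[ks2].append(ks4)
    return {ks2: median(scores) for ks2, scores in by_ks2.items()}


def school_value_added(records):
    """Average pupil value-added for each school, with 1000 added to the average."""
    medians = national_medians(records)
    by_school = defaultdict(list)
    for school, ks2, ks4 in records:
        # Pupil-level value-added: actual KS4 score minus the median for that KS2 score.
        by_school[school].append(ks4 - medians[ks2])
    return {school: 1000 + mean(diffs) for school, diffs in by_school.items()}


va_scores = school_value_added(pupils)
for school, score in sorted(va_scores.items()):
    print(f"{school}: {score:.1f}")
```

In this sketch a school scores above 1000 only when its pupils, on average, exceed the median KS4 score for pupils with the same KS2 starting point.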
A surprising correlation
This paper uses the GCSE (KS4) results for mainstream secondary schools in England in 2004, their KS3 results for 2002, the KS2 scores of their intake in 2000, and the published DfES value-added scores for the same schools in 2004 (DfES 2005b). Results are presented in scatterplot form, or as Pearson R correlation coefficients, which can be squared to give an ‘effect’ size.
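For readers unfamiliar with the statistic, the following is a minimal sketch of how a Pearson R coefficient and its square might be computed from paired school-level figures. The two lists of values are hypothetical and chosen only to show the mechanics; the function name `pearson_r` is likewise an assumption of this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical school-level figures (not DfES data): mean KS2 points of the intake
# and percentage of the cohort reaching the GCSE benchmark.
ks2_points = [24.0, 25.5, 26.0, 27.0, 28.5, 30.0]
gcse_benchmark = [30.0, 41.0, 48.0, 52.0, 70.0, 88.0]

r = pearson_r(ks2_points, gcse_benchmark)
effect_size = r ** 2  # share of variance in one measure associated with the other
print(f"R = {r:+.2f}, R squared = {effect_size:.2f}")
```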
The re-analysis presented starts with the 124 schools with complete information in York, Leeds, East Riding of Yorkshire, and North Yorkshire. These are used as illustrations of the wider pattern. For the 124 mainstream secondary schools in the four Yorkshire LEAs, Table 1 shows a number of unsurprising correlations between indicators of attainment. The link between school outcomes at KS3 and KS4 is +0.93, which means that over 86% of the variance in school outcomes at KS4 is potentially explicable by school outcomes at KS3. In general, we may assume that students who do well at KS3 tend also to do well at KS4, and vice versa. This kind of correlation is one basis for the complaints about raw-score league tables (see above). Table 1 also shows a correlation of +0.81 between the DfES value-added figures for student progress from KS2 to KS3 and from KS2 to KS4. This is also unsurprising because the two measures overlap, with the former subsumed by the latter. The correlation suggests that schools which add value to students’ progress up until KS3 add similar value to progress up until KS4. This could mean that the bulk of students’ progress occurs by KS3 or that schools tend to be equivalently effective for both phases of secondary education, or a combination of these explanations. In fact, however, neither explanation is necessary because of the implications of the other correlations in Table 1.
Table 1 – Correlations between attainment and value-added
            KS3      KS4      VA KS2-3   VA KS2-4
KS3         -        +0.93    +0.91      +0.87
KS4         +0.93    -        +0.81      +0.96
VA KS2-3    +0.91    +0.81    -          +0.81
VA KS2-4    +0.87    +0.96    +0.81      -
There are unexpectedly high correlations between each indicator of overall school attainment and the value-added figures from each phase. The highest (+0.96) is between the KS4 absolute score for each school and the purported ‘value-added’ score for student progress from KS2 to KS4. These figures for each school are cross-plotted in Figure 2, which shows quite clearly that schools with high outcome scores have high value-added figures and vice versa. In fact, we could predict the value-added figure for any school extremely well just from their absolute level of final attainment.
Figure 2 – The relationship between value-added and absolute attainment 2004
Note: for ‘statistical reasons’, DfES (2005a) report that the national average value-added score was 988.1 (rather than 1000); see Note 1.
How can we explain this surprising correlation? It could, of course, be the case that the progress made by students in all schools is truly in direct proportion to their school’s average outcomes. Ceteris paribus, one would expect schools that helped students to make considerable progress also to have higher outcomes, in general, than those that did not. Perhaps Figure 2 merely reflects this? There are several reasons why this cannot account for all, or even much, of the story. The pattern is too good (see Note 2). There are no low to mid attaining schools with high value-added scores. All of the schools with a GCSE benchmark of 40% or less are deemed negative value-added. Similarly, there are no high to mid attaining schools with low value-added scores. All of the schools with a GCSE benchmark of 80% or more are deemed positive value-added. The remaining schools, with ‘average’ GCSE benchmarks, are mostly very close to zero value-added. It would be possible to argue that the residual variation represents differential effectiveness, but the combined error term in assessing, measuring, aggregating and analysing the data is a more plausible explanation for these relatively minor differences. It is also important to note that a similarly high correlation appears between the KS4 value-added score and the prior KS3 results. Given that the assessment system in England is not wholly reliable, and that no data collection, transcription, aggregation or analysis process is wholly accurate, the correlation of +0.96 means that the DfES value-added figures and the raw scores for absolute attainment are actually measuring the same thing. It would, therefore, be much simpler to use the raw-score values than to bother with the computation of ‘value-added’ figures that tell exactly the same story. But these raw-score values have already been rejected by most commentators as being unrelated to school performance. This means, ironically, that the value-added scores have to be rejected on the same grounds.
For the same set of secondary schools as in Figure 1, for England in 2004, the DfES value-added school ‘performance’ figures have a correlation of +0.84 with the raw-score outcomes they are intended to replace, which themselves have a +0.87 correlation with the prior KS2 scores. This means that 71% of the variation in school value-added scores is explicable in terms of their raw scores alone. Figure 3 shows that, again ignoring the threshold effects, there is a clear pattern of low attaining schools having low VA, and high attaining schools having high VA. Value-added scores are no more independent of raw-score levels of attainment than outcomes are independent of intakes.
Figure 3 – The link between GCSE benchmark and KS2 to GCSE value-added, secondary schools in England, 2004
Note: the value-added scores are those published by the DfES, with a mean of around 988.
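The percentages of shared variation quoted in this paper follow from squaring the corresponding correlation coefficients. As a simple check of that arithmetic, the sketch below squares each coefficient reported in the text; the dictionary labels are shorthand introduced here, not DfES terminology.

```python
# Correlation coefficients quoted in the text, keyed by a shorthand description.
quoted_correlations = {
    "KS2 intake vs GCSE benchmark (England)": 0.87,
    "KS3 vs KS4 outcomes (Yorkshire LEAs)": 0.93,
    "KS4 outcomes vs KS2-4 value-added (Yorkshire LEAs)": 0.96,
    "GCSE benchmark vs KS2-4 value-added (England)": 0.84,
}

# Squaring R gives the proportion of variation in one measure that is
# statistically associated with the other, expressed here as a whole percentage.
shared_variance_pct = {
    label: round(100 * r ** 2) for label, r in quoted_correlations.items()
}

for label, pct in shared_variance_pct.items():
    print(f"{label}: R = {quoted_correlations[label]:+.2f}, shared variation = {pct}%")
```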
Some implications
School effectiveness and school improvement (SESI) forms an entire field of research and policy endeavour. It is based on the dual premise that schools are differentially effective with equivalent students, and that it is possible to transfer good practice from the more successful schools to the less successful ones. The emphasis is on the rather narrow view of schools as producers of examination and test scores. The Academies programme is one example stemming from this idea, and its questionable initial success has already been attributed to visionary school leadership, curriculum change, and a range of other SESI-type factors (Gorard 2005). In order to show that a good school is differentially effective we have to establish that exactly equivalent students would achieve lower test scores after education at another school. In order to show that a school has improved we have to establish that there has been an improvement in test scores that cannot be explained by a change in the nature of the school intake. Otherwise, SESI is ‘sleight-of-hand’.
If accepted, the analysis above shows that ‘value-added’ results for secondary schools are nothing of the sort because they are not independent of absolute level of attainment at KS2, KS3 or KS4. Value-added scores are no more independent of raw-score levels of attainment than outcomes are independent of intakes. The value-added figures are actually the transformed equivalent of raw scores and, therefore, suffer from precisely the same defects as raw scores, of being largely predictable from prior attainment and/or student background. The rather lengthy procedures described by the DfES (2005a) to produce scores that are ‘a measure of the progress students make between different stages of education’ have been shown to be pointless. This may explain the otherwise rather perplexing finding from simulated models that value-added figures are no more accurate than raw-score figures as a predictor of school performance (Hoyle and Robinson 2003). In 2005, at time of writing, the DfES plan to use context information in their model as well. This may mask, but will not solve, the problem described here.
In fact, the value-added calculations are rather worse than pointless because their apparent precision and technical sophistication may have misled analysts, observers and commentators into believing that they had succeeded, or that a greater range of variables or a more complex analytical model would somehow solve the outstanding problems. They make it harder for many commentators to understand fully what the figures mean, and what their limitations are. Some commentators will, therefore, become like the idealised ‘villain’ described at the start of Gorard (2003a), who is not appropriately sceptical of statistical analysis, and is overly impressed by technicalities at the expense of coherence and transparency. One recent suggestion has been that multilevel modelling is the way forward here, but techniques such as this are already overused, and inappropriate for population data such as the DfES school performance figures. They create such a distance between the data and the analyst that they appear to encourage simple errors such as mistaking what the numbers involved represent (Gorard 2003b, 2004).
If accepted, this reconsideration suggests that school improvement policies, at least in this narrow sense, are always likely to be ineffective. Policies concerning schools, and judgements about the relative effectiveness of different schools and types of schools, will have been misled where they have been based on the government’s value-added analyses (or indeed any value-added analysis that does not correct for the politician’s error). Families may have been misled about the relative effectiveness of their local schools, with the schools in poorer areas and with academically weaker intakes suffering the most from this misguided comparison dressed up as a ‘fair test’. School improvers and school improvement researchers, relying on value-added analyses, will have been misled in their explanations and in making recommendations for practice. The implications of this simple mistake are legion.
The majority of the variation in school examination outcomes can be explained by the intake to the school (prior attainment, socioeconomic background, and educational need). However, this undisputed finding has led to two very different conclusions. For one group of commentators, it seems obvious that the finding minimises the role of the school, and alerts us to seek improvements in equity otherwise. School improvement is, therefore, seen as ‘sleight-of-hand’ in generally being explicable by changes in school intakes. Any unexplained variation could be attributed to errors in measuring attainment and bias caused by missing data. For another group of commentators, the fact that at least some variation in school outcomes is not explained by school intakes is evidence that schools must have important differential effects on outcomes. This position lies behind the creation of value-added league tables. In light of the foregoing, such a position seems less tenable than ever before.
Notes
1. Intriguingly, the fact that the national average for value-added was 988.1 after adding 1000 suggests that the average effect on student progress of all of the secondary schools in England is negative! Of course, this cannot be so because the value is based on the cohort itself, and all ‘effects’ are relative to the median. However, the scale of this spurious negative average gives us an alarming indication of the level of slippage in the process as a whole.
2. The slight scatter at the 100% end of the x-axis in Figures 1-3 could be explained by the lack of freedom to vary at this ceiling. The schools at or near 100% on the GCSE benchmark figure have some variation on the y-axis that is unrelated to the x-axis because that is the only kind of variation possible (compare the concept of regression towards the mean).
References
DfES (2005a) Value-added Technical Information, http://www.dfes.gov.uk/performancetables/schools_04/sec3b.shtml (accessed 25/2/05)
DfES (2005b) http://www.dfes.gov.uk/performancetables (accessed 25/2/05)
Gorard, S. (1999) Keeping a sense of proportion: the "politician's error" in analysing school outcomes, British Journal of Educational Studies, 47, 3, 235-246
Gorard, S. (2000) Education and Social Justice, Cardiff: University of Wales Press
Gorard, S. (2003a) Quantitative methods in social science, London: Continuum
Gorard, S. (2003b) Understanding probabilities and reconsidering traditional research methods training, Sociological Research Online, 8, 1, 12 pages
Gorard, S. (2004) Comments on modelling segregation, Oxford Review of Education, 30, 3, 435-440
Gorard, S. (2005) Academies as the ‘future of schooling’: is this an evidence-based policy?, Journal of Education Policy, 20, 3, 369-377
Gorard, S. (2006) Is there a school mix effect?, Educational Review, 58, 1
Gorard, S. and Smith, E. (2004) What is ‘underachievement’ at school?, School Leadership and Management, 24, 2, 205-225
Hoyle, R. and Robinson, J. (2003) League tables and school effectiveness: a mathematical model, Proceedings of the Royal Society of London B, 270, 113-199
This document was added to the Education-Line database on 03 October 2005