
Monitoring progress and target setting in the secondary school: finding appropriate methods of data collection and analysis.

Matthew Baxter

EdD Assignment, University of Lincolnshire and Humberside

Abstract: The paper suggests a model for understanding performance data as part of the process of monitoring performance and target setting in two secondary schools. The model, which compares GCSE performance with baseline assessment data, is used to identify trends in performance according to social class, gender and ethnicity; to compare school performance with national standards; and to compare the performance of subject departments within the case study secondary schools. The paper raises questions regarding the 'fitness for purpose' of the analysis of data in schools, particularly in relation to the validity of a statistical model for predicting the progress of individual pupils. Concerns are also expressed about the use of such analysis by school leaders as a basis for performance management.

Introduction

A dominant trend within British education during recent years has been an increasing emphasis on the evaluation of school effectiveness solely in terms of pupil outcomes. This has led to the publication of league tables of school performance in public examinations and the requirement that schools publish examination-related performance targets.

Whilst there are many justifiable criticisms of this trend, some of which will be discussed later, the theme of this article is that the careful analysis of performance data can be a useful enterprise in the continual monitoring of the performance of an individual school.

A composite model for monitoring the performance of a school, its subject departments and, most importantly, its pupils will be presented. The first stage of this model is to compare the results of an assessment of the prior learning of each pupil (in this case through NFER Cognitive Ability Tests - CATs), taken at the point of entry into the school, with the results of the previous year's GCSE examinations, in order to draw conclusions about the performance, and therefore progress, of pupils according to ability. The analysis can be extended to include the results of GCSE assessment over a number of years, so that any trends in progress over time can be seen.

It will be argued that such statistical analysis, when combined with the technique of multilevelling (a concept which will be described later) is sufficiently reliable to enable a school to make valid statements about the performance of its pupils in GCSE examinations. A school can also make valid statements about the progress of pupils compared to national and regional standards and so can make judgements about the "value added" (or otherwise) by the school to its pupils - albeit in terms of "value" specifically related to pupil outcomes.

The process of analysis can be informed by an additional statistical method of analysis of examination performance, referred to by Birnbaum (1994) as the "non-league table method" (which will be described later). Such a model enables a school to assess the relative performance of subject departments in GCSE examinations in relation to the average GCSE performance of pupils within the school. Such an analysis enables a school to monitor the performance of each subject department, and even each teacher, using a model which enables realistic comparisons to be drawn.

Methodological considerations will be addressed throughout the article; more fundamental questions about the implications for teaching and learning of the trend towards a statistical and mechanistic model of educational evaluation are considered at the end of the article.

Considerations of the performance of pupils in relation to their gender, social background and ethnicity have been included, although the research presented in this article suggests that such considerations must remain specific to the particular school environment and its pupil intake.

The scope of research in this article has been confined to analysis based on GCSE and CAT scores. It is possible to use Key Stage 2 or Key Stage 3 SATs results in a similar way to the CATs, using the test results as the baseline from which to measure progress. However, in the case of the schools referred to in this article there is distinct unease amongst staff concerning the reliability of SATs results. In one school evidence of inconsistency has twice led staff to return the tests to the examination board for remarking; each time there were substantial changes to the original marks. In the second school staff were also concerned that the results were not reliable, and highlighted the 'bunching' of Key Stage 3 marks around level 5 and the poor correlation between test results and teacher assessments.

The examination analysis presented does not extend to a consideration of A' level performance, although the principles outlined could be applied in the same way, with GCSE performance providing the baseline for A' level monitoring and target setting.

The data from two schools has been used in this article. School A is a small urban 11-18 comprehensive of just over 500 pupils. The overall level of achievement is below national standards. The school serves a catchment of two (predominantly ex-council housing) estates; the majority of parents are of the Registrar-General's social classes IV and V (semi-skilled and unskilled occupations) and many are of low educational background. 10% of the school's intake is of Italian (Sicilian) descent, many Sicilians having migrated to this part of the UK in the 1940s and 1950s. All of the Sicilian students included in the examination analysis are bilingual, with the Sicilian dialect of Italian being the dominant home language. School B is on the outer edge of a ribbon development and serves a semi-rural catchment. The school has about 1,000 pupils who represent a fairly even spread of social classes. The overall level of achievement in School B is in line with national standards.

Using Data in Educational Research and School Improvement

It is a principle of educational research, and statistical research is no exception, that data must be accurate, reliable and valid and must demonstrate its 'fitness for purpose'. Such a principle is of particular importance where the results of that research may have a substantial impact on the practice of education. However, the requirements for accuracy, reliability, validity and fitness for purpose present problems which suggest that those with responsibility for implementing value added procedures in schools should, at the very least, advance with caution.

This article takes as its starting point the assumption that 'value added' data is now an established feature of the school improvement agenda. Secondary schools in Britain are now data-rich institutions with a considerable variety of statistical data available to school managers. Such data includes Key Stage 2 assessment data, the results of baseline assessments, Key Stage 3 assessment data and results for GCSE and A' level examinations. In addition, most schools have the results of interim and school-based assessments and other methods of measuring the performance of pupils agreed within those institutions. Statistical analyses are also supplied to schools from a variety of sources. Each secondary school receives a Performance and Assessment report (PANDA) annually, which provides a residual analysis for each examination subject, comparisons of the school with other schools of similar size and social intake, and value added information in the form of a progress score. In its Autumn Packages the Pupil Performance Unit of the Department for Education and Employment (DfEE) supplies schools with over 100 pages of explanation and example to enable schools to carry out statistical analysis and make judgements about the progress of pupils.

This article addresses a number of issues related to the need for accuracy, validity and reliability. Concerns are raised, for example, about the accuracy of data about pupils eligible for a free school meal as a measure of social class and evidence is presented which raises concerns about the quality of external assessments in the case of SATs marking. This article also attempts to establish a methodology for analysing data which is consistent and which does not unfairly measure the progress of any particular group.

However, a much wider concern, which requires further research, relates to the requirement of value added data to demonstrate 'fitness for purpose'. It was noted whilst carrying out the literature review for this article that the overwhelming bulk of the literature was uncritical in its adherence to the statistical model for measuring value added suggested by adherents to the school effectiveness movement. Thomas and Mortimore (1996), for example, argue that school-based value added analysis is a good guide to the effectiveness of a school and Gray et al (1990) develop this idea to suggest the ways in which value added analysis can bring about enhanced effectiveness. They refer specifically to:

  • Enabling school progress to be understood within the context of social background;
  • Enabling comparisons to be made between a given school and other similar schools;
  • Exploring the effectiveness of the school with different groups of pupils;
  • Identifying factors under the school's control which contribute to the school's effectiveness.

Whilst this article presents a model for statistical analysis within schools which largely follows the recommendations made by the school effectiveness movement, it is also acknowledged that the statistical methodology recommended by that movement requires further debate. The effectiveness research does not, for example, refer to the ironies implied in the development of statistical models for measuring school effectiveness. The development of statistical techniques which demonstrate the potential of pupils of particular backgrounds may lead to a reduction in practical suggestions for school improvement, particularly where pupils are fulfilling their statistically generated expectations. One further possibility for consideration is that the process of target setting which is linked to school-based value added analysis, if effective, could in time reduce the predictive power of baseline assessments, as schools may focus on pupils, or groups of pupils, least likely to achieve particular future grades.

In addition, there is very little empirical evidence in the literature which demonstrates the link between the use of value added measures and improved school effectiveness and this is another area requiring further research. Saunders and Rudd (1999) point out that whilst researchers in the field of value added have rightly been concerned with the methodological accuracy of the statistical model for measuring value added, there has been no research producing supporting empirical data which demonstrates how schools use the value added data available to them. Saunders and Rudd go on to pose questions as to whether the Headteacher and the staff of schools actually understand the data they are presented with. They also suggest that such data might be used in unjustified, and even unethical ways, supporting their view with anecdotal evidence of the use of the 'military metaphor' by school managers when describing the use of value added data as part of a programme of performance management. The data can be used as 'ammunition' or a 'weapon' to 'put a bomb under so-and-so'.

Evidence is presented by Wikeley (1998) which develops the concern that the existence of a complex methodology for presenting value added statistics has had little impact on practice in the classroom. It is suggested that staff in 'data rich' schools have little confidence in 'effectiveness findings' and feel that the initiative is one imposed on them by senior staff. Wikeley goes on to suggest that staff frequently mistrust the statistical basis of value added data and argue that its use is divisive. Paradoxically, it is also suggested that staff do not use value added data because it tells them nothing new.

This article seeks to develop an accurate, valid and reliable model for understanding the progress of pupils within a school. In doing so, specific concerns about the use of such a model as part of a programme of performance management are raised. Wider concerns about the impact of statistical models for measuring school effectiveness have been hinted at in this discussion of methodology, but are not developed; it is suggested that such concerns are areas requiring more detailed research.

Monitoring Progress: Research Findings

Research into school effectiveness, although not offering a consensus as to what constitutes an effective school, does reflect a common agreement that a vital element of the evaluation of the work of a particular school is the explicit focus on pupil outcomes and so the 'value added' offered by a school. This focus on pupil outcomes must be a qualified one, for to focus simply on the exam performance of one school and to compare that performance with the performance of another school does not provide a reliable comparison of the two schools. The requirement is that performance analysis investigates the achievement of pupils against expectations based on the characteristics of pupils when they arrive at the school. Such an approach allows for an analysis of the 'value added' to the education of its pupils by the school. Mortimore (1991) defines this as a situation where a school's pupils make more progress than might be expected from a consideration of its intake. He describes 'value added' in the case of the effective school:

An effective school thus adds value to its students' outcomes in comparison with other schools serving similar intakes. By contrast, in an ineffective school students make less progress than expected given their characteristics at intake.

Research from Gray (1996), Hart (1996) and Thomas and Mortimore (1996) concludes that the most reliable method for examining the progress of pupils within a school is to contrast the performance of pupils at GCSE with a measure of prior attainment taken at the point of entry to the school. The recommended method of assessing prior attainment is through some form of cognitive abilities test. In the case of Hart (1996) the NFER's AH2 test is recommended. This test employs three standardised sub-tests of numerical, verbal and perceptual ability, which provide an overall cognitive ability score within a range from 60 to 150. More recently NFER Cognitive Ability Tests (CATs) have become widespread. The sub-tests cover the same areas of reasoning and the standardised results cover a range from 70 to 130, with 100 set as the average score for each age group. (In the case of the schools referred to in this article the assessment of prior attainment has been through the CATs.) After GCSE results have been received a school can compare the GCSE results of each pupil (by turning the grades achieved into numerical scores) with the results of the CATs. This is usually carried out by plotting the results of all pupils on a scatter graph. A 'line of best fit', or ideally a regression line, is drawn; where there is a clear correlation between CAT scores and GCSE results this indicates that the assessment of prior attainment is a reliable indicator of future examination performance. The Society of Education Officers (Brunt et al, 1996) suggests that a coefficient of correlation (r) of 0.77 is a very high correlation, with Gray (1996) suggesting that a correlation of r = 0.7 is a good enough measure for predicting future GCSE results. Such an analysis also enables the school to measure the progress made by its pupils against progress made by pupils both locally and nationally, so that the 'value added' can be quantified.
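To illustrate the calculations involved, the following short sketch (in Python) converts GCSE grades into points, computes the coefficient of correlation (r) and fits the regression line. The pupil data are invented for the purpose of illustration; only the grade-to-point scale (G = 1 up to A* = 8, described later in this article) is taken from the text.

    # A minimal sketch of the CAT-versus-GCSE comparison described above.
    # Pupil data are invented; the grade-to-point scale is the article's.

    GRADE_POINTS = {"G": 1, "F": 2, "E": 3, "D": 4,
                    "C": 5, "B": 6, "A": 7, "A*": 8}

    def average_gcse_score(grades):
        """Convert a pupil's GCSE grades to points and return the mean."""
        return sum(GRADE_POINTS[g] for g in grades) / len(grades)

    def pearson_r(xs, ys):
        """Coefficient of correlation (r) between two lists of scores."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        return sxy / (sxx * syy) ** 0.5

    def regression_line(xs, ys):
        """Least-squares line y = a + b*x (the 'line of best fit')."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
        return my - b * mx, b

    # CAT scores at entry and GCSE grades five years later (invented).
    cats = [85, 92, 100, 104, 111, 118]
    grades = [["E", "F", "E", "D"], ["D", "D", "E", "C"],
              ["C", "D", "C", "C"], ["C", "C", "B", "C"],
              ["B", "B", "C", "A"], ["A", "B", "A", "A*"]]
    avgs = [average_gcse_score(g) for g in grades]

    r = pearson_r(cats, avgs)
    a, b = regression_line(cats, avgs)
    print(f"r = {r:.2f}, r2 = {r * r:.2f}")
    print(f"regression line: average GCSE = {a:.2f} + {b:.3f} * CAT")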

The use of cognitive abilities testing as a baseline to indicate the progress of pupils within a school is a relatively recent innovation. However, research from as early as the 1960s has explored means of examining the progress made by pupils within schools through an approach known as multilevel modelling. This research, much of which was pioneered by Goldstein (for example, Goldstein 1987, Goldstein et al, 1993, Goldstein and Sammons, 1995), has examined a wide variety of pupil background factors in order to identify possible relationships between the characteristics of a school's pupil intake and subsequent examination performance. Initially such multilevelling, as carried out by the American sociologists Coleman and Jencks in the 1960s, concluded that pupils' achievement was so closely correlated with their family background as to suggest that schools make very little difference to the achievements of their pupils. But subsequent research (for example Rutter et al, 1979) has demonstrated that, whilst the background of a school's intake is important, schools do, in fact, have a significant effect on the progress that pupils make. The purpose of multilevelling is to demonstrate where schools make progress with groups of pupils based on background factors. A school could, for example, analyse the results of pupils according to social class in order to estimate the likely future results of pupils of the same class. If, in subsequent years, pupils of a similar background score better grades in public examinations it can be concluded that the school is adding value to the education of its pupils.

Multilevelling is often referred to as 'dis-aggregation', as it involves separating the results and progress scores of different groups of pupils to enable comparisons to be made. Multilevelling (or dis-aggregation) can include a wide variety of background factors; Sammons et al (1994) suggest the following list: pupils' personal characteristics (age, sex); family structure (family size, lone parent status); socio-economic grouping (parental unemployment, low income, car ownership, social class, housing); educational background (parents' educational qualifications, parents' school leaving age); ethnicity (ethnic group and level of fluency in English) and 'others' (including social mobility, population density, school roll, school type, number of pupils with special educational needs). Clearly multilevelling cannot be an exact science, as common sense alone tells us that social class or ethnicity need not be a predictor of a child's future educational performance. Nevertheless, Cohen et al (2000) report a proliferation of research which both explores the value and recommends the use of multilevelling in schools. Gray (1996), for example, in his commentary on value added approaches, suggests that a model combining prior attainment measures with school context factors will provide the most secure approach for monitoring progress within a school.

Monitoring Progress: A Possible Model

The conclusion which can be drawn from this brief overview of research is that the use of prior attainment data as a means of monitoring the progress of pupils from point of entry to the time of GCSE examinations is likely to provide the strongest and most reliable correlation. However, even if a school has a correlation of r = 0.7, a significant number of pupils will have achieved results which do not conform to that correlation. The strength of the relationship can be examined by calculating the square of the coefficient of correlation, r2, which indicates how much of the variation in the two variables is shared. If a school has a correlation of r = 0.7, then r2 = 0.49. Represented as a percentage, r2 = 0.49 becomes 49%: the proportion of the variance in GCSE results which is accounted for by variation in CAT scores. Multilevel analysis can be used to provide an explanation of the results which have a less secure relationship with the results of prior attainment tests.

The analysis of correlation, when combined with multilevelling, may provide useful data from which a school can draw conclusions about the performance of groups of pupils. However, it should also be borne in mind that such an analysis does not provide secure enough information to enable a school to set academic targets for individual pupils based on the past performance of previous pupils. Whilst a general approach to individual target setting is suggested in this article, it is recommended that any school leader attempting to establish such a system should proceed with caution.

The process of analysis is quite straightforward:

  • administer CATs to each pupil at the point of entry to the school;
  • once GCSE results are received, convert each pupil's grades into points and calculate an average GCSE score;
  • plot average GCSE scores against CAT scores and draw a regression line;
  • calculate the coefficient of correlation (r) and its square (r2);
  • use multilevelling to examine the progress of particular groups of pupils.

The analysis provided below demonstrates that target setting of this nature is a complex statistical task, combining regression analysis with multilevelling techniques. Additionally, such an enterprise, with its potential to be used for accountability purposes and the performance management of teachers, requires careful consideration and discussion within the school's decision making process.

Monitoring Progress: Examples from School A and School B

Table 1 below presents the CAT scores plotted against average GCSE scores for the pupils from School A. A 'line of best fit' or regression line has been included to show the average progress, or 'value added', made by pupils at the school. An average GCSE score is used rather than the total GCSE score widely recommended in research (see Hart, 1996; Brunt et al, 1996; Ofsted, 1996). A total GCSE score is not likely to provide an accurate comparison between pupils, as they do not always sit the same number of GCSEs. In School A the number of GCSEs taken varied from four to nine, with almost half of the pupils taking eight. In School B the range varied from one to ten GCSEs, with about two-thirds of pupils taking nine. Where this method of examining school performance is used to make comparisons between schools it therefore makes sense to use an average GCSE score rather than a total GCSE score.

Table 1

An immediate analysis of the scatter graph suggests that there is a high correlation between CAT scores and GCSE performance, as the results show some bunching along the regression line. It can be seen that the majority of pupils have made a similar amount of progress. The scores of seven pupils fall furthest beneath the regression line; these pupils have made the least progress, or may have underachieved. It is worth noting that these pupils are spread across the whole ability range. Several pupils have scored above the regression line; these seem to be clustered around the middle of the ability range. Multilevelling will enable the school to make more specific statements about the progress of pupils.

Table 2 presents the CAT scores, GCSE scores and the regression analysis for School B. In this school a similar level of correlation is in evidence. There is a fairly even spread of pupils whose results fall below the regression line, suggesting a similar level of under-performance across the whole ability range. The pupils who have made more than average progress fall into two groups: those with a CAT score of over 100 and those with a CAT score of between 80 and 90. More significantly, there are fewer pupils with a CAT score of between 90 and 100 making a high level of progress. This is likely to be important for the school, as such pupils are likely to be those scoring close to five GCSEs at grade C - the benchmark required for a pupil's results to figure in the school's league table position. Again, multilevelling will enable a more focused analysis of the performance of different groups in the school.

Table 2

The strong correlation between CAT scores and GCSE scores in both schools is confirmed by the calculation of r2. The coefficient of correlation (r) for School A is 0.71, close to the 'very high' correlation of 0.77 suggested by the Society of Education Officers. Such a correlation gives a figure of 0.50 for r2, which indicates that about half of the variance in School A's GCSE results is accounted for by CAT scores. Such a correlation requires further investigation, and it is at this point that multilevelling proves to be a useful tool. It may be the case that pupils falling furthest from the regression line fall into groups which multilevelling might identify as not fitting the norm. The analysis could indicate a particular group, for example lower ability boys, where progress has been consistently poor, and could lead to further analysis and action within the school to attempt to prevent further underachievement by such groups. Multilevelling may also demonstrate the under-performance of groups which may not be represented in a subsequent year group; an example of such a group might be school refusers. It is then useful to delete the progress scores of those pupils from the school analysis, leaving a clearer regression line for constructing targets for future year groups.

Multilevelling techniques can be employed in two distinct ways; one method is to produce the school performance data and then to isolate pupils within the scattergraph according to various background factors, for example economic background or gender, as will be outlined below. A second method of using multilevelling techniques is to isolate those pupils whose performances fall furthest from the regression line and then to consider the results of those pupils according to the various background factors suggested earlier. It is through such a technique that differences according to ethnicity in School A become most apparent.
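A minimal sketch of the first method is given below, again in Python with invented records; pupils are grouped by a background factor (here, free school meals) and a separate regression line is fitted for each group so that the progress of the two groups can be compared.

    # Dis-aggregation: fit one regression line per background group.
    # All records below are invented for illustration.

    def regression_line(xs, ys):
        """Least-squares line y = a + b*x."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
        return my - b * mx, b

    # (CAT score, average GCSE score, receives free school meals?)
    pupils = [(85, 2.9, True), (90, 3.4, False), (95, 4.1, True),
              (100, 4.4, False), (105, 5.2, True), (112, 5.8, False),
              (118, 6.4, True), (124, 7.1, False)]

    for fsm in (True, False):
        group = [(cat, gcse) for cat, gcse, f in pupils if f == fsm]
        xs, ys = zip(*group)
        a, b = regression_line(xs, ys)
        label = "FSM" if fsm else "non-FSM"
        print(f"{label}: average GCSE = {a:.2f} + {b:.3f} * CAT")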

Much research attention is paid to the issue of social background, with poverty frequently equated with poor performance (for example, Paterson and Goldstein, 1991; Goldstein et al, 1993). The simplest measure of poverty, readily available to the school, is information on pupils taking free school meals (FSM). In the case of School A (see Table 3) no pattern of under-performance amongst pupils from poorer backgrounds is apparent; in fact pupils receiving free school meals are spread fairly evenly across both the whole range of ability and the range of GCSE scores gained. However, it would be too simplistic to conclude from this that poor social background is not an influential factor in the performance of pupils. In the case of School A a high proportion of pupils receive free school meals (16%). This might suggest that the school is accustomed to teaching pupils from deprived backgrounds, so that discrepancies according to social background are likely to be less acute than in other schools. Additionally, when it is borne in mind that the bulk of pupils live in very similar financial circumstances, it might be the case that the school caters for a very homogeneous client group - at least in terms of wealth. In the case of School A, then, economic background information does not seem to assist the school in analysing pupil performance. In contrast, information on the pupils taking free school meals in School B is more pointed. School B caters for a much more socially diverse range of pupils than School A and it can be seen (Table 4) that pupils from poorer backgrounds are represented almost entirely in the lower ability range. The progress made by pupils taking free school meals in School B lies very close to the school's norm, suggesting that they make similar levels of progress to other pupils. However, there may be a need for the school to reflect on some of the social aspects of its curriculum arrangements, as the lowest ability teaching group is likely to contain a high proportion of pupils from economically poor home environments.

Table 3

Table 4

Senior managers in both schools expressed some concerns about the validity of using information on pupils taking free school meals. Benchmarking data for measuring the progress made by schools in England and Wales is based on the proportion of a school's pupils 'known to be eligible for free school meals' (DfEE, 1998). However, the figure collected by the DfEE is based on those pupils who actually receive a free school meal. Staff in both schools suggested that there were several pupils who were thought to be eligible for free school meals, but who did not take up that entitlement. It was suggested that this may be an issue of pride for some parents but, more realistically, it seemed to reflect the social pressures on Year 11 pupils. In both schools Year 11 pupils were not required to stay on the school site during the lunchtime and, as a consequence, very few Year 11 pupils remained in school to take their lunch. Most preferred to visit the neighbourhood shops with their classmates.

Gender is also a common factor on which multilevelling techniques focus. In the case of School A there has been considerable concern from parents and governors during recent years at the apparent poor performance of girls as, in spite of national trends, boys at School A make up the majority of the highest performing pupils. In this case (see Table 5) the results of boys and girls have been dis-aggregated to demonstrate the progress made by each group of pupils. The table provides useful information for the school, as it demonstrates that the bulk of the girls make more progress than the boys. However, there is a skew in the ability of the girls, with more girls than boys in the lower ability range. This may contribute to the concern from parents and governors if conclusions are drawn from GCSE results alone rather than from progress. It is interesting to note that the more able boys have made more progress than the girls with similar CAT scores. It is not clear why this is the case and, once again, this presents an issue which leaders in the school should investigate.

Table 5

The analysis of the performance of girls and boys in School B (Table 6) presents a picture much more consistent with national trends. With the exception of there being slightly more boys than girls in the very lowest ability group, the spread of girls and boys across the ability range is fairly even. The consideration of GCSE scores and the addition of regression lines for both boys and girls shows that, across the entire ability range, girls are making more progress than the boys. This additional level of progress is in the order of half a grade for each GCSE taken. The task for staff is to investigate both cultural and pedagogical factors which may be contributing to such a discrepancy.

Table 6

A second method with which to analyse the scatter graph of GCSE results against CAT scores is to begin by isolating those pupils whose scores fall furthest from the regression line and then to examine whether there are any factors which are common to those pupils. If the pupils whose results fall furthest from the norm make up specific groups, such evidence might prompt further investigation and action within the school.
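The sketch below illustrates this second method with invented data: each pupil's residual (the distance between the actual average GCSE score and the score predicted by the regression line) is calculated, and the pupils furthest above and below the line are isolated for scrutiny.

    # Rank pupils by residual from the regression line and isolate
    # the extremes. Records are invented for illustration.

    def regression_line(xs, ys):
        """Least-squares line y = a + b*x."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
        return my - b * mx, b

    # (pupil, CAT score, average GCSE score)
    pupils = [("P1", 82, 3.6), ("P2", 88, 2.4), ("P3", 95, 4.0),
              ("P4", 99, 5.6), ("P5", 104, 4.2), ("P6", 110, 5.4),
              ("P7", 116, 7.2), ("P8", 121, 5.9)]

    a, b = regression_line([c for _, c, _ in pupils],
                           [g for _, _, g in pupils])

    # Residual = actual score minus the score predicted by the line.
    ranked = sorted((g - (a + b * c), name) for name, c, g in pupils)
    k = 2  # School A, below, examines the seven furthest either side
    print("furthest below the line:", [n for _, n in ranked[:k]])
    print("furthest above the line:", [n for _, n in ranked[-k:]])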

The results of fourteen pupils in School A were scrutinised; these were the fourteen whose results, when plotted on the scatter graph, fell furthest from the regression line. Seven pupils fell furthest below the line and seven furthest above.

Of the seven pupils who scored furthest above the regression line, four are Italian (three of them girls). There is some significance here: there are seven Italian girls included in the data, and all produced results falling above the regression line. The three remaining highlighted scores all represent pupils who were hard working, but who do not belong to any significant group. Of the seven pupils whose results fell furthest below the regression line, two are Italian boys; two were school refusers; one is a pupil who had been temporarily excluded from school for fifteen days during the final term of his Year 11 and who had been in considerable trouble with the police; one is a pupil with special educational needs who achieved a CAT score of 76 after receiving much help in the original test (a score of 70 would have been more accurate); and, finally, one is a hard working girl who, despite considerable effort and revision, scored disappointing grades in her GCSEs.

Second to the assessment of prior attainment, the most significant factor which appears to affect GCSE performance in School A is ethnicity. More specifically pupils of Italian background represent the only significant group which scored according to a different pattern from the rest of the school. (Fourteen of the 80 pupils included in the data are of non-British descent; one girl is Bangladeshi, two boys are Spanish and eleven pupils are Italian.) The Spanish and Bangladeshi pupils achieved GCSE results which lie close to the regression line and although home language may be an influential factor (only one of the three speaks English at home) the sample is too small to draw any significant conclusions. The eleven Italian pupils make up fourteen per cent of the examination cohort and so provide a large enough sample to draw some tentative conclusions. It is interesting to note that nine of the eleven scored GCSE grades which put them above the regression line. In contrast, however, the very poor performance of two Italian boys is difficult to explain. Several different factors may be at work here. As all Italian pupils speak Sicilian as their first and home language it is possible that the cognitive abilities test does not provide a reliable measure of prior attainment in the case of pupils using English as a second language. It is probable that CATs are not norm referenced for pupils whose process of language acquisition differs from the bulk of the population. Socio-cultural factors may also affect progress; in the case of Italian pupils at School A senior staff reported that home life is traditionally Sicilian and that the Sicilian community is very close-knit. Italian pupils do not seem to experience the social pressures experienced by many of their non-Italian school-mates. However, no clear explanation can account for the two boys whose results indicate less progress; certainly the social pressures experienced by the Italian boys are very different from those experienced by the girls, but if there were any uniform effect on academic progress we could expect to see some pattern in the progress of all the boys and this is not the case.

Table 7

A final piece of analysis which the school can carry out whilst comparing CAT scores to GCSE scores is a residual gain analysis. FitzGibbon (1997) suggests such an analysis as a method for calculating a school's 'value added' score compared to national standards. FitzGibbon describes an analysis which can be used to demonstrate the difference between a statistically predicted performance and the actual performance. In the two following tables the regression line for each school is plotted against the national regression line, to demonstrate the extent to which the school is achieving added value with its pupils. It can be seen that pupils in School A have scored fractionally better than national standards, whilst in School B pupils have scored a little over half a grade above national figures. It is also of note that this added value in School B is consistent across the whole ability range.

Table 8

Table 9
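A sketch of the residual gain calculation underlying these tables is given below. The national regression line coefficients are invented placeholders; in practice they would be derived from national value added data such as the DfEE's Autumn Package.

    # Residual gain analysis: actual performance minus the performance
    # predicted by the national regression line. Figures are invented.

    NATIONAL_A, NATIONAL_B = -4.0, 0.09   # hypothetical national line

    # (CAT score, average GCSE score) for the school's pupils
    pupils = [(85, 3.9), (95, 4.8), (105, 5.7), (115, 6.8)]

    gains = [gcse - (NATIONAL_A + NATIONAL_B * cat)
             for cat, gcse in pupils]
    print(f"mean residual gain: {sum(gains) / len(gains):+.2f} grades")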

Monitoring progress by subject department

An additional model, suggested by Birnbaum (1994), presents a method for measuring the value added within a school according to individual subject departments. Such a method, which Birnbaum calls 'the non-league table method', provides an estimate of the relative value added by subject department and so proves useful in comparing the progress which pupils make in different subjects. The estimate is calculated by comparing, for all pupils, the GCSE performance in each subject taken with their average GCSE performance. This comparison is made in statistical terms: for example, if a pupil has scored a GCSE average of a D (or 4 points on a scale where a G grade scores 1 point, an F 2 points, and so on up to an A*, which scores 8 points), the performance in each GCSE can be compared to that D. If a C is scored in mathematics, the mathematics value added is +1 (one grade better than the average); if an E grade is scored in English, the English value added will be -1.

The table below presents the GCSE scores for three fictitious pupils taking four subjects each.

Table 10

            English   Maths   Science   R.S.    Average

Brown       4         4       5         2       3.75
Jones       6         7       7         6       6.5
Smith       5         4       4         5       4.5

The average score for each student can then be subtracted from the score achieved in each subject to give a 'value added' score for the performance of each pupil in the subjects taken.

Table 11

            English   Maths   Science   R.S.    Average

Brown       0.25      0.25    1.25      -1.75   3.75
Jones       -0.5      0.5     0.5       -0.5    6.5
Smith       0.5       -0.5    -0.5      0.5     4.5
It is then possible to calculate the combined value added for all pupils in each subject, giving an average value added score for each subject which can be compared with the school's average. The results can then be presented graphically. Table 12 provides a comparison of the progress made by pupils in each of the subjects in School B. Where a subject's bar rises above the line, the average performance in that subject was better than the school's average progress figure. So, for example, the score of 1.25 in drama (Dr) indicates that on average pupils score over a grade higher than their average GCSE performance; in other words, pupils have made a grade more progress in drama. Similarly, in geography (Gg) pupils have scored over half a grade below their average GCSE.

Table 12

The production of such a table is useful for, just as multilevelling techniques are used to investigate the differential performance of pupils, so the table encourages the investigation of differential results by subject area. Clearly the table provides evidence that pupils at School B have fared better in some subjects than in others, and it is the task of the school's staff to explore the reasons for those differences. The table, then, can be used as a means of comparing GCSE results by department, and even by teacher, and so can form the basis both of monitoring outcomes within departments and of departmental accountability. But such procedures must be approached carefully; the relatively poor outcomes within some subject departments cannot simply be explained away as the result of poor teaching. It is the task of the school to examine closely the reasons for differential performance; it may, for example, be the case that some departments are affected by limited curriculum time, poor resources, a new syllabus, changed teachers, a mis-match between teaching styles and the learning preferences of pupils, being timetabled against a popular option, and so on.
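The calculation behind Tables 10 to 12 can be expressed in a short sketch, using the three fictitious pupils above; only the final per-subject averaging step goes beyond the figures already shown in the tables.

    # Birnbaum's 'non-league table method' for the pupils in Table 10.

    results = {
        "Brown": {"English": 4, "Maths": 4, "Science": 5, "R.S.": 2},
        "Jones": {"English": 6, "Maths": 7, "Science": 7, "R.S.": 6},
        "Smith": {"English": 5, "Maths": 4, "Science": 4, "R.S.": 5},
    }

    # Step 1: each pupil's average GCSE score (Table 10, final column).
    averages = {p: sum(s.values()) / len(s) for p, s in results.items()}

    # Step 2: deviation of each subject score from the pupil's own
    # average (Table 11).
    deviations = {p: {subj: score - averages[p]
                      for subj, score in s.items()}
                  for p, s in results.items()}

    # Step 3: mean deviation per subject = the subject's value added
    # score relative to the school's average (Table 12).
    for subj in ["English", "Maths", "Science", "R.S."]:
        vals = [deviations[p][subj] for p in results]
        print(f"{subj}: {sum(vals) / len(vals):+.2f}")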

Using data analysis for target setting - some issues for consideration

In May 1995 the Secretary of State for Education asked the Chief Inspector of Schools to "look at target-setting in primary and secondary schools to identify good practice and determine what further action is needed to help schools develop effective target-setting strategies" (Ofsted, 1996, p.2). This led to the Ofsted/DfEE publication, in 1996, of Setting Targets to Raise Standards: A Survey of Good Practice. The report, which describes target setting as "taking action by setting specific goals and targets designed to raise educational standards", goes on to argue "the effective use of targets, especially quantitative targets, may help schools to articulate clearly what is expected of, for example, each pupil, class or group or indeed of the school as a whole". (Ofsted, 1996, p.5.) The report clearly recommends a model for target setting which involves the analysis of past examination performance within the school in order to predict the potential performance of pupils and so to focus effort and resources on pupils who are underachieving or being insufficiently challenged.

The confidence with which the target setting approach based on prior attainment data is recommended by Ofsted is echoed in the research cited in this article, where prior attainment data is seen as the most valuable predictor of future GCSE potential. Any dispute within the research concerns the use of pupil background indicators. Gray (1996), for example, suggests that a multilevel model provides the most secure approach for predicting future GCSE results - although a measure of prior attainment must provide the cornerstone of such a model. Thomas and Mortimore (1996) argue that where prior attainment data are available no school context factors are significant in predicting pupil outcomes.

The evidence presented in this article suggests that a school should set outcome targets for whole year groups using a composite model which combines a statement of prior attainment with an allowance for the background factors of some pupils. Within such a model all pupils would take CATs, and from this a GCSE prediction for each pupil can be made on the basis of the regression line produced from an analysis of the GCSE scores of past years compared with CAT scores. Allowances would need to be made if the school has larger numbers of pupils representing any group which, in previous years, has performed according to a different pattern; Italian pupils in the case of School A would be an example. The school would also need to bear in mind the national regression line, as an under-performing school should aim to achieve at least national levels of value added.
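A sketch of how such year-group predictions might be generated is given below; the regression coefficients are invented placeholders standing for the line fitted to a previous cohort's CAT and GCSE scores.

    # Year-group target setting from a prior-year regression line.
    # Coefficients and CAT scores are invented for illustration.

    PRIOR_A, PRIOR_B = -3.5, 0.085    # hypothetical prior-year line

    def predicted_average_gcse(cat_score):
        """Predict an average GCSE score from the prior-year line."""
        return PRIOR_A + PRIOR_B * cat_score

    year_group = [84, 91, 98, 103, 109, 117]   # incoming CAT scores
    predictions = [predicted_average_gcse(c) for c in year_group]

    # The whole-year target is the mean of the individual predictions;
    # allowances for groups with a different pattern of progress
    # (such as School A's Italian pupils) would be applied here.
    target = sum(predictions) / len(predictions)
    print(f"year-group target (average GCSE points): {target:.2f}")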

The evidence also suggests that a school should proceed with caution if it attempts to set targets for individual pupils based on a previous regression analysis. Whilst the analysis of progress can provide a broad indication of the general performance of pupils in a school, it cannot be translated into specific statements about the progress which a pupil should make in each GCSE taken. At best, such a method of analysis can provide a rough guideline as to the average performance a school could expect of an individual pupil if that pupil were to make progress in line with the average of pupils in previous years. To make even more specific statements about future performance in individual subjects is likely to be unreliable.

The statistical basis of correlation analysis demonstrates why it is not possible for a school to make confident predictions about the progress of individual pupils based purely on a correlation analysis. If a school were to achieve the very high correlation (r) of 0.77, as recommended by the Society of Education Officers (Brunt et al, 1996), in its comparison of CAT scores and GCSE results, this would only provide a figure for r2 of 0.59. Such a figure indicates that CAT scores account for only 59% of the variance in average GCSE scores. In other words, a correlation as high as 0.77 still leaves some 41% of the variation in pupils' performance requiring some alternative method of explanation.

Staff in both schools have asked whether it is possible to make GCSE predictions based on CATs, as the CATs are little more than norm referenced IQ tests and so should not, in theory at least, provide a foundation from which to predict criterion referenced GCSE results. In short, teachers do not feel the practice compares like with like.

As the comparison of CATs with subsequent GCSE results points to a fairly strong correlation between the two assessments, it is necessary to provide empirical evidence which explains why the correlation is so close. Table 13 provides such evidence; the graph demonstrates the distribution of CAT scores, which follows the traditional bell-shaped distribution curve of norm referenced tests. The graph also demonstrates that, despite claims of criterion referencing, the distribution of GCSE results follows a very similar pattern to the distribution of the norm referenced CAT scores. When it is borne in mind that the GCSE distribution curve covers only some 90% of the cohort of 16 year-olds (many of the least academically able youngsters are not entered for examinations), whilst CATs are norm referenced for the whole population, the correlation becomes more obvious. This suggests that GCSE results are distributed according to the bell-shaped curve of norm referencing and are not criterion referenced; as GCSEs were initially established as criterion referenced examinations offering a fairer mode of assessment than norm referencing, the evidence does little to boost teachers' confidence in the examinations system.

Table 13

Concern may also be raised regarding the use of a mechanistic and statistical model for target setting and accountability in schools. Such concern has been echoed in research into the technicist and quantitative method of Ofsted inspection. Chris Woodhead, the Chief Inspector of Schools, in his 1997 annual lecture, however, demonstrated very little concern for the classroom teacher who may feel that such a method of evaluation fails to recognise holistic and humanistic elements of education. The school leader seeking to introduce a statistical model of evaluation and target setting in a school needs to demonstrate a great deal more tact and diplomacy than is required of an Ofsted inspector. It must be recognised that, for many teachers, innovations in education are a cause for suspicion, for they can threaten fundamental beliefs about the nature and purpose of education. On a more practical level, a teacher might feel threatened by such a method of performance management and accountability, if the teacher does not have confidence that the manager responsible for the evaluation will seek a holistic explanation as to why GCSE results do not conform to targets which have been set. The programme outlined in this article certainly does provide a useful accountability tool which can indicate cases where a teacher is in need of some professional support, but before school leaders seek to isolate under-performing teachers, it is vital that such examination analysis includes consideration of all possible factors, curricular and otherwise, which might have affected GCSE results.

References

Birnbaum, L. (1994) "The non-league table method", Education 184 (1) pp 12-13.

Brunt, M., Davis, P., Ilett, K., Kelly, A., McShane, C. and Platts, G. (1996) Value Added and School Improvement, Society of Education Officers' Value Added working party.

Cohen, L., Manion, L. and Morrison, K. (2000) Research Methods in Education (5th Edition), London, RoutledgeFalmer.

Department for Education and Employment (DfEE) (1998) 1997 Benchmark Information for Key Stages 3 and 4, London, Qualifications and Curriculum Authority (QCA).

Goldstein, H. (1987) Multilevel Models in Educational and Social Research, London, Charles Griffin and Co.

Goldstein, H., Rasbash, J., Yang, M., Woodhouse, G., Pan, H., Nuttall, D. and Thomas, S. (1993) "A multilevel analysis of school examination results", Oxford Review of Education, 19 (4) pp 425-433.

Goldstein, H., and Sammons, P. (1995) The Influence of Secondary and Junior Schools on Sixteen Year Examination Performance: A cross-classified multilevel analysis, London: University of London Institute of Education.

Gray, J. (1996) "Comments on value-added approaches", Research Papers in Education, 11 (1) pp 3-4.

Hart, C. (1996) "Target setting and monitoring progress in the secondary school", British Journal of Curriculum and Assessment, 7 (1) pp 26-32.

Mortimore, P. (1991) "The nature and findings of school effectiveness research in the primary sector" in Riddell, S. and Brown, S. (Eds) School Effectiveness Research: Its Messages for School Improvement, London, HMSO.

Paterson, L. and Goldstein, H. (1991) "New statistical methods of analysing social structures: an introduction to multilevel models", British Educational Research Journal, 17 (4) pp 387-393.

Radner, H. (1995) Evaluation of Key Stage Three Assessment Arrangements, School Curriculum and Assessment Authority (SCAA), London, HMSO.

Rutter, M., Maughan, B., Mortimore, P. and Ouston, J. (1979) Fifteen Thousand Hours: Secondary schools and their effects on children, London, Paul Chapman Publishing Ltd.

Sammons, P., Thomas, S., Mortimore, P., Owen, C., Pennell, H. and Hillman, J. (1994) Assessing School Effectiveness: developing measures to put school performance in context, London, Ofsted.

Saunders, L. and Rudd, P. (1999) Schools' use of 'value added' data: a science in the service of an art? A paper presented at the British Educational Research Association Conference, University of Sussex, Brighton, 2-5 September 1999.

Thomas, S. and Mortimore, P. (1996) "Comparison of value-added models for secondary school effectiveness", Research Papers in Education, 11 (1): pp 5-33.

Wikeley, F. (1998) "Dissemination of research as a tool for school improvement?", School Leadership and Management, 18 (1) pp 59-73.

Woodhead, C. (1997) Do we have the schools we deserve? Annual lecture by Her Majesty's Chief Inspector of Schools, London, HMSO.
