Educational evaluation and policy-making: exploring the relationship in two Swedish examples
Paper presented at the European Conference on Educational Research, University of
Hamburg, 17-20 September 2003
(Education Policy and Education Research: Comparative Analyses of a Tense Relationship symposium)
This paper sheds light on the relationship between policy-making and implementation on the one hand, and a particular kind of educational research, namely educational evaluation, on the other. It is argued that the vast increase in evaluative activities that New Public Management has brought into public education in Europe has made researchers/evaluators even more a part of policy-making than before. Researchers are often engaged as evaluators at all levels of education systems. Since the premises for evaluation are commonly more restricted than those for other kinds of research, few evaluators/researchers challenge the direction of education policy already decided upon. On the contrary, in most evaluations and systems of evaluation, the policy direction is taken for granted. Two examples from Sweden illustrate processes in which researchers/evaluators become directly involved in policy-making and implementation. From these examples, it is concluded that political accountability becomes dispersed and that the borders between those who are to be considered politically accountable and those who are not are blurred. Another conclusion is that educational researchers may want to consider thoroughly whether or not they want to take on evaluation commissions or act as peer-reviewers, and also under what conditions such assignments are acceptable.
Purpose of the paper
The purpose of this paper is to explore some aspects of the relationship between educational evaluation and national policy-making. This is done by situating evaluation in a context of national governance, and thereafter discussing two examples of national evaluations from Sweden: one from the public school and one from higher education. These examples are presented later in the paper.
This paper concentrates on national policy-making, and the concept of policy-making is here understood as public action programmes. Such action programmes include not only formal decisions, but also chains of decisions or chains of actions that become apparent when action programmes are implemented in administration and practice (Premfors, 1989). This means that national policy-making incorporates the components of the policy-process as it is traditionally conceived, i.e. agenda-setting, problem analysis, analysis of policy-alternatives, decision-making, implementation, evaluation and feedback (Ham and Hill, 1993; Hogwood and Gunn, 1984). It also means that policy-making and governance are intertwined. I would like to stress that neither policy-processes nor evaluation processes are viewed as linear, chronological processes. Even though different phases can be discerned analytically in both types of processes, they are better understood as intertwined, sometimes occurring simultaneously, and in different orders.
Within the field of evaluation, many theorists take evaluation processes to be separate from policy-making. Therefore evaluators are not generally thought of as policy-makers, nor is evaluation usually perceived as policy-making. But if policy-making contains all phases of a policy-process, evaluation is also part of policy-making. It remains an empirical question which phases of a policy-process are most connected to evaluation. From research in the field of evaluation, scarce though it is, it can be concluded that evaluation influences policy-making in different ways. Earlier, this kind of research focused on how evaluation results influenced decision-making (see for example Weiss, 1991). Lately, evaluation theorists have taken an interest in how the entire evaluation process affects policy-making at large (Kirkhart, 2000; Patton, 1998; Sahlin-Andersson, 1995; Segerholm, 2001; Stronach et al., 2002; Stufflebeam, 2001). To mention some of the results so far, it has been shown that the mere fact that an evaluation is initiated can influence policy-making so that decisions are put on hold, or so that a topic is removed from the political agenda. Knowledge of an upcoming national evaluation also shifts attention at the practice level to what is asked for in evaluations (e.g. certain kinds of documentation, routines, etc.). Criteria in national evaluations have been shown to be taken as checklists for improvement in educational practices.
National governance, new public management and educational evaluation
In most European countries, governance of public enterprises/sectors is nowadays characterised by what Pollitt (1995) and others have called New Public Management (NPM). Common features of NPM are market-like production of services and an emphasis on measurement of productivity, effectiveness, efficiency and quality. In the Swedish school and higher education systems these features are visible in the increasing number of "independent schools", an increased emphasis on national evaluations and quality-assurance activities/systems, internal control, auditing, etc. In short, different kinds of accountability activities have expanded (see also Power, 1997). Education in Sweden is one of the public sectors most influenced by NPM - a new (inter)national policy on governance. Parallel to this shift in the policy of public governance, public education and higher education have been decentralised, leaving local levels more autonomy to decide on how to implement national goals.
National governance is now said to be goal- and result-oriented. The idea is to spell out national goals at the national policy-level, to implement them at local levels, and to evaluate/measure the results at local levels in order to control both local and national goal-fulfilment. The control activities mentioned as NPM activities thus make it possible to put pressure on local levels and serve as a corrective for national policy-makers. The state standards reforms in most of the states in the U.S. share many of the same features as the recent Swedish education reforms. International comparisons between European and other countries, like TIMSS and PISA, may also work as corrective and disciplining instruments for both national and local levels.
Conditions for evaluations
Most NPM activities are evaluative, and NPM activities occur at all levels of educational systems. Evaluators are often recruited from the research community, which means that a large portion of those who undertake educational evaluations are trained within the academic field of education. Since this type of evaluative activity is expanding, even more educational researchers than before become involved in it. Researchers are commissioned because of their expertise and their external position vis-à-vis the evaluand. Thereby they lend credibility to the evaluation and legitimacy to the evaluand.
A common way to set up an evaluation is through a procurement procedure. In the terms of reference (request for proposal), the commissioning body lays out the direction of the evaluation, that is, what kind of evaluation it wants to buy. Interested evaluators make proposals that look favourable in relation to the terms of reference. This means that evaluations and many other kinds of NPM activities operate under more restricted premises than research. Nevertheless, evaluations produce information and knowledge by much the same scientific methods as those used in educational research, with the same variations of ontological and epistemological preferences. (However, there now seems to be an orientation towards more positivistic evaluation approaches, like evidence-based evaluation.) In order to obtain an evaluation contract, evaluators have to accommodate the commissioner's request. Few evaluators dare challenge the commissioner's ideas, which also reflect the commissioner's view/understanding of the evaluand. Those evaluators who do most probably do not get the contract. Independent of the level of policy-making, evaluators ordinarily take the policy already decided upon as their point of departure, for example in selecting the criteria for the evaluation. Conditions set by national policy-makers, like state grants and the level of ambition and complexity expressed in national goals, are seldom taken into account.
The kind of conditions for evaluation pictured above leads to evaluations that take the commissioner's view for granted. Such evaluations can hardly be described as critical examinations of merit, worth or quality, or as generating information/knowledge from a critical or different perspective compared to the commissioner's (Segerholm, 2002; Stronach, Halsall and Hustler, 2002).
To illustrate some aspects of the relationship between educational researchers/evaluators and national policy-making and governance, two Swedish examples are described and discussed below.
The national evaluation of the compulsory school 2003
The national evaluation of 2003 is carried out under the responsibility of the National Agency for Education. Civil servants at the agency's unit for evaluation have made the evaluation design and general outline for the evaluation. The evaluation also involves commissioning researchers from various educational fields (mainly school-subject related) to develop evaluation instruments for all school-subjects. I am one of the researchers working on the evaluation of art. As such, I have inside knowledge of the development of evaluation instruments for the national evaluation of art. I have participated in researcher conferences with researchers contracted for the same purpose in the school-subjects of music, home-economics (hemkunskap), craft (slöjd), social science and religion. Information about other school-subjects (e.g. Swedish, English, science, etc.) has been gained from a conference site for researchers and administrators engaged in this evaluation, set up by the National Agency for Education.
At the time of writing, the evaluation information has been collected and will be analysed during the coming academic year, some of it by the commissioned researchers and some by civil servants at the national agency. So far there are no results from the evaluation, and even if there were, the researchers are not allowed to publish them until the agency has published its report. That report will be based on the researchers' analyses. Evaluation results and the use of them will therefore not be targeted in this paper. In the following, this evaluation effort is described as a whole, and the evaluation instruments for the school-subject art are presented in more detail. I hope to shed light on how my colleagues and I become quite involved in suggesting and implementing national policy in the guise of a national evaluation.
Like previous national evaluations, this one still focuses on tests of pupils in different school-subjects, measuring their level of knowledge and whether or not it compares favourably with the goals set down in the national curriculum and subject curricula. However, efforts have been made to collect information on "the process", i.e. to undertake more process-oriented evaluations. Independent of how evaluations are constructed or designed, they will have an impact on the evaluand (Patton, 1998; Kirkhart, 2000). In educational contexts, this compares to the phenomenon more generally known as "teaching to the test".
A general observation is that the design of the overall evaluation (all school-subjects) relies predominantly on questionnaires to pupils and teachers, with a strong comparative approach when it comes to school-subjects. There is also an effort to capture the views of pupils and teachers on more general issues related to the actions of the teachers, the general curricular goals, school-climate, self-esteem and the like. In specific questionnaires and tests, separate school-subjects are evaluated. These instruments are mostly constructed to measure pupil knowledge or particular individual traits. In some of the so-called aesthetic school-subjects, like art, the evaluation design also includes what has been labelled "process studies". All in all, the total national evaluation effort is directed toward collecting information on pupils' and teachers' views, the school context, and subject-based performance.
Any set of questions in an evaluation can be analysed with respect to content to get an idea of the evaluation criteria. This will not be done here (but it is another important topic to be covered in studies concerning the impact of evaluations on policy-making). The content of the evaluation, or its criteria, can be said to foster understanding among teachers and school administrators of what is considered important in the public school today. At the practice level, attention and actions are (re)directed towards the criteria. In that way, evaluators act as implementers of national policy (Segerholm, 2001).
A particular process evaluation approach
What will be pinpointed in the following is the emerging practice of emphasising "process information" in national evaluations. Now, process is a complex concept, and in educational settings it may mean different things. To study or evaluate an educational process can mean, for instance, to study the learning processes of individual pupils, the teaching of individual teachers, or group-based activities in particular contexts. Different factors will be given attention depending on what direction and level are chosen in an evaluation. In other words, in order to know what information is needed, it is important to define what the evaluation object is, and therefore the level of analysis. In the 2003 evaluation of art, the evaluation instruments developed for the process study indicate that it is the pupils' learning process that is of interest. The same can be said about the evaluations of craft and home-economics. In the art evaluation, the main information is collected through a portfolio method based on the work of Lindström et al. (1999). Each pupil collects work and documents his/her work process, in accordance with preordained dimensions (set by the evaluators), in a portfolio. The teachers collect the portfolios and assess the pupils. The portfolio method is an assessment method that integrates learning and assessment. As such, it emphasises the autonomous learner, the child who can discipline her-/himself to document the steps in a learning process in a particular way. This instrument penetrates far into the teaching and learning activities of individual pupils and teachers. If this instrument is considered along with the rest of the evaluation instruments and the kinds of questions (and thereby criteria) asked, the evaluation is directed not only at the academic results of the pupils, but also at the way they and their teachers structure and understand the learning processes, i.e. at the detailed form of educational practice.
Thereby, this national evaluation not only identifies what kind of knowledge is worth knowing, but also in what way it is to be gained. The national evaluation interacts directly with the learning and teaching processes of individual pupils and teachers. The evaluator is thus not only an instrument of official national policy in setting the academic criteria. The evaluator is also an instrument of national policy-making in that he/she constructs meaning of how pupils and teachers are to understand and assess themselves and the world around them. This meaning-making is sanctioned by the national policy-makers (represented by the civil servants at the national agency). In this evaluation, the evaluators are active national policy-makers, both as implementers and as meaning-makers.
The national system for quality assessment in higher education
This example concerns a system for quality assessment that includes several evaluative activities or evaluations. In 2001, the National Agency for Higher Education was assigned the task of evaluating academic subjects and programmes in higher education every six years. Units that reveal unsatisfactory quality in these examinations risk losing their right to award degrees and certificates. If that happens, they also lose state funds.
As an outsider to this effort, I have relied on information from official materials produced by the National Agency for Higher Education: quality assessment evaluation manuals, policy-materials on this approach, and reports from the national agency on different evaluated disciplines and programmes. Inside information from one of the responsible civil servants and information from informal interviews with faculty at departments that have been assessed have also been used. The assessment process is described below with the aim of providing some insight into how policy is shaped and implemented with the help of ourselves, as colleagues and as self-evaluators, directed by the national policy-makers, represented by the civil servants at the national agency.
The object of the assessment system/evaluation is subjects and programmes. The assessment follows a three-stage process. First, the department or subject provider under review carries out a self-evaluation based on a manual provided by the National Agency for Higher Education. This manual and its evaluation criteria are said to "...have been formulated in collaboration with the institutions of higher education" (National Agency for Higher Education, 2001, p. 19). While the self-evaluation is ongoing, a peer-group is appointed by the agency, consisting of experts recommended by the institutions involved. Important to the discussion in this paper is that the expert group often includes people familiar with the discipline, that is, national and international colleagues. During an on-site visit, this group of external assessors ("peers") reviews the self-evaluation and discusses it with those concerned. They, too, are provided with an evaluation manual by the national agency. National conditions, like funding and national policy for higher education and research, are not addressed in any of the manuals as conditions restricting practice, but are taken for granted. The procedure is finalised by the peer group summarising its impressions in a report, to which the conclusions and recommendations of the national agency are added. The third stage involves following up the review after an interval of 1-3 years.
Because the Swedish evaluation model involves all higher education institutions and educational programmes, it indirectly produces a basis for comparisons. This is neither a common feature nor the initial purpose of the self-evaluation/peer-group model. In the national assessment, all institutions providing a specific programme are made visible. The peer-groups do not rank the departments, but the institutions will inevitably, while reading the reports, make comparisons and rank and discipline themselves.
Since the national evaluation also involves reviewing the institutions' right to award degrees, and since the national agency has the power to issue sanctions on the basis of the evaluation results - to deliver "warnings" - the comparative element is further emphasised. Each institution's right to award degrees is tested against more or less pronounced and uniform national standards. In this situation the control motive comes out strongly. It can be questioned whether the potential for critical self-examination, where trust in the professionals and in development is at the fore, can be maintained when a model such as the one described above is institutionalised, mandatory and directed by actors at the national level.
Even though civil servants at the agency direct the evaluations, both the experts and those evaluated are part of the execution of the assessments. They take part in developing the criteria, and they take part in the procedure. Thereby they are also part of the implementation of national policy as defined by the evaluation criteria. Although there has been some public debate about this assessment system, critical articles are scarce (I have found fewer than five). It is not easy to be critical when the entire effort has been presented as, and is viewed by many within the universities as, something aimed at improving the subjects. Who is not interested in such a good cause? There is also some flexibility in how to use the manuals, leaving the individual departments some space for manoeuvring. (However, dissimilarities are later reduced when the civil servants at the agency process the material.) But overall, the national policy and preconditions are accepted in the process of evaluation, even though the national conditions for carrying out undergraduate and graduate training, and research, have changed dramatically over the last decade.
In the first example, the impact of evaluation is clearly much more profound with this kind of process-oriented approach than with a purely result/product-oriented evaluation approach. Although this conclusion may be visible to people interested in studies and research on evaluation, it is not always easily detected by the teachers or evaluators. Evaluation has a history of being viewed as objective, un-political and detached, leaving planning and decision-making, etc. at the national level to politicians, and decisions at the service level to the professionals. Evaluation is not commonly understood as yet another way to govern and make policy. For the professionals, the type of governance and interaction with policy-making by evaluation sketched in this example is therefore not visible or talked about in terms of governance or policy-making. Nor is it talked about or understood as political, in that it shapes educational practice and pupils' and teachers' general conceptions of how learning takes place, how it is to be promoted and how it is to be assessed. The Education Act and other national requirements (the national curriculum and subject grade criteria) are the national governing instruments teachers identify. Moreover, in the decentralised governing of today's school-system, it is said that the professionals (teachers) are the ones who primarily decide the forms of the educational processes together with the pupils. I have strong doubts that such professional autonomy really is at hand, and suspect that teachers and pupils are instead governed through instruments and channels that they do not yet recognise. It also seems as if evaluators become even more integrated in national governing and policy-making in our decentralised school-system than previously, when process-evaluation approaches are used that interact directly with the teaching and learning processes of individual pupils and teachers.
In the second example it is not only the national actors who act as evaluators. The civil servants at the agency, the experts and those evaluated all carry out their part of the procedure. Thereby, intentionally or not, they are part of the implementation of the present national policy on higher education and research, carried out with the best of intentions. In this system, governance is decentralised to self-governance/self-disciplining at the practice level. Everyone is eager to improve, to develop and to maintain the status of the subject, albeit as mainly defined by the national policy-makers. By comparing themselves with other departments, and with how those departments are assessed against the criteria, they conform and correct their practice.
Evaluators and civil servants make policy on teaching and learning when they decide how an evaluation is designed, the questions asked and the methods used for collecting (process) information. Peer-reviewers and faculty (and students) make policy in that they actively, without protest or criticism, conduct self-evaluations and peer reviews and thus help implement national policy. But can these groups formally be held accountable for this kind of policy-making? No formal decisions by any political assembly have been taken concerning the preferred teaching and learning methods, the preferred evaluation criteria for higher education, or the other aspects of evaluations described in this paper. General laws and "letters of regulation" direct the work of the schools, universities and national agencies. Thus, national civil servants, evaluators/educational researchers, and all other people engaged in national evaluations and assessment/quality activities interpret general national policy-decisions made by politicians into more concrete policy for the practice level.
When NPM activities increase, educational researchers become more involved in policy-making than before, but they are never formally accountable as policy-makers (neither are the civil servants at the national agencies). But in the practice of policy-making and governance, political accountability becomes more dispersed, and the borders between those who are to be considered politically accountable and those who are not are blurred. The responsibility is shared, on particular premises (contracts), by many groups and individuals. These premises do not include a critical debate about the general national policy or goals for public schooling and higher education. In this relationship between evaluation and policy-making, the lack of input from ordinary citizens outside these two public sectors also has to be considered. After all, they have an interest in the goals and purposes of education and in how their tax money is used.
Finally, in a time when the increase of NPM means that evaluators/researchers become more deeply involved in (national) policy-making, as described and discussed above, there is a need to pose a few questions to ourselves: Do we as evaluators/researchers recognise that we are potential national governing tools? Do we wish to take on this political responsibility even though no one can hold us accountable for our part in these processes? Do we as evaluators recognise our power to shape pupils' and teachers' views of themselves as learners? Under what conditions can we honourably take on evaluation commissions?
Hopefully it is possible to persuade commissioners like national civil servants that evaluators need some autonomy, that several perspectives and interests need to be represented, that several evaluations need to be carried out on the same object, and that the results will by necessity be inconclusive and need to be thoroughly debated in public. Such debates inform the citizens, the professionals and the politicians. However, we may also need to persuade national policy-makers that the governing policy needs to change, that the NPM doctrine of governance needs to be cut back so that evaluative activities do not overwhelm administrators at national and local levels and professionals, and in the end make educational practice suffer so that it becomes a technical activity rather than an ethical, moral and political one (see Schwandt, 2003). There must be another way to make policy and to govern education in a democratic society.
References
Ham, Christopher and Michael Hill (1993). The policy process in the modern capitalist state. 2nd edition. New York: Harvester Wheatsheaf.
Hogwood, Brian W. and Lewis A. Gunn (1984). Policy analysis for the real world. Oxford: Oxford University Press.
Kirkhart, K.E. (2000). Reconceptualising evaluation use: An integrated theory of influence. In Valerie J. Caracelli and Hallie Preskill (Eds.) The expanding scope of evaluation use, New Directions for Evaluation, 88:5-23. San Francisco: Jossey-Bass.
Lindström, Lars, Leif Ulriksson and Catharina Elsner (1999). Utvärdering av skolan avseende läroplanernas mål (US98). Portföljutvärdering av elevers skapande i bild. [Evaluation of the school concerning the goals in the national curriculum (US 98). Portfolio evaluation of the pupils' creative processes in art. In Swedish.] Stockholm: Skolverket.
National Agency for Higher Education (2001). From quality audit to quality assessment. Högskoleverkets rapportserie 2001:9R. Stockholm: National Agency for Higher Education.
Patton, Michael Q. (1998). Discovering process use. Evaluation, 4(2):225-233.
Pollitt, Christopher (1995). Justification by works or by faith? Evaluating the New Public Management. Evaluation, 1(2):133-154.
Power, Michael (1997). The audit society: Rituals of verification. Oxford: Oxford University Press.
Premfors, Rune (1989). Policyanalys. [Policy analysis. In Swedish.] Lund: Studentlitteratur.
Sahlin-Andersson, Kerstin (1995). Utvärderingars styrsignaler. [The signals of governance in evaluations. In Swedish.] In Björn Rombach and Kerstin Sahlin-Andersson (Eds.), Från sanningssökande till styrmedel. Moderna utvärderingar i offentlig sektor. [From a search for truth to a governing instrument. Modern evaluations in public sectors. In Swedish.] Stockholm: Nerenius & Santérus.
Schwandt, Thomas A. (2003). Linking evaluation and education: Enlightenment and engagement. In Peder Haug and Thomas A. Schwandt (Eds.), Evaluating educational reform: Scandinavian perspectives. Greenwich, CT: Information Age Publishing, pp. 169-188.
Segerholm, Christina (2001). National Evaluations as Governing Instruments: How Do They Govern? Evaluation, 7(4):427-438.
Segerholm, Christina (2002). The Perils of Procurement in Evaluation. Contribution to the NOPSA conference in Aalborg, 15-17 August 2002, workshop 10 "Evaluation in the Public Sector".
Segerholm, Christina (2003). To govern in silence? An essay on the political in national evaluations of the public schools in Sweden. Studies in Educational Policy and Educational Philosophy, 2003:2. E-journal. http://www.upi.artisan.se/Pages/cgi-bin/PUB_Latest_Version.exe?allFrameset=1&pageId=3
Segerholm, Christina and Eva Åström (2002). National evaluations as decentralized governing - the institutionalisation of evaluations in the Swedish higher education system. (The friendly face of evaluations). Paper presented at the AEA-conference in Washington 6-10 November, 2002.
Stronach, Ian, Rob Halsall and Dave Hustler (2002). Future imperfect: Evaluation in dystopian times. In Katherine E. Ryan and Thomas A. Schwandt (Eds.), Exploring evaluator role and identity. Greenwich, CT: Information Age Publishing, pp. 167-192.
Stufflebeam, Daniel L. (2001). Lessons in contracting for evaluations. American Journal of Evaluation, 21(3):293-314.
Weiss, Carol (1991). Evaluation research in the political context: Sixteen years and four administrations later. In Milbrey W. McLaughlin and D.C. Phillips (Eds.), Evaluation and education: At quarter century. Chicago: National Society for the Study of Education, pp. 211-231.