Test Development for criterion-referenced ability assessment: The case of experiential reasoning
Tibor Vidákovich
Department of Education, University of Szeged, Hungary, e-mail: t.vidakovich@edpsy.u-szeged.hu
Paper presented at the Learning Communities and Assessment Cultures Conference organised by the EARLI Special Interest Group on Assessment and Evaluation, University of Northumbria, 28-30 August 2002
AIMS
This paper presents research carried out in the framework of the experiment 'Development of critical cognitive skills', co-ordinated by József Nagy at the University of Szeged, Hungary. In the experiment, diagnostic tests were developed to evaluate and follow the individual development of children.
Accordingly, the objectives of test development were to create assessment tools that (1) can indicate the level of development and follow progress, and (2) can also be used as diagnostic assessment tools. The results of the assessments were used as a basis for creating individual developmental indices, and also as starting points in planning further steps of development.
In this paper, methods of criterion-referenced test development are introduced first. As an example, the case of experiential reasoning is presented to illustrate the methods applied in our test development process. Then the first results and experiences gained from the initial steps of our experimental work are discussed, including the questions and problems that arose in our work.
THEORETICAL BACKGROUND
The experimental research project 'Development of critical cognitive skills' was based on the results of several previous projects carried out in the seventies and eighties in Hungary. Most of these projects were co-ordinated by Nagy and his team. Their results showed that there are a number of basic skills and abilities that play an important role in cognitive and personality development. One of these skills is experiential reasoning, that is, the use of basic deductive schemes of propositional and predicative logic.
A large body of research on the development of basic logical operations and deductive schemes is discussed in the literature. Since Piaget's works, operations and deductive schemes of propositional logic have played an important role in research on logical abilities and deductive reasoning (Inhelder and Piaget, 1958). Investigations focused primarily on conditionals and deductive patterns (Wason and Johnson-Laird, 1972), though a small number of studies have also been conducted on the development of the whole system of basic logical operations (Johnson-Laird and Byrne, 1991).
The main directions of research included the development of logical operations, propositional and predicative schemes of reasoning, typical errors when interpreting logical structures, and the role of the content and context (Evans, 1996). In the nineties, a few studies of deductive reasoning applied non-structural logical models (Overton, 1990). Other investigations focused on the context, the origins and the factors influencing the development of logical abilities (Sticht, Beeler and McDonald, 1992).
According to Piaget's developmental theory, different stages can be recognised in the development of reasoning and deductive schemes. In early childhood the implicit, experiential level of reasoning is typical, and this form of reasoning can be developed in the kindergarten and in the primary school. This form of reasoning does not include any explicit knowledge about logical schemes and rules of application. It is based only upon personal experiences and the use of the language in everyday context.
In spite of the relative simplicity of these skills, there are children who do not understand and use these logical structures and deductive patterns correctly. At school, falling behind in the development of reasoning skills and failing to grasp and use deductive patterns can hinder students' comprehension of instruction and textbooks and, in general, their academic progress.
METHODS
The process of test development
In the development of tests for the assessment of critical cognitive skills and abilities, the main objective was that they should be appropriate for identifying children's actual developmental stage (by skill and by components) and for making comparisons between their status and the level of optimal acquisition of the skill. At the same time, tests should be appropriate for defining the next tasks of the developmental process (by skill and by components). This means that tests should be appropriate for analytic diagnosis as well.
To satisfy these requirements, criterion-referenced diagnostic tests (or test batteries) were developed. In the test development, a standard procedure was applied for each skill. The steps of this procedure are listed in Table 1.
Table 1. Steps of criterion-referenced test development
Step 1: Identifying the structure and components of the skill
Step 2: Developing tasks for each component
Step 3: Developing tests for groups of components that represent all others
Step 4: Mapping the developmental process on a representative sample
Step 5: Setting the criterion characterising the optimal functioning of the skill
As Table 1 shows, the first step of the test development procedure was the identification of the components and the structure of the skill. The components can be routines or other elementary operations; the structure (organisation) can be hierarchic or of any other type. This preparatory work is basically theoretical in nature, but its results should always be checked by empirical studies as well.
As a second step, task development follows. For each component, several tasks should be developed, with different contents and/or different levels of difficulty. These task variants can be used as alternatives when assembling the test or series of tests (test battery). This phase is also theoretical in nature, but the tasks should be tested in empirical (pilot) studies.
The third step is the compilation of tests or test series. The aim of this work is to select the set (or sets) of components that can represent all other components of the skill, so that the test will be appropriate for the assessment of the skill itself and for the assessment of its main component types as well. An important requirement here is that tests should be equivalent, at least with similar means and standard deviations. This work is usually based upon the results of the previous phases.
The fourth step is the empirical study of the test or test battery. To get reliable results that exhibit 'good behaviour' from a statistical point of view, a nationally representative sample should be selected for all age groups that are to be assessed with the tests. The aim of this work is to map the main trends of the development of the skill and to establish national standards for all age groups.
Finally, the fifth step of test development is setting the criterion characterising the optimal functioning of the skill, on the basis of which individual developmental indices can be constructed. These indices show the progress of individual pupils and/or study groups, regarding the skill itself or any of its components.
Tests of experiential reasoning
In the case of experiential reasoning, the test development was based upon a task series developed by Nagy (1980). This base test was developed for individual assessment, and was standardised for 4-7-year-olds. The test included tasks for the most important deductive schemes of propositional and predicative logic. The schemes were all syllogisms, that is, they consisted of two premises and a conclusion that should be drawn from the premises. In the tasks, the conclusion was an incomplete statement, and the children had to complete it with the proper word or phrase.
For the purposes of criterion-referenced assessment and development, the test was widened and adapted. The main point in this development was that the test should cover a wider range of deductive schemes, thus offering more possibilities for analytic diagnosis. Another task was to make the test appropriate for group (classroom) assessment of pupils as well. For the sake of this, a 'paper and pencil' version of the test was created.
The types of deductive schemes included in the test and some examples of the tasks are listed in Tables 2 and 3. In both tables, the first column shows the scheme types that occurred (were assessed and developed) in the experiment. The terminology is somewhat informal, showing the 'nicknames' that were used in communication with the adult participants of the experiment (i.e. kindergarten and primary school teachers).
The test itself was very simple. It was produced in two forms, one for individual and the other for group assessment. The two versions contained the same tasks, altogether 24 tasks of the types represented in Tables 2 and 3. In the tasks, premises were formulated with familiar contents and were planned to evoke everyday situations.
Table 2. Types of propositional schemes and examples of the tasks
Scheme type | Task example
'step forward' | If I get a toy, I will be happy. Now I have got a toy, therefore ... (I am happy).
'step back' | If I don't do something well, my mom will not praise me. Now my mom has praised me, therefore ... (I have done something well).
'chain' | If I fall down, I will get dirty; and if I get dirty, I will be scolded. Therefore, if I fall down, ... (I will be scolded).
'choice' | We usually play or sing something. But now we are not singing, therefore ... (we are playing something).
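The four propositional scheme types in Table 2 can be read as familiar valid argument forms. The mapping used below ('step forward' as modus ponens, 'step back' as modus tollens, 'chain' as hypothetical syllogism, 'choice' as disjunctive syllogism) is our interpretation of the nicknames, not terminology taken from the test itself. The following minimal Python sketch checks each form by enumerating all truth assignments; the names implies, is_valid and SCHEMES are illustrative only.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'if a then b'."""
    return (not a) or b


def is_valid(premises, conclusion, n_vars):
    """True if the conclusion holds in every truth assignment
    that makes all premises true."""
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


# Assumed formal reading of the informal scheme nicknames:
# (list of premises, conclusion, number of propositional variables).
SCHEMES = {
    "step forward": ([lambda p, q: implies(p, q), lambda p, q: p],
                     lambda p, q: q, 2),
    "step back": ([lambda p, q: implies(p, q), lambda p, q: not q],
                  lambda p, q: not p, 2),
    "chain": ([lambda p, q, r: implies(p, q), lambda p, q, r: implies(q, r)],
              lambda p, q, r: implies(p, r), 3),
    "choice": ([lambda p, q: p or q, lambda p, q: not q],
               lambda p, q: p, 2),
}

results = {name: is_valid(prem, concl, n)
           for name, (prem, concl, n) in SCHEMES.items()}

# For contrast, a well-known invalid pattern (affirming the consequent):
# "if p then q; q; therefore p".
fallacy_is_valid = is_valid([lambda p, q: implies(p, q), lambda p, q: q],
                            lambda p, q: p, 2)
```

Under this reading, a child who completes a 'step back' task correctly must avoid exactly the kind of invalid shortcut that the last check illustrates.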
Table 3. Types of predicative schemes and examples of the tasks
Scheme type | Task example
'step forward' | Children are not adults yet. Steevie is a child, therefore ... (he is not an adult yet).
'step back' | If we don't like a pet, we don't stroke it. We stroke the cat, therefore ... (we like it).
'chain' | What is interesting, we are curious about; and what we are curious about, we would like to know. Therefore, what is interesting, ... (we would like to know).
'general conclusion' | Sally is not touchy. Therefore, there are girls who ... (are not touchy).
'concrete conclusion' | Sparrows can fly. Therefore, there is a sparrow that ... (can fly).
The method of assessment was almost the same in the individual and in the group form. In the individual form, kindergarten or primary school teachers read the task (the premises) aloud, and children had to complete the last sentence (the conclusion). In the paper and pencil form, pupils had to read the task themselves and then complete the last sentence.
The evaluation of the answers was also very simple. In all tasks, children could get a score of 1 if their answer was correct, and a score of 0 if not. Answers with a slightly different formulation but with the same meaning were accepted as correct answers.
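The 0/1 scoring rule described above can be sketched as follows. This is only an illustration: in the experiment, whether a differently worded answer had the same meaning was a matter of human judgement, which the sketch approximates with a list of accepted formulations per task. The task identifiers (T1, T2), the answer key and the function names are hypothetical.

```python
# Hypothetical answer key: task id -> formulations accepted as correct.
ANSWER_KEY = {
    "T1": ["I am happy", "I'm happy"],
    "T2": ["I will be scolded", "I'll be scolded"],
}


def normalise(text):
    """Lower-case, collapse whitespace and drop trailing punctuation."""
    return " ".join(text.lower().split()).strip(" .!")


def score_answer(answer, accepted):
    """Return 1 if the answer matches an accepted formulation, else 0."""
    return int(normalise(answer) in {normalise(a) for a in accepted})


def score_child(answers, key=ANSWER_KEY):
    """Score every task in the key; a missing answer counts as 0.
    Returns the item scores and the total score."""
    item_scores = {task: score_answer(answers.get(task, ""), accepted)
                   for task, accepted in key.items()}
    return item_scores, sum(item_scores.values())


item_scores, total = score_child({"T1": "I'm happy.", "T2": "I will cry"})
```

Keeping the item scores alongside the total makes it possible to analyse the results later by task and by scheme type, not only by total score.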
In the experiment, kindergarten and 1st and 2nd grade pupils were assessed with the individual method, 3rd to 6th graders with the paper and pencil method. The important question, whether children perform better or worse on the paper and pencil test, was to be investigated in the research.
Samples
The research project 'Development of critical cognitive skills' was launched in autumn 1999, with the participation of 6 kindergartens and 4 primary schools from Szeged and its surroundings. In the first year, more than 800 children participated in the experiment, from the middle and final years of kindergarten (4-6-year-olds) and from grades 1-6 of primary school (6-12-year-olds).
The aims of this preparatory period were to create assessment tools and developmental tasks (and to prepare schools and teachers, of course) for the experiment. The assessment with the experiential reasoning test described earlier was carried out at the end of the academic year, in spring 2000. The sample consisted of 950 children from the grades mentioned above and from the 8th grades of the same schools. Each sub-sample consisted of 70-150 children. These samples were not representative, but this pilot study was suitable for the purposes of test development.
RESULTS
Previous research by Nagy (1980) yielded national indices of the spontaneous development of the experiential reasoning of 4- to 7-year-olds. The aims of the present experiment are to update these standards and to extend them to the age range of 4 to 12 years. Another area of the planned research is the diagnostic assessment of experiential reasoning, that is, of different types of propositional and predicative deduction. To meet these requirements, a nationally representative sample will be necessary. The pilot study can provide only approximate information that might help further test development.
On the basis of the results, the experiential reasoning test proved to be a reliable instrument for the whole sample, but its reliability (Cronbach's α) was much better for the kindergarten and 1st to 3rd grade sub-samples. From grade 4 on, the reliability dropped, and for 8th graders it was only about 0.6. The reason for this is probably the relatively high achievement and low standard deviation of these groups (see later).
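For reference, Cronbach's α for a test of k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores). The following minimal Python sketch computes it from a matrix of 0/1 item scores. The data are invented for illustration and are not taken from the study; the function name cronbach_alpha is ours.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows, one row of item scores per child.
    Needs at least two children and two items."""
    k = len(scores[0])
    item_columns = list(zip(*scores))
    item_var_sum = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        # Every child has the same total score, so alpha is undefined.
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Invented example: five children, four dichotomously scored items.
example_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(example_scores)
```

Because the variance of the total scores sits in the denominator, a group in which nearly all children score close to the maximum offers little variance to work with. This is consistent with the explanation suggested above for the lower reliability in the higher grades.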
The development of experiential reasoning
The main tendencies of the developmental changes of means and standard deviations can be seen in Figure 1. The improvement in the means of total scores was significant (p < 0.05) between middle and final years of kindergarten, and the latter group differed significantly also from grade 1. However, between grades 1 and 2 the difference of means was not significant. The values of standard deviation were higher in the kindergarten and in grade 1, but in grade 2 a significant decrease was observed.
In grades 3-8, when the group assessment method was applied, the development proved to be slow in general. A significant (p < 0.05) improvement of means was found only between grades 4 and 5. The curve can be regarded as a continuation of the curve of the previous period. Standard deviation values for grades 3 and 4 were higher than for the previous grades, then they showed a decreasing tendency again.
These results show that our method is appropriate for the assessment of experiential reasoning in the kindergarten and in the first years of primary school, but from the 4th grade its reliability is not satisfactory. In the development of experiential reasoning, significant development takes place mostly before or in the first years of primary schooling. After this period, changes are much slower.

Figure 1. The development of experiential reasoning (mean ± std. dev.)
Results by tasks
The task-by-task analysis of the results (e.g. analysing the development of deductive schemes) shows that most tasks follow the general tendency of development of experiential reasoning (Figure 2). There are only a few tasks that don't conform to this general developmental tendency. The curves of these tasks are labelled in Figure 2.

Figure 2. The results of experiential reasoning tasks (means)
Which tasks (that is, which deductive schemes) are irregular? Except for K19, they are all predicative tasks. Some of them were mentioned earlier (K23: 'step back' - 'If we don't like a pet, ...'; K24: 'chain' - 'What is interesting, ...'). K21 is a 'concrete conclusion' type and K22 a 'step back' type predicative task. Children's achievements on these tasks are very low in the kindergarten, and their curves run below the others in the primary school period as well. In the case of item K24, the reason for the drop in grades 3-5 might be some difficulty in reading comprehension.
K19 is a 'step back' type propositional scheme - 'If I drop the glass, it will break. Therefore, if the glass is not broken, ... (I have not dropped it).' It is similar to the example listed in Table 2, the results of which are fairly good. In spite of this, the results of this task are relatively poor in the kindergarten, but then they show intensive improvement. From the 4th grade on they are among the best ones.
Given that most of the problematic items are predicative tasks, the conclusion might be that predicative schemes are more difficult for children. But there are significant differences within the propositional and predicative groups as well; therefore, a detailed analysis of scheme types might reveal the developmental tendencies more precisely.
Results by scheme types
Results of deductive scheme types are represented in Figure 3 (propositional patterns) and Figure 4 (predicative patterns). As these figures show, the results of propositional schemes are higher, as might have been hypothesised on the basis of the previous task-by-task analysis. The difference is significant (p < 0.05) for all scheme types, except for the 'step forward' type.

Figure 3. The results of propositional tasks (means)
The diagnostic analysis of the results by task types (by deductive schemes) makes it possible to identify the problems of the development of experiential reasoning, and to determine the further steps of the development. Because of the reliability problems of the test and the small number of items in some task groups, the following analysis has limited validity and can provide only rough approximations.
In general, the most developed schemes are the 'step forward' types. The results of these schemes start from 70% and exceed 90% in primary school groups, regardless of their propositional or predicative character. As the level of optimal functioning was set at 90% in our experiment, these results allow us to draw the conclusion that 'step forward' type deductive schemes are at the level of optimal acquisition in the school, and there are only a few children who have not reached this level yet.
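The comparison of scheme-type results with the 90% criterion can be sketched as follows. The sketch assumes that each child's 0/1 item scores are available and that each task is assigned to one scheme type. The task identifiers, the assignment and the scores are invented for illustration, and the function names are ours.

```python
CRITERION = 0.90  # level of optimal functioning, as set in the experiment


def scheme_means(children, scheme_of):
    """Mean proportion of correct answers per scheme type across a group.
    children: list of dicts mapping task id -> 0/1 score.
    scheme_of: dict mapping task id -> scheme type."""
    sums, counts = {}, {}
    for item_scores in children:
        for task, score in item_scores.items():
            scheme = scheme_of[task]
            sums[scheme] = sums.get(scheme, 0) + score
            counts[scheme] = counts.get(scheme, 0) + 1
    return {scheme: sums[scheme] / counts[scheme] for scheme in sums}


def below_criterion(means, criterion=CRITERION):
    """Scheme types whose group mean has not reached the criterion."""
    return sorted(s for s, m in means.items() if m < criterion)


# Invented example: two children, two tasks per scheme type.
scheme_of = {"T1": "step forward", "T2": "step forward",
             "T3": "step back", "T4": "step back"}
children = [
    {"T1": 1, "T2": 1, "T3": 1, "T4": 0},
    {"T1": 1, "T2": 1, "T3": 0, "T4": 0},
]
means = scheme_means(children, scheme_of)
needs_development = below_criterion(means)
```

The same two functions could be applied to a single child by passing a one-element list, which would give a rough individual profile by scheme type. With only a few items per scheme type, such a profile would of course be very coarse.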

Figure 4. The results of predicative tasks (means)
The group of 'step back' type schemes seems to be much more problematic. Both in the propositional and in the predicative group, the curve is at about 40% in the kindergarten. This is the worst result among the deductive schemes. In spite of the similar beginning, the tendencies of the development are different. Propositional 'step back' schemes show a fast improvement and from grade 4 on they are at the level of 'step forward' schemes, while predicative 'step back' schemes develop at a much slower pace and their results remain among the worst ones throughout primary school. This indicates that, for most pupils, 'step back' type predicative schemes are problematic and their development must be facilitated.
The third scheme type that can be the basis of a comparison between propositional and predicative deduction is the 'chain' type. Development in 'chain' type tasks seems to follow similar tendencies in the case of propositional and predicative schemes. The results of 'chains' showed an increasing tendency in the kindergarten and in grades 1 and 2, then a decrease took place in grades 3 and 4. These tasks were formulated with relatively difficult sentences; therefore, this change might be caused by difficulties of reading comprehension. In grade 5, the achievement levels return to those of 2nd graders, but further changes can be characterised as stagnation rather than improvement. As means are usually under 90%, these results show the necessity of continuous developmental efforts.
The development of other deduction types, like 'choice' in the propositional group, 'general conclusion' and 'concrete conclusion' in the predicative group can be characterised in the same way. 'Choice' schemes show the best results among propositional types, with higher results than those of 'step forward' types. The period of intensive improvement takes place in the kindergarten and in grades 1 and 2, and in the primary school means are already over 90%. It seems that only a few pupils need development here.
The development of 'general conclusion' follows similar tendencies, but means are lower, even when the achievements reach their maximum. As the curve shows, the means of this task type don't change significantly during the school years. The other predicative scheme, 'concrete conclusion' develops like 'step back'. Its results are among the worst ones in the kindergarten and in the primary school as well.
The problem with these scheme types (and with predicative tasks in general) might be related to the usage of quantifiers (all, each, there are, there is, etc.) and to the conceptual background necessary for the proper use of these words. Although these structures play an important role in learning mathematics as well, their development is not satisfactory, with achievements below 90% even at the end of primary education. The tasks for the teacher are clear: developmental efforts have to be continued all through the years of schooling.
DISCUSSION
Summarising the results of the first empirical studies, our test proved to be an appropriate tool for revealing developmental tendencies of experiential reasoning. It was easily applicable for both individual and group assessment. The reliability of the test was good for the kindergarten and grade 1-3 sub-samples, but it was weaker for the older ones. A possible reason is the higher general achievement, with a smaller standard deviation. When applied in a classroom context, the test will surely satisfy the needs of creating individual developmental indices.
The test was also applicable for diagnostic purposes. Diagnostic analyses by tasks and by scheme types revealed the typical problems in the development of experiential reasoning. Our samples were not representative, and the number of items for some scheme types was not high enough, but the method described above seems to be suitable for planning further steps of ability development and for defining individual developmental strategies for the pupils.
REFERENCES
Evans, J. St.B. T. (ed, 1996): Thinking and reasoning. Psychology Press - Erlbaum Taylor and Francis, Hove, UK.
Inhelder, B. and Piaget, J. (1958): The growth of logical thinking from childhood to adolescence. Basic Books, New York.
Johnson-Laird, P. N. and Byrne, R. M. J. (1991): Deduction. Lawrence Erlbaum Associates, Hove and London, UK.
Nagy, J. (1980): 5-6 éves gyermekeink iskolakészültsége [School readiness of 5-6-year-old children]. Akadémiai Kiadó, Budapest.
Overton, W. F. (ed, 1990): Reasoning, necessity, and logic: Developmental perspectives. Lawrence Erlbaum Associates, Hillsdale, NJ.
Sticht, Th. G., Beeler, M. J. and McDonald, B. A. (eds, 1992): The intergenerational transfer of cognitive skills. Ablex Publishing Co., Norwood, NJ.
Vidákovich, T. (2001): The development of basic logical operations: the role of school-related and socio-cultural factors. Paper presented at the 9th European Conference for Research on Learning and Instruction, Fribourg, Switzerland.
Wason, P. C. and Johnson-Laird, P. N. (1972): Psychology of reasoning: Structure and content. Harvard University Press, Cambridge, MA.