Learning from international comparisons
Introduction
Improving the quality of provision is a priority for all education systems, and while ideas from other systems cannot be guaranteed to work elsewhere, it would be foolish to claim that we cannot learn anything from other education systems, especially those that perform relatively well in international comparisons such as PISA, TIMSS, and PIRLS.
The challenge, of course, is that to learn from other systems, we have to determine which of their features contribute to their success or failure, and which are merely idiosyncratic features that have little influence, one way or the other, on the system’s performance. In other words, we need to determine which features of those systems have a causal influence on performance, and which are merely associated with it.
In the social sciences, it is frequently argued that the only way to create reliable knowledge about causal influences is through the use of randomized controlled trials (RCTs), often termed “the gold standard” for research. Such claims miss two important points. The first is that RCTs tell us only that any differences between the experimental and control groups in that particular experiment are unlikely to be due to chance. Determining whether the same findings apply elsewhere requires careful theorization about whether the features of the setting of the experiment are likely to hold elsewhere.
A good example of this is the Tennessee STAR study on class size, in which Kindergarten students were randomly assigned to classes of 22 to 26, to classes of 13 to 17, or to classes of 22 to 26 with both a teacher and a teacher’s aide. The results suggested that small classes were more effective for at-risk students, and that these benefits were maintained through to the end of high school. However, whether such results would generalize to other elementary schools in Tennessee, let alone to elementary schools in other states, is open to question because of a feature of the experiment’s design.
To be included in the experiment, elementary schools had to have at least 57 Kindergarten students, so that the three different kinds of groups could be formed. The schools included in the experiment therefore tended to be larger than average for the state, and more likely to be in urban areas (Wiliam, 2019).
The second problem with relying on RCTs as the only way of producing reliable evidence about causal effects is that you end up not being able to say very much. There are many areas, such as the effects of smoking on health, where it would be impossible (or at least highly unethical) to even attempt to conduct an RCT.
Recognizing this, in 1965, Austin Bradford Hill proposed that, in the absence of evidence from RCTs, if there were clear evidence of an association between two variables, there were nine aspects of that association that should be considered in order to determine whether the relationship between the variables was causal in nature (Hill, 1965). Although his concern was with the medical sciences, I believe that these nine aspects of association provide a useful framework for looking at international comparisons, and in particular, for determining which features of an educational system might have a causal impact on the relative educational performance of that system. These nine aspects are discussed in turn below.
1. Strength
If a particular feature of an education system is a primary cause of success as measured by international comparisons, then the relationship should be strong, in that systems where that feature is strong should perform much better than those where the feature is weak. For example, if selectivity into teacher education is suggested as a cause of success, then those jurisdictions where entry into teacher education is highly selective should be much more successful than those in which entry into teacher education is relatively unselective.
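To make the idea of strength more concrete, one common way to summarize the size of such a difference is a standardized mean difference (effect size). The sketch below is purely illustrative: the scores, the grouping of systems into “selective” and “unselective”, and the choice of Cohen’s d are all assumptions for the sake of example, not data from any actual comparison.

```python
# A minimal sketch, using entirely hypothetical figures, of summarizing the
# strength of an association as a standardized mean difference (Cohen's d).
import statistics

# Hypothetical mean assessment scores for two groups of systems (illustrative only).
selective = [540, 555, 530, 548, 562]    # highly selective entry into teacher education
unselective = [495, 510, 488, 502, 498]  # relatively unselective entry

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

print(f"Cohen's d = {cohens_d(selective, unselective):.2f}")
# A large effect size would be consistent with a strong association;
# a small one would count against the feature as a primary cause.
```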
2. Consistency
To be regarded as a causal influence on the relative success of a system’s educational performance, a feature must be consistently associated with success. In other words, it should be the case that where the feature is strong, the system is highly likely to be successful, and where the feature is weak or absent, the system is unlikely to be high performing, at least as measured by international comparisons.
3. Specificity
If a particular feature of an education system is characteristic of, and only of, high performing systems, then this lends support to the conclusion that the feature is a cause of that high performance. This aspect can be regarded as a special case of consistency, where the feature takes on just two values: present and absent.
4. Temporality
One, perhaps obvious, requirement is that to be regarded as having a causal influence, the cause must precede the effect, but in education it is also necessary to take into account the time lags inherent in education systems. Changes in the availability of pre-school would be unlikely to affect the results of tests taken by 15-year-olds for at least a decade, and some of the relationships are difficult to disentangle.
For example, in the Second International Mathematics Study (SIMS), there was a strong association between the achievement of elementary school students in mathematics and whether mathematics was compulsory for high-school students beyond the age of 16. At first sight this seems impossible. How can the amount of mathematics students will be required to study at the age of 17 influence their achievement in the fourth grade? The answer, of course, is that elementary school teachers in countries where mathematics is compulsory up to the age of 18 are likely to have greater mathematics subject knowledge than those in countries where students can discontinue the study of mathematics at age 16.
5. Biological gradient
As well as the consistency of the relationship discussed above, a causal relationship should also show what in medicine is called a dose-response relationship. For example, continuing our earlier example, if we suspect that a high degree of selectivity in recruitment to teacher education causes higher achievement, then we would expect that the more selective a system is, the greater its relative performance. Systems that have ten or more applicants for every place on a teacher preparation program should be more successful than those with only two or three applicants for each place, which in turn should be more successful than those that are recruiting rather than selecting prospective teachers.
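One simple way to look for such a gradient is to ask whether performance rises monotonically with the “dose”. The sketch below uses entirely hypothetical figures for applicants per place and mean scores, and a Spearman rank correlation as a rough check for a monotonic dose-response pattern; none of the numbers come from any actual study.

```python
# A minimal sketch, using hypothetical figures, of checking for a monotonic
# dose-response relationship between selectivity and performance.
from scipy.stats import spearmanr

# Hypothetical values (illustrative only): applicants per teacher-preparation
# place, and the corresponding system's mean assessment score.
applicants_per_place = [1, 2, 3, 5, 8, 12]
mean_scores = [480, 495, 500, 515, 530, 545]

rho, p_value = spearmanr(applicants_per_place, mean_scores)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
# A rho close to 1 indicates a consistent dose-response pattern; a weak or
# non-monotonic relationship would count against a causal interpretation.
```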
6. Plausibility
To conclude that a feature of an education system is a cause of its relative performance, it must be at least plausible that the feature should somehow influence student achievement. That said, Austin Bradford Hill himself cautioned against placing too much weight on this factor since what is plausible at any time is determined by the available scientific knowledge. For example, twenty years ago, it might have seemed unlikely that the amount of practice testing conducted in schools would have a significant impact on student achievement, although recent research suggests that this may be one of the most cost-effective strategies to implement (Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013).
7. Coherence
Even if a proposed causal relationship between a feature of an education system and its relative performance is not well supported by existing evidence, the case for a causal relationship would be undermined if it clearly contradicted a large body of existing evidence. Extraordinary claims require extraordinary evidence.
8. Experimental evidence
While evidence from RCTs may not be available to show that a particular feature of an education system at least in part determines its relative performance, there may be studies that show that particular policy changes are followed by particular changes in system performance.
9. Analogy
As a final consideration, analogies with similar, well-established causal relationships may provide support for a causal interpretation of an observed association.
References
Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students’ learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14(1), 4-58. doi:10.1177/1529100612453266
Hill, A. B. (1965). The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine, 58(5), 295-300.
Wiliam, D. (2019). Some reflections on the role of evidence in improving education. Educational Research and Evaluation, 25, 127-139. doi:10.1080/13803611.2019.1617993
