“It’s time we changed – converting effect sizes to months of learning is seriously flawed”

Anyone with a passing interest in evidence-informed practice in schools will be aware that effect sizes are often used to report the effects of educational interventions, programmes and policies. These results are then summarised in meta-analyses and meta-meta-analyses, and are often translated into more “understandable” units, such as years or months of learning. Accordingly, John Hattie writes about an effect size of 0.4 SD being equivalent to a year’s worth of learning. Elsewhere, the Education Endowment Foundation, in their Teaching and Learning Toolkit, have developed a table which converts effect sizes into months of additional progress made by pupils. For example, an effect size of 0.44 SD is deemed to be worth an additional five months of learning, while an effect size of 0.96 SD is deemed to represent 12 months of additional learning.
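
As a reminder of what is being converted: a standardised effect size is typically the difference between the intervention and control group means divided by a pooled standard deviation (Cohen’s d is the most common formulation). The sketch below is purely illustrative, with made-up scores, and is not how Hattie or the EEF derive their figures.

```python
from statistics import mean, stdev

def cohens_d(treatment_scores, control_scores):
    """Standardised mean difference (Cohen's d): the gap between group
    means expressed in pooled-standard-deviation units."""
    n_t, n_c = len(treatment_scores), len(control_scores)
    s_t, s_c = stdev(treatment_scores), stdev(control_scores)
    # Pooled standard deviation across the two groups
    pooled_sd = (((n_t - 1) * s_t ** 2 + (n_c - 1) * s_c ** 2) / (n_t + n_c - 2)) ** 0.5
    return (mean(treatment_scores) - mean(control_scores)) / pooled_sd

# Hypothetical test scores for two small classes
treated = [66, 71, 68, 73, 70, 72, 66, 71]
control = [65, 70, 66, 72, 68, 71, 64, 69]
print(round(cohens_d(treated, control), 2))  # a positive d means the treated group scored higher on average
```

It is this single abstract number, measured in standard deviation units, that the conversions above attempt to turn into months or years of learning.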

However, this approach of converting effect sizes into periods of learning time appears to be seriously flawed. In an article recently published in Educational Researcher, Matthew Baird and John Pane conclude:

Although converting standardized effect sizes in education to years (or months, weeks or days) of learning has a potential advantage of easy interpretation, it comes with many serious limitations that can lead to unreasonable results, misinterpretations or even cherry picking from among implementation variants that can produce substantially inconsistent results. We recommend avoiding this translation in all cases, and that consumers of research results look with scepticism towards research translated into units of time. (Baird and Pane 2019, p. 227)

Instead, Baird and Pane argue that, because standardised effect sizes are by their very nature measured on an abstract scale, the best way to judge whether a programme or intervention effect is meaningful is to look at what the impact would have been on the median student in the control group had they received the treatment/intervention. For example, assuming a normal distribution in both the intervention and control groups, consider the median pupil in the control group – say, the 13th-ranked pupil in a group of 25. If that pupil had received the treatment and the standardised effect size was 0.4 SD, they would now be ranked around 9th in the group.
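
To make the arithmetic behind that example explicit, here is a minimal sketch of the rank translation described above, assuming normally distributed outcomes in both groups. The function names and the class size of 25 are illustrative; this is not code from Baird and Pane’s paper.

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def expected_rank_after_treatment(effect_size_sd: float, group_size: int) -> int:
    """Approximate rank (1 = highest) the control group's median pupil would
    hold if they received a treatment with the given standardised effect size,
    assuming normally distributed outcomes."""
    # The median pupil starts at the 50th percentile; the treatment shifts
    # them up by effect_size_sd standard deviations.
    new_percentile = normal_cdf(effect_size_sd)  # roughly 0.66 for 0.4 SD
    # In a group of N, the pupil ranked r from the top sits at roughly the
    # (N - r + 0.5) / N percentile; invert that to recover the rank.
    return round(group_size * (1.0 - new_percentile) + 0.5)

print(expected_rank_after_treatment(0.0, 25))  # 13: the median pupil's starting rank
print(expected_rank_after_treatment(0.4, 25))  # 9: matching the example above
```

The appeal of this translation is that it stays on the scale of the outcome distribution itself, rather than importing assumptions about how much pupils learn per month or year.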

So what are the implications of this for anyone working with and in schools who is interested in evidence-informed school improvement?

• Baird and Pane’s analysis does not mean that the work of Hattie or the Education Endowment Foundation is invalid or no longer helpful. Rather, it means we should be extremely careful about any claims that interventions provide benefits in terms of months or years of additional progress.

• There are additional problems with the “converting effect sizes to months of learning” approach. For example, the rate of progress in pupils’ achievement varies throughout school and across subjects (see https://onlinelibrary.wiley.com/doi/full/10.1111/j.1750-8606.2008.00061.x), and the translation doesn’t make sense for non-cognitive measures (e.g., of pupils’ well-being or motivation).

• There’s an interesting balancing act to be had. On the one hand, given their knowledge and understanding of research, teachers and school leaders are going to have to rely on trusted sources to help them make the most of research evidence in bringing about school improvement. On the other hand, no matter how ‘big’ the name, those sources may well have got something wrong, so some form of professional scepticism is required at all times.

• Effect sizes, and whether they can be reliably converted into some kind of more interpretable metric, may be neither here nor there. What matters is whether there is a causal relationship between intervention X and outcome Y, and what support factors are necessary for that causal relationship to work (Kvernbekk 2015).

• Given the importance that teachers and school leaders give to sources of evidence other than research – say, from colleagues and other schools – when making decisions, we probably need to spend more time helping teachers and school leaders engage in critical yet constructive appraisal of the practical reasoning of their colleagues.

• Any of us involved in trying to support the use of evidence in bringing about school improvement may need to be a little more honest with our colleagues – or, if not a little more honest, then perhaps to show them a little more professional respect. Let’s no longer try to turn the complex process of education into overly simplistic measures of learning just because those measures are easy to communicate and interpret. Let’s be upfront with colleagues and say: this stuff is not simple, it is not easy, and there are no off-the-shelf answers. When using research, it’s going to take extremely hard work to make a real difference to pupils’ learning – and, you know what, it’ll probably not be that easy to measure.

And finally

It’s worth remembering that, no matter what precautions you take when trying to convert an effect size into something more understandable, this does not take away any of the problems associated with effect sizes themselves. See Simpson (2018) for an extended discussion of these issues.

References

Baird, Matthew D, and John F Pane. 2019. “Translating Standardized Effects of Education Programs Into More Interpretable Metrics.” Educational Researcher 48(4): 217–28. https://doi.org/10.3102/0013189X19848729.

Hattie, J. A. 2008. Visible Learning. London: Routledge.

Higgins, S., Katsipataki, M., Coleman, R., Henderson, P., Major, L. and Coe, R. 2015. The Sutton Trust-Education Endowment Foundation Teaching and Learning Toolkit. London: Education Endowment Foundation.

Kvernbekk, Tone. 2015. Evidence-Based Practice in Education: Functions of Evidence and Causal Presuppositions. Routledge.

Simpson, Adrian. 2018. “Princesses Are Bigger than Elephants: Effect Size as a Category Error in Evidence‐based Education.” British Educational Research Journal 44(5): 897–913.

Research shows that academic research has a relatively small impact on teachers’ decision-making – well what a surprise that is!

Recent research undertaken by Walker, Nelson and Bradshaw, with Brown (2019), has found that academic research has a relatively small impact on teachers’ decision-making, with teachers more likely to draw ideas and support from their own experiences (60 per cent) or the experiences of other teachers/schools (42 per cent). Walker et al. go on to note that this finding is consistent with previous research, and argue that it suggests those with an interest in supporting research-informed practice in schools should consider working with and through schools, and those that support them, to explore their potential for brokering research knowledge for other schools and teachers.

In many ways we should not be surprised by these findings, as similar results have emerged from ethnographic research in UK general practice (Gabbay and le May, 2004), which showed that clinicians very rarely accessed research findings and other sources of formal knowledge directly, preferring instead to rely on ‘mindlines’ – ‘collectively reinforced, internalised, tacit guidelines’. These mindlines were informed by brief reading, but mainly by clinicians’ own and their colleagues’ experience, their interactions with each other and with opinion leaders, patients and pharmaceutical representatives, and other sources of largely tacit knowledge that built on their early training and their own and their colleagues’ experience.

Now, in this short blog I cannot do full justice to the concept of ‘mindlines’. Nevertheless, if you would like to find out more, I suggest you have a look at Gabbay and le May (2011; 2016). That said, for the rest of this blog I’m going to draw on the work of Wieringa and Greenhalgh (2015), who conducted a systematic review of mindlines, and draw out some of their key characteristics:

• Mindlines are consistent with the notion that knowledge is not a set of external facts waiting to be ‘translated’ or ‘disseminated’; instead, knowledge is fluid and multi-directional, constantly being recreated in different settings by different people on an ongoing basis.

• Mindlines involve a shared, but not necessarily homogeneous, reality – made up of the multiple individual and temporary realities of clinicians, researchers, guideline makers and patients.

• Mindlines incorporate tacit knowledge and ‘knowledge in practice in context’.

• Mindlines involve the construction of knowledge through social processes – discussions influenced by cultural and historical forces – and are validated through a process of ‘reality’ pushing back in a local context.

• Mindlines are consistent with the view that anyone, including patients, is capable of creating valid knowledge and can be an expert in consultations.

• Mindlines may not be manageable through direct interventions; however, they may be self-organising, as the best solution for a particular problem in a defined situation is sought out.

What are the implications of ‘mindlines’ for those interested in brokering research knowledge in schools?

Wieringa and Greenhalgh go on to make a number of observations about the implications for practitioners, academics and policymakers of embracing the mindlines paradigm, observations which are equally applicable to schools.

1. We need to examine how to go about integrating various sources of knowledge, and whether convincing information leads to improved decision-making.

2. In doing so, we need to think more widely about what counts as evidence in schools – and how these different types of evidence can best be used by teachers and school leaders in the decision-making process.

3. We need to examine how mindlines are created and validated by teachers, school leaders and other school stakeholders and how they subsequently develop over time.

In other words, research which focusses on how researchers can better ‘translate’ or ‘disseminate’ research is unlikely to have much impact on the guidelines teachers and school leaders use to make decisions.

And finally

I’d just like to make a few observations about the report by Walker et al. (2019). First, I don’t like reports where reference is made to the ‘majority of respondents’ and no supporting percentage figure is given. Second, the terms ‘climate’ and ‘culture’ seem to be used interchangeably, although they refer to quite different things. Third, significant differences between groups of teachers are highlighted, yet no supporting data is provided, nor any explanation of what is meant by ‘significant’ in this context.

References

Gabbay, J. and le May, A. (2004) ‘Evidence based guidelines or collectively constructed “mindlines?” Ethnographic study of knowledge management in primary care’, BMJ, 329(7473), p. 1013.

Gabbay, J. and le May, A. (2011) Organisational Innovation in Health Services: Lessons from the NHS Treatment Centres. Policy Press.

Gabbay, J. and le May, A. (2016) ‘Mindlines: making sense of evidence in practice’, British Journal of General Practice, 66(649), pp. 402–403. doi: 10.3399/bjgp16X686221.

Walker, M., Nelson, J., Bradshaw, S. with Brown, C. (2019) Teachers’ engagement with research: what do we know? A research briefing. London: Education Endowment Foundation.

Wieringa, S. and Greenhalgh, T. (2015) ‘10 years of mindlines: a systematic review and commentary’, Implementation Science, 10(1), p. 45.