In recent times a great deal has been made of the importance of using randomised controlled trials (RCTs) in education to build a body of evidence about which practices are effective. This is very laudable, but as others have pointed out, there are many implementation issues with RCTs. My concern here is not so much with their implementation as with their usefulness – or possible lack thereof. While the statistical power of a randomised controlled trial can provide greater confidence in the outcome, the data need to be interpreted cautiously.


Take, for example, a trial with a randomly assigned control group. The experimental group receives a reading intervention and the control group does not. The mean score of the intervention group improves more than that of the control group. How much does this tell us? Who gained the most benefit? Can we be confident that the intervention alone was responsible for the raised average? Did most students move a little, or did a few move a lot? It can be argued that, on average, the intervention is more effective than no intervention. But for how many of the students? What were their characteristics? A good discussion in the research paper would attempt to identify why the intervention worked better for some students than for others, but unless the experiment is set up carefully this information may be hard, if not impossible, to extract. In fact, unless the impact was very uniform, making a meaningful interpretation would come close to requiring us to compare individual results.
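To make the question about averages concrete, here is a minimal Python sketch using invented scores (they are not from any trial). Two hypothetical groups of ten students have exactly the same mean gain, but in one every student gains a little and in the other two students account for all of it. The names `summarise`, `uniform_gains` and `skewed_gains`, and the two-month threshold for counting a student as ‘improved’, are illustrative assumptions.

```python
from statistics import mean, median

# Hypothetical reading-age gains (in months) for ten students per group.
# These numbers are invented purely for illustration.
uniform_gains = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
skewed_gains = [0, 0, 0, 0, 0, 0, 0, 0, 15, 15]


def summarise(gains, threshold=2):
    """Return the mean gain, the median gain and the number of students
    who gained at least `threshold` months (an assumed cut-off)."""
    return {
        "mean": float(mean(gains)),
        "median": float(median(gains)),
        "n_improved": sum(1 for g in gains if g >= threshold),
    }


print(summarise(uniform_gains))  # {'mean': 3.0, 'median': 3.0, 'n_improved': 10}
print(summarise(skewed_gains))   # {'mean': 3.0, 'median': 0.0, 'n_improved': 2}
```

The mean alone cannot tell these two groups apart; the median and the count of students who improved can.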

There is a much more efficient, and often more illuminating, way of tracking the impact of an intervention, one that also lets us see quickly whether results can be replicated. It is known as single-subject design. It is well suited to the needs of teachers in the classroom: it enables them to identify useful information that they can employ immediately in their teaching of particular students, and it allows for individual variation while still enabling the results of multiple students to be compared.

In a single-subject design, a baseline of the student’s current performance is taken (the A phase). Once this information has been gathered, the intervention is introduced (the B phase), and its impact on the student’s performance is tracked. We could end the experiment once the B phase is complete (an AB design, also called a ‘case study’ design), but at this stage it would not really prove that the intervention caused any change in the student’s performance. To confirm that, we need to return to the baseline conditions (the A phase again) and, after a suitable period, reinstate the intervention (the B phase again). If the same impact on performance occurs again, we have replication, which suggests that the intervention is the factor making the difference. And if we compare the results of individual students and see a similar pattern of replication, our inference is strengthened. This design is often referred to as ABAB.
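As a rough illustration of the ABAB logic, the Python sketch below assumes one student’s invented session scores in each phase and applies a deliberately crude rule: performance should rise each time the intervention is introduced and fall when it is withdrawn. The data, the function names and the `min_shift` threshold are all hypothetical; in practice, single-subject data are usually judged by inspecting a chart for changes in level, trend and variability, not by comparing phase means alone.

```python
from statistics import mean

# Hypothetical session-by-session scores (e.g. words read correctly per
# minute) for one student across the four ABAB phases. Invented data.
phases = [
    ("A1", [12, 11, 13, 12]),  # baseline
    ("B1", [15, 18, 20, 22]),  # intervention introduced
    ("A2", [16, 14, 13, 13]),  # intervention withdrawn (return to baseline)
    ("B2", [18, 21, 23, 24]),  # intervention reinstated
]


def phase_means(phases):
    """Mean score for each phase, keyed by phase label."""
    return {label: mean(scores) for label, scores in phases}


def shows_replication(means, min_shift=2.0):
    """Crude check (an assumption, not a standard test): the mean rises by at
    least `min_shift` when the intervention starts (A1 -> B1), falls by at
    least that much when it is withdrawn (B1 -> A2), and rises again when it
    is reinstated (A2 -> B2)."""
    return (
        means["B1"] - means["A1"] >= min_shift
        and means["B1"] - means["A2"] >= min_shift
        and means["B2"] - means["A2"] >= min_shift
    )


means = phase_means(phases)
print(means)
print(shows_replication(means))  # True
```

Feeding in flat scores across all four phases would make `shows_replication` return `False`, which mirrors the point that a single AB change is not enough on its own.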

The famous example above was published back in 1968, and shows how the mathematics errors and correct answers (‘corrects’) of a student called ‘Bob’ rose and fell depending on which of the two his teacher gave the most attention to. At first (the A phase) Bob was making lots of digit reversals in his addition, and the teacher spent time patiently correcting each one with him. In the B phase she paid no attention to his errors but did take time to commend him on his corrects. His errors rapidly reduced and his corrects increased proportionately. The researchers then returned to the A phase conditions: when the teacher again gave her attention to Bob’s errors, the errors increased. Repeating the B phase conditions replicated the earlier pattern – Bob’s errors decreased when his teacher gave them no attention. (Note, to avoid any confusion: the study did not suggest that teacher attention caused the errors, but that Bob was making errors in order to get the teacher’s attention. You can read the full study here.)

This article by John O. Cooper (unfortunately behind a paywall!) explains how behavioural methodologies, including the variations of single-subject design, were developed, and how these developments made the approach increasingly useful for teachers in the classroom. There is also a short outline of the growing usefulness of single-subject design in the comprehensive text Research Methods in Education by Cohen, Manion and Morrison (p. 284).

The charts of students in Thinking Reading programmes look more like case study (AB) designs. Because these students need to catch up with their peers as quickly as possible, we have not been reverting to baseline to see whether progress stalls. This format is called ‘multiple baseline’: it does not require a return to baseline, because replications with other students confirm the relationship between the intervention and each student’s performance. These repeated replications provide strong evidence that the reading programme is responsible for the improvements in reading scores. You can see the charts on the Case Studies page of the Thinking Reading website.
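The idea of replication across students can be sketched with invented data. In a textbook multiple-baseline design the intervention begins at a different point for each student, so a change that appears only after each student’s own start point, while that student’s baseline stays flat, is hard to attribute to anything else. The scores, start weeks, thresholds and the `shows_effect` function below are hypothetical and do not describe Thinking Reading data.

```python
from statistics import mean

# Hypothetical weekly scores for three students. The intervention starts at a
# different (staggered) week for each; `start` is the index of the first
# intervention week. All values are invented for illustration.
students = {
    "student_1": {"scores": [20, 21, 20, 26, 29, 31, 34, 36], "start": 3},
    "student_2": {"scores": [15, 16, 15, 15, 16, 21, 24, 27], "start": 5},
    "student_3": {"scores": [30, 29, 31, 30, 31, 30, 35, 38], "start": 6},
}


def shows_effect(scores, start, min_shift=3.0, max_baseline_range=2):
    """True if the baseline (before `start`) is stable, i.e. its range is at
    most `max_baseline_range`, and the mean after `start` exceeds the baseline
    mean by at least `min_shift`. Both thresholds are assumptions."""
    baseline, treatment = scores[:start], scores[start:]
    stable = max(baseline) - min(baseline) <= max_baseline_range
    return stable and mean(treatment) - mean(baseline) >= min_shift


replications = [
    name for name, s in students.items() if shows_effect(s["scores"], s["start"])
]
print(f"{len(replications)} of {len(students)} students show the effect")
```

Each student who passes the check counts as one more replication; a student whose scores had already risen before their own start week would fail the stable-baseline test.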