There has been much talk recently about the scientific process in the light of recent claims of fraud against a number of psychologists (http://bit.ly/R8ruMg), and also the failure of researchers to replicate some controversial findings by Darryl Bem purportedly showing effects reminiscent of pre-cognition (http://bit.ly/xVmmOv). This has led to calls for replication to be the cornerstone of good science – basically “an effect is not an effect until it’s replicated” (http://bit.ly/UtE1hb). But is replication enough? Is it possible to still replicate “non-effects”? Well replication probably isn’t enough. If we believe that a study has generated ‘effects’ that we think are spurious, then failure to replicate might be instructive, but it doesn’t tell us how or why the original study came by a significant effect. Whether the cause of the false effect is statistical or procedural, it is still important to identify this cause and empirically verify that it was indeed causing the spurious findings. This can be illustrated by a series of replication studies we have recently carried out in our experimental psychopathology labs at the University of Sussex.
Recently we’ve been running some studies looking at the effects of procedures that generate distress on cognitive appraisal processes. These studies are quite simple in design and highly effective at generating negative mood and distress in our participants (participants are usually undergraduate students participating for course credits), and pilot studies suggest that experienced distress and negative mood do indeed facilitate the use of clinically-relevant appraisal processes.
The first study we did was piloted as a final year student project. It produced nice data that supported our predictions – except for one thing. The two groups (distress group and control group) differed significantly on pre-manipulation baseline measures of mood and other clinically-relevant characteristics. Participants due to undertake the most distressing manipulation scored significantly higher on pre-experimental clinical measures of anxiety (M=6.9, SD 3.6, v M=3.8, SD 2.5)[F(56)=4.01 , p=.05], and depression (M=2.2, SD 2.6, M=1.1, SD 1.1) [F(56)=4.24, p=.04]. Was this just bad luck? The project student had administered the questionnaires herself prior to the experimental manipulations, and she had used a quasi-random participant allocation method (rotating participants to experimental conditions in a fixed pattern).
Although our experimental predictions had been supported (even when pre-experimental baseline measures were controlled for), we decided to replicate the study, this time run by another final year project student. Lo and behold, the participants due to undertake the distressing task scored significantly higher on pre-experimental measures of anxiety (M=9.1, SD 4.1, v M=6.9, SD 3.0) [F(56)=6.01, p=.01], and depression (M=4.3, SD 3.7, v M=2.4, SD 2.4) [F(56)=5.09, p=.02]. Another case of bad luck? Questionnaires were administered and participants allocated in the same way as the first study.
Was this a case of enthusiastic final year project students determined to complete a successful project in some way conveying information to the participant about what they were to imminently undergo? Basically, was this an implicit experimenter demand effect being conveyed by an inexperienced experimenter? To try and clear this up, we decided to replicate again, this time it was to be run by an experienced post doc researcher – someone who was wise to the possibility of experimenter demand effects, aware that this procedure was possibly prone to these demand effects, and would presumably to be able to minimize them. To cut a long story short – we replicated the study again – but still replicated the pre-experimental group differences in mood measures! Participants who were about to undergo the distress procedure scored higher than participants about to undergo the unstressful control condition.
At this point, we were beginning to believe in pre-cognition effects! Finally, we decided to replicate again. But this time, the experimenter would be entirely blind to the experimental condition that a participant was in. Sixty sealed packs of questionnaires and instructions were made up before any participants were tested – half contained information for the participant about how to complete the questionnaires and how to run either the stressful or the control condition. The experimenter merely allowed the participant to chose a pack from a box at the outset, and was entirely unaware which condition the participant was running during the experiment. To cut another long story short – to our relief and satisfaction, the pre-experimental group differences in anxiety and depression measures disappeared. It wasn’t pre-cognition after all - it was an experimenter demand effect.
The point I’m making is that replication alone may not be sufficient to identify genuine effects – you can also replicate “non-effects” quite effectively - even by actively trying not to, and even more so by meticulously replicating the original procedure. If we have no faith in a particular experimental finding, it is incumbent on us as good scientists to identify the factor or factors that gave rise to that spurious finding wherever we can.