There has been
much talk recently about the scientific process in the light of claims
of fraud against a number of psychologists (http://bit.ly/R8ruMg),
and the failure of researchers to replicate some controversial findings by
Daryl Bem purportedly showing effects reminiscent of pre-cognition (http://bit.ly/xVmmOv). This has led to calls
for replication to be the cornerstone of good science – basically, “an effect is
not an effect until it’s replicated” (http://bit.ly/UtE1hb).
But is replication enough? Is it possible to replicate “non-effects” just as readily?
Well, replication probably isn’t enough. If we believe that a study has
generated ‘effects’ that we think are spurious, then failure to replicate might
be instructive, but it doesn’t tell us how or why the original study came by a
significant effect. Whether the cause of the false effect is statistical or
procedural, it is still important to identify this cause and empirically verify
that it was indeed causing the spurious findings. This can be illustrated by a
series of replication studies we have recently carried out in our experimental
psychopathology labs at the University of Sussex.
Recently we’ve
been running some studies looking at the effects of distress-generating
procedures on cognitive appraisal processes. These studies are quite simple in
design and highly effective at generating negative mood and distress in our
participants (usually undergraduate students taking part for
course credit), and pilot studies suggest that experienced distress and
negative mood do indeed facilitate the use of clinically-relevant appraisal
processes.
The first study we
did was piloted as a final-year student project. It produced nice data that
supported our predictions – except for one thing. The two groups (distress
group and control group) differed significantly on pre-manipulation baseline
measures of mood and other clinically-relevant characteristics. Participants
due to undertake the distressing manipulation scored significantly higher
on pre-experimental clinical measures of anxiety (M = 6.9, SD = 3.6 vs. M = 3.8, SD = 2.5),
F(56) = 4.01, p = .05, and depression (M = 2.2, SD = 2.6 vs.
M = 1.1, SD = 1.1), F(56) = 4.24, p = .04. Was this just bad luck? The
project student had administered the questionnaires herself prior to the
experimental manipulations, and she had used a quasi-random participant
allocation method (rotating participants through the experimental conditions in a fixed
pattern).
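(For readers who want that distinction made concrete: here is a minimal Python sketch – purely illustrative, not the allocation code we actually used, with made-up condition labels – contrasting a fixed-rotation allocation of the kind described above with simple randomisation. With a fixed rotation, anyone who knows the pattern, including the experimenter administering the baseline questionnaires, can anticipate the next participant’s condition.)

```python
import random

CONDITIONS = ["distress", "control"]  # illustrative condition labels

def rotation_allocation(n_participants):
    """Quasi-random allocation: conditions repeat in a fixed pattern,
    so the upcoming condition is always predictable in advance."""
    return [CONDITIONS[i % len(CONDITIONS)] for i in range(n_participants)]

def randomised_allocation(n_participants):
    """Simple randomisation: each participant's condition is decided by
    an unpredictable draw at the point of testing."""
    return [random.choice(CONDITIONS) for _ in range(n_participants)]

print(rotation_allocation(6))    # ['distress', 'control', 'distress', ...]
print(randomised_allocation(6))  # ordering differs on every run
```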
Although our
experimental predictions had been supported (even when the pre-experimental
baseline measures were controlled for), we decided to replicate the study, this
time run by another final-year project student. Lo and behold, the participants
due to undertake the distressing task again scored significantly higher on pre-experimental
measures of anxiety (M = 9.1, SD = 4.1 vs. M = 6.9, SD = 3.0), F(56) = 6.01, p = .01, and
depression (M = 4.3, SD = 3.7 vs. M = 2.4, SD = 2.4), F(56) = 5.09, p = .02. Another case
of bad luck? Questionnaires were administered and participants allocated in the
same way as in the first study.
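(For concreteness, a baseline comparison of this kind is just a one-way ANOVA on the pre-manipulation scores – with two groups it is equivalent to an independent-samples t-test. The sketch below uses scipy.stats.f_oneway on made-up numbers, included only to show the form of the test behind the F and p values reported above; the values are not our data.)

```python
from scipy import stats

# Made-up baseline anxiety scores, for illustration only
anxiety_distress_group = [9, 7, 12, 8, 10, 6, 11, 9]
anxiety_control_group = [5, 7, 4, 6, 8, 3, 5, 6]

# One-way ANOVA comparing the two groups' pre-manipulation scores
f_stat, p_value = stats.f_oneway(anxiety_distress_group, anxiety_control_group)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```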
Was this a case of
enthusiastic final-year project students, determined to complete a successful
project, somehow conveying information to participants about what they
were imminently to undergo? Basically, was this an implicit experimenter demand
effect being conveyed by an inexperienced experimenter? To try to clear this
up, we decided to replicate again, this time with the study run by an experienced
postdoctoral researcher – someone who was wise to the possibility of experimenter
demand effects, aware that this procedure might be prone to them,
and who would presumably be able to minimize them. To cut a long story short, we replicated the
study again – but we also replicated the pre-experimental group differences on the mood
measures! Participants who were about to undergo the distress procedure again scored
higher than participants about to undergo the non-stressful control condition.
At this point, we
were beginning to believe in pre-cognition effects! Finally, we decided to
replicate again. But this time, the experimenter would be entirely blind to the
experimental condition that a participant was in. Sixty sealed packs of
questionnaires and instructions were made up before any participants were
tested – half contained information for the participant about how to complete
the questionnaires and how to run the stressful condition, and half contained
the equivalent information for the control condition. The experimenter merely
allowed the participant to choose a pack from a box at the outset, and so was
entirely unaware which condition the participant was in during the experiment.
To cut another long story short – to our relief and satisfaction, the
pre-experimental group differences in anxiety and depression measures
disappeared. It wasn’t pre-cognition after all – it was an experimenter demand effect.
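(Procedurally, the blinding amounts to fixing and concealing every allocation before testing begins. Here is a minimal sketch of that pre-randomisation, assuming thirty packs per condition and arbitrary pack numbering – an illustration of the idea rather than our actual materials.)

```python
import random

# Decide every allocation before any participant is tested:
# 30 packs for the distressing condition and 30 for the control condition.
packs = ["distress"] * 30 + ["control"] * 30
random.shuffle(packs)

# Each sealed pack gets an arbitrary number; the participant picks one from a
# box, and the experimenter never learns which condition it contains.
sealed_packs = {pack_number: condition
                for pack_number, condition in enumerate(packs, start=1)}
```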
The point I’m
making is that replication alone may not be sufficient to identify genuine
effects – you can also replicate “non-effects” quite effectively, even when
actively trying not to, and all the more so when meticulously replicating the
original procedure. If we have no faith in a particular experimental finding,
it is incumbent on us as good scientists to identify, wherever we can, the
factor or factors that gave rise to that spurious finding.