If you are a scientist of almost any persuasion, one of the things you probably cherish most dearly is the objectivity and integrity of the scientific process – a process that leads us to discover and communicate what we loosely like to call ‘the truth’ about our understanding of things. But maybe the process is not as honed as it should be, and maybe it’s not as efficient as it could be?
In many cases the culprit is the desire to quantify and evaluate research output for purposes other than understanding scientific progress – a desire that distorts the scientific process to the point where it becomes an obstacle to good and efficient science. Below are 8 factors that lead to a distortion of
the scientific process – many of which have been brought about by the desire to
quantify and evaluate research. Scientific communities have discussed many of
these factors previously on various social networks and in scientific blogs,
but I thought it would be useful to bring some of them together.
1. Does measurement of researchers’
scientific productivity harm science? Our current measures of scientific
productivity are crude, but are now so universally adopted that they matter for
all aspects of the researcher’s career, including tenure (or unemployment),
funding (or none), success (or failure), and research time (or teaching load)
(Lawrence, 2008)[1].
Research productivity is measured by the number of publications, the number of citations, and the impact factors of the journals in which the work appears – measures that are then rewarded with money (either in the form of salaries or grants). Lawrence argues that if you
need to publish “because you need a meal ticket, then you end up publishing
when you are hungry – not when the research work is satisfactorily completed”.
As a result, work is regularly submitted for publication when it is incomplete,
when the ideas are not fully thought through, or with incomplete data and
arguments. Publication – not the quality of the scientific knowledge reported –
is paramount.
2. But the need to publish in high impact
journals has another consequence. Journal impact factors correlate with the number of retractions a journal issues rather than with the number of citations an individual paper will receive (http://bit.ly/AbFfpz)[2].
One implication of this is that the rush to publish in high impact journals increases the pressure to “forget a control group/experiment, or leave out some data points that don’t make the story look so nice” – behaviours that will decrease the reliability of the scientific reports being published (http://bit.ly/ArMha6).
3. The careerism that is generated by our
research quality and productivity measures not only fosters incomplete science
at the point of publication, it can also give rise to exaggeration and outright
fraud (http://bit.ly/AsIO8B). There are prominent recent examples of well-known and ‘respected’ researchers faking data on an almost industrial scale. One example of extended and intentional fraud is the Dutch social psychologist Diederik Stapel, whose retraction was published in the journal Science (http://bit.ly/yH28gm)[3].
In this and possibly other cases, the rewards of publication and citation
outweighed the risks of being caught. Are such cases of fraudulent research
isolated examples or the tip of the iceberg? They may well be the tip of a
rather large iceberg. More than 1 in 10 British-based scientists or doctors
report witnessing colleagues intentionally altering or fabricating data during
their research (http://reut.rs/ADsX59), and
a survey of US academic psychologists suggests that 1 in 10 has falsified research data (http://bit.ly/yxSL1A)[4].
If these findings can be extrapolated generally, then we might expect that 1 in
10 of the scientific articles we read contains, or is based on, doctored or
even faked data.
4. Journal impact ratings have another
negative consequence for the scientific process. There is an increasing tendency for journal editors to reject submissions without review – not on the basis of methodological or theoretical rigour, but on the grounds that the research lacks “novelty or general interest” (http://bit.ly/wvp9V8). This is often editors attempting to protect the impact rating of their journal by rejecting submissions that might be technically and methodologically sound, but are unlikely to be cited very much. One particular type of research
that falls foul of this process is likely to be replication. Replication is a
cornerstone of the scientific method, yet failures to replicate appear to have
a low priority for publication – even when the original study being replicated
is controversial (http://bit.ly/AzyRXw).
The fact that citation rate has become the gold standard for indicating the quality of a piece of research or the standing of a particular researcher misses the point that high citation rates can also result from controversial but unreplicable findings. This has led some scientists to advocate the use of an ‘r’ or ‘replicability’ index to supplement the basic citation index (http://bit.ly/xQuuEP).
5. Whether a research finding is published
and considered to be methodologically sound is usually assessed by standard statistical criteria (e.g. formal statistical significance, typically a p-value less than 0.05). But the probability that a research finding is true is not
just dependent on the statistical power of the study and the level of
statistical significance, but also on other factors to do with the context in
which research on that topic is being undertaken. As John Ioannidis has pointed
out, “…a research finding is less likely to be true when the studies conducted
in a field are smaller; when effect sizes are smaller; when there is a greater
number and lesser preselection of tested relationships; where there is greater
flexibility in designs, definitions, outcomes, and analytical modes; when there
is greater financial and other interest and prejudice; and when more teams are
involved in a scientific field in chase of statistical significance.”
(Ioannidis, 2005)[5].
This leads to the conclusion that most research findings are false for most
research designs and for most fields!
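To make this concrete, here is a minimal sketch – my own illustration with hypothetical numbers, not taken from Ioannidis’s paper – of the positive predictive value calculation that underlies his argument: the chance that a ‘significant’ finding is actually true depends not only on the significance threshold, but also on the study’s power and on the prior odds that a tested relationship is real.

def ppv(prior_odds_true, power, alpha):
    """Probability that a statistically significant finding reflects a true effect."""
    true_positives = power * prior_odds_true   # true relationships correctly detected
    false_positives = alpha                    # null relationships that slip under p < alpha
    return true_positives / (true_positives + false_positives)

# Hypothetical values: an exploratory field testing many candidate relationships
# (1 true relationship per 10 tested, so prior odds = 0.1), modest power (0.35),
# and the conventional alpha of 0.05.
print(round(ppv(0.1, 0.35, 0.05), 2))   # ~0.41

Under those illustrative assumptions, fewer than half of the statistically significant findings in such a field would be true – which is precisely the situation Ioannidis argues applies to many small, flexible, and highly competitive areas of research.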
6. In order to accommodate the inevitable
growth in scientific publication, journals have increasingly taken to
publishing research in shorter formats than the traditional scientific article.
These short reports limit the length of an article, but the need for this type
of article may well be driven by the academic researcher’s need to publish in
order to maintain their career rather than the publisher’s need to optimize
limited publishing resources (e.g. pages in a printed journal edition). The
advantage for researchers – given their need to publish and be cited – is that, on a per-page basis, shorter articles are cited more frequently than longer articles (Haslam, 2010)[6].
But short reports can lead to the propagation of ‘bad’ or ‘false’ science. For example, shorter, single-study articles can be poor models of science because they lack the confirmatory full or partial replications of the main findings that longer, multiple-study articles often include (http://nyti.ms/wkzBpS). In addition, small studies are inherently less reliable and more likely to generate false positive results (Bertamini & Munafo, 2012)[7].
Many national research assessment exercises not only require that the quality of research be assessed in some way, they also specify a minimum quantity of outputs. Short reports – with all the disadvantages they may bring to scientific practice – will hold a particular attraction for researchers under pressure to produce quantity rather than quality.
7. The desire to measure the applied
“impact relevance” of research – especially in relation to research funding and national research assessment exercises – has inherent dangers for identifying and understanding high-quality research. For example, in the forthcoming UK Research Excellence Framework, lower-quality research for which there is good evidence of “impact” may be given a higher value than higher-quality outputs for which an “impact” case is less easy to make (http://bit.ly/y7cqPW).
This shift towards the importance of research “impact” in defining research
quality has the danger of encouraging researchers to pursue research relevant
to short-term policy agendas rather than longer-term theoretical issues. The
associated funding consequence is that research money will drift towards those
organizations pursuing policy-relevant rather than theory-relevant research,
with the former being inherently labile and dependent on changes in both
governments and government policies.
8. Finally, when discussing whether
funding is allocated in a way appropriate to optimizing scientific progress,
there is the issue of whether we fund researchers when they’re past their best.
Do we neglect those researchers in their productive prime who can add fresh
zest and ideas into the scientific research process? Research productivity
peaks at age 44 (or an average of 17 years after a researcher’s first publication), but research funding peaks at age 53 – suggesting productivity declines even as funding increases (http://bit.ly/yQUFis). It’s true that these are average statistics, but it would be interesting to know
whether there are inherent factors in the funding process that favour past
reputation over current productivity.
[1] Lawrence P A (2008) Lost in publication: How measurement harms science. Ethics in Science & Environmental Politics, 8, 9-11.
[2] Fang F C & Casadevall A (2011) Retracted science and the retraction index. Infection & Immunity, doi: 10.1128/IAI.05661-11.
[3] Stapel D A & Lindenberg S (2011) Retraction. Science, 334, 1202.
[4] John L K, Loewenstein G & Prelec D (in press) Measuring the prevalence of questionable research practices with incentives for truth-telling. Psychological Science.
[5] Ioannidis J P A (2005) Why most published research findings are false. PLoS Medicine, doi: 10.1371/journal.pmed.0020124.
[6] Haslam N (2010) Bite-size science: Relative impact of short article formats. Perspectives on Psychological Science, 5, 263-264.
[7] Bertamini M & Munafo M R (2012) Bite-size science and its undesired side effects. Perspectives on Psychological Science, 7, 67-71.