Friday, 2 March 2012

When measuring science distorts it: 8 things that muddy the waters of scientific integrity and progress

If you are a scientist of almost any persuasion, one of the things you probably cherish most dearly is the objectivity and integrity of the scientific process – a process that leads us to discover and communicate what we loosely like to call ‘the truth’ about our understanding of things. But maybe the process is not as honed as it should be, and maybe it’s not as efficient as it could be. In many cases the culprit is the desire to quantify and evaluate research output for purposes other than understanding scientific progress – a desire that distorts the scientific process to the point where it becomes an obstacle to good and efficient science. Below are 8 factors that lead to a distortion of the scientific process, many of which have been brought about by the desire to quantify and evaluate research. Scientific communities have discussed many of these factors previously on various social networks and in scientific blogs, but I thought it would be useful to bring some of them together.

1.         Does measurement of researchers’ scientific productivity harm science? Our current measures of scientific productivity are crude, but they are now so universally adopted that they matter for all aspects of a researcher’s career, including tenure (or unemployment), funding (or none), success (or failure), and research time (or teaching load) (Lawrence, 2008)[1]. Research productivity is measured by the number of publications, the number of citations, and the impact factors of the journals in which the work appears – metrics that are then rewarded with money (either in the form of salaries or grants). Lawrence argues that if you need to publish “because you need a meal ticket, then you end up publishing when you are hungry – not when the research work is satisfactorily completed”. As a result, work is regularly submitted for publication when it is incomplete, when the ideas are not fully thought through, or with incomplete data and arguments. Publication – not the quality of the scientific knowledge reported – is paramount.

2.         But the need to publish in high-impact journals has another consequence. A journal’s impact factor correlates with its rate of retractions, not with the number of citations an individual paper in it will receive (http://bit.ly/AbFfpz)[2]. One implication is that the rush to publish in high-impact journals increases the pressure to ‘maybe’ “forget a control group/experiment, or leave out some data points that don’t make the story look so nice” – behaviours that will decrease the reliability of the scientific reports being published (http://bit.ly/ArMha6).
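
As a rough illustration of what the ‘retraction index’ in [2] measures, here is a minimal sketch in Python, based on my reading of Fang & Casadevall’s definition; the counts below are purely hypothetical placeholders, not real journal figures. The index is simply the number of retractions scaled per 1,000 articles published over the same period, and it is this quantity that is compared against journals’ impact factors.

def retraction_index(retractions, articles_published):
    """Retractions per 1,000 articles published over the same period."""
    return 1000.0 * retractions / articles_published

# Purely hypothetical counts, for illustration only:
print(retraction_index(retractions=5, articles_published=20000))   # 0.25
print(retraction_index(retractions=4, articles_published=3000))    # ~1.33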


3.         The careerism generated by our research quality and productivity measures not only fosters incomplete science at the point of publication, it can also give rise to exaggeration and outright fraud (http://bit.ly/AsIO8B). There are recent prominent examples of well-known and ‘respected’ researchers faking data on an almost industrial scale. One recent example of extended and intentional fraud is the Dutch social psychologist Diederik Stapel, whose retraction was published in the journal Science (http://bit.ly/yH28gm)[3]. In this and possibly other cases, the rewards of publication and citation outweighed the risks of being caught. Are such cases of fraudulent research isolated examples or the tip of the iceberg? They may well be the tip of a rather large iceberg. More than 1 in 10 British-based scientists or doctors report witnessing colleagues intentionally altering or fabricating data during their research (http://reut.rs/ADsX59), and a survey of US academic psychologists suggests that 1 in 10 has falsified research data (http://bit.ly/yxSL1A)[4]. If these findings can be extrapolated generally, then we might expect that 1 in 10 of the scientific articles we read contains, or is based on, doctored or even faked data.


4.         Journal impact ratings have another negative consequence for the scientific process. There is an increasing tendency for journal editors to reject submissions without review – not on the basis of methodological or theoretical rigour, but on the basis that the research lacks “novelty or general interest” (http://bit.ly/wvp9V8). This tends to reflect editors attempting to protect their journal’s impact rating by rejecting submissions that may be technically and methodologically sound but are unlikely to be cited very much. One particular type of research that falls foul of this process is replication. Replication is a cornerstone of the scientific method, yet failures to replicate appear to have a low priority for publication – even when the original study being replicated is controversial (http://bit.ly/AzyRXw). Treating citation rate as the gold standard for the quality of a piece of research, or the standing of a particular researcher, misses the point that high citation rates can also result from controversial but unreplicable findings. This has led some scientists to advocate the use of an ‘r’ or ‘replicability’ index to supplement the basic citation index (http://bit.ly/xQuuEP).

5.         Whether a research finding is published and considered to be methodologically sound is usually judged against standard statistical criteria (e.g. formal statistical significance, typically a p-value less than 0.05). But the probability that a research finding is true depends not just on the statistical power of the study and the level of statistical significance, but also on other factors to do with the context in which research on that topic is being undertaken. As John Ioannidis has pointed out, “a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance” (Ioannidis, 2005)[5]. This leads to the conclusion that most research findings are false for most research designs and for most fields!
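
To see the arithmetic behind Ioannidis’s argument, here is a minimal sketch in Python (my own illustration, not code from the paper) of the ‘positive predictive value’ of a significant finding – the post-study probability that it is true – in the no-bias case, where R is the pre-study odds that a tested relationship is real.

def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Post-study probability that a 'significant' finding is true:
    PPV = (power * R) / (power * R + alpha), where R is the pre-study odds
    that a tested relationship is real (Ioannidis, 2005, no-bias case)."""
    return (power * prior_odds) / (power * prior_odds + alpha)

# Hypothetical illustration (the numbers are mine, not from the paper):
# a well-powered study in a field where 1 in 2 tested relationships is real,
# versus an underpowered, exploratory study where only 1 in 20 is.
print(positive_predictive_value(prior_odds=1.0, power=0.80))   # ~0.94
print(positive_predictive_value(prior_odds=0.05, power=0.20))  # ~0.17

Even with conventional significance at p < 0.05, an underpowered study testing a long-shot hypothesis yields a ‘finding’ that is more likely to be false than true – which is exactly the point being made above.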

6.         To accommodate the inevitable growth in scientific publication, journals have increasingly taken to publishing research in formats shorter than the traditional scientific article. These short reports limit the length of an article, but the need for this type of article may well be driven by the academic researcher’s need to publish in order to maintain their career rather than the publisher’s need to optimize limited publishing resources (e.g. pages in a printed journal edition). The advantage for researchers – and their need to publish and be cited – is that, on a per-page basis, shorter articles are cited more frequently than longer ones (Haslam, 2010)[6]. But short reports can lead to the propagation of ‘bad’ or ‘false’ science. For example, shorter, single-study articles can be poor models of science because longer, multiple-study articles often include confirmatory full or partial replications of the main findings (http://nyti.ms/wkzBpS). In addition, small studies are inherently unreliable and more likely to generate false positive results (Bertamini & Munafo, 2012)[7]. Many national research assessment exercises require not only that the quality of research be assessed in some way, but also that a minimum quantity of outputs be produced. Short reports – with all the disadvantages they may bring to scientific practice – will have a particular attraction for researchers under pressure to produce quantity rather than quality.
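
One way to see part of the problem with small, single-study reports is a quick simulation – a sketch under my own assumptions, not an analysis taken from Bertamini & Munafo’s paper. If only ‘significant’ results get written up, small samples produce effect-size estimates that are badly inflated, whereas larger samples do not.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_d = 0.3          # assumed true standardised effect size
n_simulations = 2000  # simulated studies per sample size

for n in (20, 200):   # per-group sample sizes (hypothetical)
    significant_d = []
    for _ in range(n_simulations):
        treatment = rng.normal(true_d, 1.0, n)
        control = rng.normal(0.0, 1.0, n)
        t, p = stats.ttest_ind(treatment, control)
        if p < 0.05:
            # effect size (Cohen's d) for the studies that 'get published'
            pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
            significant_d.append((treatment.mean() - control.mean()) / pooled_sd)
    print(n, round(float(np.mean(significant_d)), 2))

In a typical run, the small (n = 20) studies that happen to reach significance estimate the effect at more than twice its true size, while the larger (n = 200) studies stay close to it.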

7.         The desire to measure the applied “impact relevance” of research – especially in relation to research funding and national research assessment exercises – has inherent dangers for identifying and understanding high-quality research. For example, in the forthcoming UK Research Excellence Framework, lower-quality research for which there is good evidence of “impact” may be given a higher value than higher-quality outputs for which an “impact” case is less easy to make (http://bit.ly/y7cqPW). This shift towards research “impact” as a component of research quality risks encouraging researchers to pursue research relevant to short-term policy agendas rather than longer-term theoretical issues. The associated funding consequence is that research money will drift towards those organizations pursuing policy-relevant rather than theory-relevant research, with the former being inherently labile and dependent on changes in both governments and government policies.

8.         Finally, when discussing whether funding is allocated in a way that optimizes scientific progress, there is the issue of whether we fund researchers when they are past their best. Do we neglect those researchers in their productive prime who could add fresh zest and ideas to the scientific research process? Research productivity peaks at age 44 (on average 17 years after a researcher’s first publication), but research funding peaks at age 53 – suggesting that productivity declines even as funding increases (http://bit.ly/yQUFis). Admittedly, these are average statistics, but it would be interesting to know whether there are inherent factors in the funding process that favour past reputation over current productivity.




[1] Lawrence P A (2008) Lost in publication: How measurement harms science. Ethics in Science & Environmental Politics, 8, 9-11.
[2] Fang F C & Casadevall A (2011) Retracted science and the retraction index. Infection & Immunity. doi: 10.1128/IAI.05661-11.
[3] Stapel D A & Lindenberg S (2011) Retraction. Science, 334, 1202.
[4] John L K, Loewenstein G & Prelec D (in press) Measuring the prevalence of questionable research practices with incentives for truth-telling. Psychological Science.
[5] Ioannidis J P A (2005) Why most published research findings are false. PLoS Medicine, 2, e124. doi: 10.1371/journal.pmed.0020124.
[6] Haslam N (2010) Bite-size science: Relative impact of short article formats. Perspectives on Psychological Science, 5, 263-264.
[7] Bertamini M & Munafo M R (2012) Bite-size science and its undesired side effects. Perspectives on Psychological Science, 7, 67-71.
