Taking Responsibility: How an answer to research assessment might just be “42”

(a guest post from Cameron Neylon, Curtin University) in response to the open citations experiment

In Douglas Adams’ Hitchhiker’s Guide to the Galaxy a massive computer is created to give the answer to the ultimate question of life, the universe and everything. After seven and a half million years of computing it gives its answer:

ONE: Well?
DEEP THOUGHT: You’re really not going to like it.
TWO: Tell us!!!
DEEP THOUGHT: All right. The Answer to Everything…
TWO: Yes…!
DEEP THOUGHT: Life, The Universe and Everything…
ONE: Yes…!
THREE: Yes…!
ONE/TWO: Yes…!
DEEP THOUGHT: Forty two.

(Pause. Actually quite a long time)

Douglas Adams: The Hitchhiker’s Guide to the Galaxy: The original radio scripts

The joke of course is that you need to know the question to understand the answer. But in the context of research assessment the joke is sharper. How can an arbitrary number (a number you can imagine Adams worrying about making sufficiently banal: not a prime, not magic, just…42) capture the complexities of the value created by research? And yet is it an accident that the Journal Impact Factor of Nature falls just short of 42?

The HEFCE metrics report was a rare thing, a report that gained almost universal support for finding a clear and evidenced middle ground on the future of research assessment. More data is coming and ignoring it is foolhardy – the “Metric Tide” of the report’s title – but we should also engage on our own terms, with a critical, and scholarly, approach to what these new data can tell us.

The report developed the idea of “Responsible Indicators” that have a set of characteristics: robustness, humility, transparency, diversity, and reflexivity. Responsible indicators are ones that will stand up to the same kind of criticism as any research claim. That means it’s not just the characteristics of the indicator that matter but also the process of its measurement. Does it address a well-founded question or decision? Are the data available for checking and critique? Is the aggregation and analysis of these data robust and appropriate?

Knoth and Herrmannova, in their experiment developing a new indicator, have achieved something valuable. They have developed a numerical indicator that is broadly independent of the number of citations an article receives. There are many issues to raise with the indicator itself and what it can tell us, but first and foremost it shows the potential of using the content of a research output itself for assessment.

The indicator they develop measures the “semantic distance” between the articles cited by a specific output and the articles that in turn cite it. It seeks to use this distance as a measure of contribution: in essence, how far the output has carried the journey of knowledge creation. Their report illustrates the limitations of the available data, both in terms of full text and the availability of citation information. It is a valuable contribution to the debate on what indicators can be based on.
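To make the general idea concrete, here is a toy sketch of my own (not Knoth and Herrmannova’s actual method, and the example texts are invented): one crude way to put a number on “semantic distance” is the cosine distance between simple bag-of-words vectors of the cited and citing texts.

```python
import math
from collections import Counter

def cosine_distance(text_a: str, text_b: str) -> float:
    """1 minus cosine similarity of bag-of-words vectors (0 = identical vocabulary)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1 - dot / norm if norm else 1.0

# Hypothetical stand-ins: the literature an output cites, and the
# literature that later cites the output.
cited_by_output = "enzyme kinetics in protein folding pathways"
citing_the_output = "machine learning models of protein structure prediction"

# On this reading, a large distance would suggest the output moved
# the conversation somewhere new.
print(f"distance: {cosine_distance(cited_by_output, citing_the_output):.2f}")
```

A real implementation would of course need far richer semantic representations than word counts; the sketch only shows the shape of the comparison being made.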

There are technical issues to be raised: does the semantic analysis actually measure distance in meaning, or only in syntax? Is it sensitive to changes in language rather than substance? It also worries me that the quantitative analysis gives a normal distribution. Simple chain processes, like the use and processing of information, most often give power law distributions; normal distributions suggest many different processes all contributing to the final outcome.
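The contrast being drawn here can be illustrated with a small simulation (my own sketch, not part of the original analysis): summing many independent contributions tends towards a normal distribution, while a multiplicative chain, where each step scales the result of the last, produces a heavy right tail.

```python
import math
import random

random.seed(42)
N, STEPS = 10_000, 100

# Additive: many independent contributions summed together; by the
# central limit theorem this tends towards a normal distribution.
additive = [sum(random.random() for _ in range(STEPS)) for _ in range(N)]

# Multiplicative chain: each step rescales the previous value, a crude
# stand-in for repeated use and reprocessing of information; this gives
# a heavy-tailed (log-normal) distribution instead.
multiplicative = [
    math.prod(random.uniform(0.5, 1.5) for _ in range(STEPS)) for _ in range(N)
]

def skew(xs):
    """Sample skewness: near 0 for a normal distribution, large for heavy tails."""
    m = sum(xs) / len(xs)
    s = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

print(f"additive skew:       {skew(additive):.2f}")   # near zero
print(f"multiplicative skew: {skew(multiplicative):.2f}")  # strongly positive
```

The point is not the specific numbers but the diagnostic: a normal distribution in the indicator hints at many independent factors being averaged together, rather than a single chain-like process.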

However, the biggest issue for me lies in the framing. The indicator is labelled “contribution”, but we’re not sure what it really measures. There are two problems with the indicator in its current form. The first is that we don’t have a sense of how it relates to expert judgement. Expert judgement is far from perfect, or even reliable, but without an understanding of that relationship we’re unlikely to see much adoption.

A related, and potentially larger, issue is that it is not clear what question is being answered or what problem is being solved. One of the big problems with our current metrics is that they have become the target, as opposed to a (pretty poor) proxy for something we actually care about. Just as in The Hitchhiker’s Guide to the Galaxy, where the answer to the question of life, the universe and everything is 42, the result is meaningless without understanding the question. We can’t tell how useful or accurate the contribution indicator is without asking “for what?”

This new indicator is valuable because it illustrates what is possible. I disagree with aspects of its implementation, but that’s almost part of its value. Maybe it is just what you need to solve your particular problem. Rather than accept or reject an indicator in a vacuum, we need a toolbox of approaches that lets us ask a different question: is it useful in addressing this specific question?

42 might be a very useful answer, at least if you know how to ask the right question. But we need the tools to be able to tell. Towards the end of Adams’ radio play the suggestion is made that the answer and the question cannot co-exist in the same universe. For Adams this was a joke, but for researchers this is our bread and butter. We can only ever refine our questions in response to our answers, and our answers in response to our questions. We just need to apply our own standards to measuring ourselves.



About the author: Cameron Neylon is an advocate for open access and Professor of Research Communications at the Centre for Culture and Technology at Curtin University. You can find out more about his work and get in touch with Cameron via his personal page Science in the Open.