citation experiment

Towards full-text based research metrics: Exploring semantometrics

See also:
Report of Experiments (PDF)
Open Citations and Responsible Metrics (PDF) – Briefing note by Cameron Neylon, Curtin University
Cameron Neylon’s comment on the experiment

The HEFCE Metrics Tide Report has started a debate and proposed directions of travel for more responsible metrics. Our intention for what we called the ‘open citation experiment’ was to support this debate, to investigate open data sources that indicators can be based on and help to inform new ways forward.

Currently, the most widely used indicators for research assessment are derived from proprietary data sources and focused on citations of the peer reviewed literature.

In many cases this data is not transparent, community governed or auditable. This comes with a number of risks for universities and funders: it encourages costly purchase of near monopoly products, results in unaccountable / un-auditable allocation of public funds and means that benchmarks are often not seen as legitimate and therefore not useful.

Image 'measurement' by Freddy Fam (CC BY-NC-ND 2.0)

The Experiments

Petr Knoth and Dasha Herrmannova from the Open University have experimented with a new approach to research assessment metrics (semantometrics), which is not based on citation data alone but argues that the full text is needed to assess the value of a research article. We found this approach very promising as it makes use of a new data source: the increasing availability of open access full text in the publication ecosystem.

The experiment was an attempt to create the first semantometric measure, based on the idea of measuring an article’s contribution to the progress of scholarly discussion. At a very simplified level, the indicator looks at the subject matter of the papers citing, and being cited by, a given paper. If the subject matter of the papers citing it differs greatly from that of the papers it cites, the semantic distance between the two sets is considered greater. The theory is that a greater distance indicates a greater contribution to research, because the paper makes a greater leap between previous and new discoveries, and so it receives a higher score on this metric.
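To make the simplified description concrete, here is a minimal sketch in Python. It is not the formula from the report: it assumes plain word counts as the text representation and cosine distance as the "semantic distance", and it simply averages that distance over every pair of cited and citing texts. The function names (bag_of_words, cosine_distance, toy_contribution) and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import product


def bag_of_words(text):
    """Lower-case the text and count word occurrences (assumed representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def toy_contribution(cited_texts, citing_texts):
    """Average distance over every (cited, citing) pair; 0.0 if either set is empty."""
    pairs = list(product(cited_texts, citing_texts))
    if not pairs:
        return 0.0
    total = sum(cosine_distance(bag_of_words(x), bag_of_words(y)) for x, y in pairs)
    return total / len(pairs)


# Hypothetical texts: what a paper cites, and two alternative sets of citing papers.
cited = ["protein folding in yeast cells", "yeast cell protein structure"]
citing_nearby = ["protein folding in yeast"]
citing_distant = ["machine learning for image search"]

print(toy_contribution(cited, citing_nearby) < toy_contribution(cited, citing_distant))
```

In this toy version, a paper whose citing literature shares little vocabulary with the literature it cites scores higher, which mirrors the intuition described above. A real implementation would need a more robust text representation than raw word counts.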

To explain what this means in more detail and how the contribution measure is calculated Petr and Dasha have developed a demonstrator page:

The experiments report provides a correlation analysis of the contribution measure with two known metrics – citation counts and Mendeley readership – and analyses the behaviour of the contribution measure in relation to these metrics. However, rather than looking for a single new metric that could complement or replace citation counting, the aim of this experiment was to present an argument for studying this area more widely, to encourage developing new semantometric methods and to demonstrate this is already possible with openly available data.

To perform this analysis, both textual data of the research papers and citation data were needed. As no such dataset existed, the experiments have been conducted on a dataset obtained by merging data from CORE, Microsoft Academic Graph and Mendeley.

The report also looks at how article-level metrics can be extended to higher-level metrics, suggesting a new, fairer approach. While more work is needed to validate the proposed approach, the report emphasizes the need to move away from ad-hoc higher-level metrics (such as the h-index) to metrics that demonstrably fulfill certain objective criteria and show good performance on real data.
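As a small illustration of why the h-index is often described as ad hoc, the sketch below computes it for two invented citation profiles. The citation counts are hypothetical and the example does not reproduce the criteria or the alternative proposed in the report; it only shows that very different profiles can collapse to the same value.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical profiles: three modestly cited papers versus one landmark paper
# plus two modestly cited ones.
steady = [3, 3, 3]
landmark = [500, 3, 3]

print(h_index(steady), h_index(landmark))
```

Both profiles receive the same h-index even though one contains a paper cited 500 times, which is the kind of behaviour an objective set of criteria for higher-level metrics would need to examine.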

To find out more about the experiment and the results see the report of experiments:  Towards full-text based research metrics: Exploring semantometrics (PDF)

Open Citation Workshop March 2016

In a workshop at the end of March 2016, we reviewed the outcomes of the experiment with a number of sector representatives and also discussed potential next steps for research assessment metrics based on an open approach more generally.

We concluded that the argument for the usefulness of full text and openness in performing research assessment is convincing but that the work into semantometrics would need to go much further. We would need to investigate, for example, how the contribution measure compares to expert judgement. This would help us to see if the proposed indicator reflects any desired characteristics of research articles.

There were also plenty of suggestions for how to raise the profile of responsibly generated and applied indicators, how to improve and build on existing open data sources for research assessment metrics and how to make new ones available.  We’re working on a more detailed plan for next steps and will share this in due course.

In the meantime, we invited Cameron Neylon to comment on the experiment. You can read his response in his blog post, Taking Responsibility: How an answer to research assessment might just be “42”.

OR2016, 13-16 June, Dublin

Petr Knoth and Dasha Herrmannova presented at the OR2016 Conference on open research evaluation metrics:

Oxford vs Cambridge Contest: Collecting Open Research Evaluation Metrics for University Ranking

We are also very interested in your views about the experiment and any thoughts you may have about open metrics and indicators for research, so please do comment below.



Taking Responsibility: How an answer to research assessment might just be “42”

(a guest post from Cameron Neylon, Curtin University) in response to the open citations experiment

In Douglas Adams’ Hitchhiker’s Guide to the Galaxy a massive computer is created to give the answer to the ultimate question of life, the universe and everything. After seven and a half million years of computing it gives its answer:

ONE: Well?
DEEP THOUGHT: You’re really not going to like it.
TWO: Tell us!!!
DEEP THOUGHT: All right. The Answer to Everything…
TWO: Yes…!
DEEP THOUGHT: Life, The Universe and Everything…
ONE: Yes…!
THREE: Yes…!
ONE/TWO: Yes…!
DEEP THOUGHT: Forty two.

(Pause. Actually quite a long time)

Douglas Adams: The Hitchhiker’s Guide to the Galaxy: The original radio scripts

The joke of course is that you need to know the question to understand the answer. But in the context of research assessment the joke is sharper. How can an arbitrary number (a number you can imagine Adams worrying about making sufficiently banal: not a prime, not magic, just…42) capture the complexities of the value created by research? And yet is it an accident that the Journal Impact Factor of Nature falls just short of 42?

The HEFCE metrics report was a rare thing, a report that gained almost universal support for finding a clear and evidenced middle ground on the future of research assessment. More data is coming and ignoring it is foolhardy – the “Metric Tide” of the report’s title – but we should also engage on our own terms, with a critical, and scholarly, approach to what these new data can tell us.

The report developed the idea of “Responsible Indicators” that have a set of characteristics: robustness, humility, transparency, diversity, and reflexivity. Responsible indicators are ones that will stand up to the same kind of criticism as any research claim. That means that it’s not just the characteristics of the indicator that matter but also the process of its measurement. Does it address a well-founded question or decision? Are the data available for checking and critique? Is the aggregation and analysis of these data robust and appropriate?

Knoth and Herrmannova, in their experiment developing a new indicator, have achieved something valuable. They have developed a numerical indicator that is broadly independent of the number of citations an article receives. There are many issues to raise with the indicator itself and what it can tell us, but first and foremost it shows the potential of using the content of a research output itself for assessment.

The indicator they develop measures the “semantic distance” between the articles cited by a specific output and the articles that in turn cite it. It seeks to use this distance as a measure of the contribution, in essence the distance that the output has contributed to the journey of knowledge creation. Their report illustrates the limitations of the available data, in terms of both full text and citation information. It is a valuable contribution to the debate on what indicators can be based on.

There are technical issues to be raised. Does the semantic analysis actually measure distance in meaning or only syntax? Is it sensitive to changes in language rather than substance? It worries me that the quantitative analysis gives a normal distribution. Simple chain processes, like the use and processing of information, most often give power law distributions. Normal distributions suggest many different processes all contributing to the final outcome.

However, the biggest issue for me lies in the framing. The indicator is labelled “contribution” but we’re not sure what it really measures. There are two problems with the indicator in its current form. The first is that we don’t have a sense of how it relates to expert judgement. Expert judgement is far from perfect, or even reliable, but without an understanding of the relationship we’re unlikely to see much adoption.

A related, and potentially larger, issue is that it is not clear what question is being answered or what problem is being solved. One of the big problems with our current metrics is that they have become the target as opposed to a (pretty poor) proxy of something that we actually care about. Just as in the Hitchhiker’s Guide to the Galaxy, where the answer to the question of life, the universe and everything is 42, the result is meaningless without understanding the question. We can’t tell how useful or accurate the contribution indicator is without asking “for what?”.

This new indicator is valuable because it illustrates what is possible. I disagree with aspects of its implementation but that’s almost a part of its value. Maybe it is just what you need to solve your particular problem. Rather than accept or reject an indicator in a vacuum, we need a toolbox of approaches that lets us ask a different question: is it useful in addressing this specific question?

42 might be a very useful answer, at least if you know how to ask the right question. But we need the tools to be able to tell. Towards the end of Adams’ radio play the suggestion is made that the answer and the question cannot co-exist in the same universe. For Adams this was a joke, but for researchers this is our bread and butter. We can only ever refine our questions in response to our answers and our answers in response to our questions. We just need to apply our own standards to measuring ourselves.


About the author: Cameron Neylon is an advocate for open access and Professor of Research Communications at the Centre for Culture and Technology at Curtin University. You can find out more about his work and get in touch with Cameron via his personal page Science in the Open.