Author: Yusuf Ozkan

Can we measure research in a more meaningful way? A quick look at the 2025 PLOS Open Science Indicators dataset

Why measuring open research matters

Open access publishing is sometimes mistakenly thought of as the be-all-and-end-all of open research (or open science). But while open access is important, it is not enough to capture the full breadth of research contributions, outputs and impacts. There are other indicators that can tell us more about the openness and quality of research than journal-related metrics do. 

The research community is adopting more open research practices. After all, research is about more than just publishing the final outcome; it also involves the processes that produce it. Measuring only the open access status of publications gives a narrow view of research. Measurement should ideally cover the other pillars – such as sharing data and code, embracing preprints, and adopting transparent research protocols – that underpin reliable, high-quality research.

Measuring the level of open research adoption could provide valuable insight into how far the community has progressed on the path toward openness. It would also allow us to advocate for the practices researchers can implement to make their work more reliable. Perhaps most importantly, open research indicators can offer a better alternative to unfruitful – even unhealthy – metrics, such as the Journal Impact Factor and the h-index, that have created a research culture of perverse incentives.

Measuring open research is not easy, however, because practices are adopted and reported unsystematically – for example, the code used for an analysis may not be included with the publication, or a dataset may only be made "available on request". Unlike open access, these practices are not yet fully mature. Researchers may also overlook making their open research activities visible, even when they are already engaging in them. Understanding and demonstrating the extent of adoption is therefore important for raising awareness and fostering broader engagement.

Measuring open research is hard, but PLOS proves it can be done

The academic publisher PLOS has been working on this challenging task of measuring open research practices. Partnering with DataSeer (an AI-based tool that checks manuscripts for indicators of open practices), PLOS has been releasing the comprehensive PLOS Open Science Indicators (OSI) dataset since late 2022, covering indicators such as data availability and sharing, code availability and sharing, and preprint posting.

The latest version of the OSI dataset covers a total of 138,995 PLOS research articles published between 2018 and March 2025 (see here for their methodology). The 2025 OSI dataset expands on previous versions by including two new indicators: protocol sharing and study (pre)registration; previous versions covered only data availability/sharing, code availability/sharing and preprint posting. In this blog, I analyse the publications of Imperial College London within the OSI dataset. The analysis aims to 1) understand the level of open research adoption at Imperial and 2) outline a simple methodology that other institutions can easily implement for their own analyses.

Methods to identify Imperial research outputs in the OSI dataset

I used a simple methodology: matching the research outputs of Imperial College London to the OSI dataset by DOI. Using the internal Power BI report connected to Imperial's current research information system, I downloaded 'journal article' type items with a DOI published between 2018 and 2025 (DOIs are preferred because they allow outputs to be matched on a unique identifier). Of 94,384 outputs, 1,368 matched the PLOS OSI dataset, representing 1.45% of Imperial's total journal articles and accounting for nearly 1% of the entire PLOS dataset.
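For institutions wanting to replicate this, the matching step is straightforward. The sketch below is a minimal illustration, assuming two CSV exports (file names and column names are hypothetical): one from the institutional research information system and one copy of the PLOS OSI dataset, each with a DOI column. DOIs are normalised before joining, since case and URL prefixes can vary between systems.

```python
import pandas as pd

# Hypothetical file and column names: adjust to your own exports.
imperial = pd.read_csv("imperial_journal_articles_2018_2025.csv")  # CRIS / Power BI export
osi = pd.read_csv("plos_open_science_indicators_2025.csv")         # PLOS OSI dataset

def normalise_doi(series: pd.Series) -> pd.Series:
    """Lowercase DOIs and strip common URL prefixes so both sources match on the same form."""
    return (
        series.astype(str)
        .str.strip()
        .str.lower()
        .str.replace(r"^https?://(dx\.)?doi\.org/", "", regex=True)
    )

imperial["doi_norm"] = normalise_doi(imperial["DOI"])
osi["doi_norm"] = normalise_doi(osi["doi"])

# Inner join keeps only the institutional outputs that appear in the OSI dataset.
matched = imperial.merge(osi, on="doi_norm", how="inner", suffixes=("_cris", "_osi"))

print(f"{len(matched)} of {len(imperial)} outputs matched "
      f"({len(matched) / len(imperial):.2%} of institutional articles, "
      f"{len(matched) / len(osi):.2%} of the OSI dataset)")
```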

Figure 1: PLOS OSI dataset represents 1% of Imperial publications

Findings: open research practices at Imperial

97% of the Imperial articles in the dataset generated data. The high rate of data sharing – nearly 85% of the articles that generated data – is not surprising, given Imperial's active data management policies and publisher and funder data availability requirements. Data sharing is nearly 9% higher among Imperial articles than in the overall PLOS dataset.

Figure 2: Data sharing is the most adopted open research practice at Imperial (83% of journal articles published by Imperial authors from 2018 to 2025 included a link to shared data)

Access to the code that underpins the data generated or analysed in a research output is vital for research integrity and reproducibility. Although there has been some improvement, code sharing is not yet as well developed as data sharing.

Nearly half of Imperial's publications in the OSI dataset involved code generation. However, among those that did use code, only 56% shared it, meaning that for almost half of these publications readers cannot ask the right questions about the reliability of the code underpinning the analysis.

Figure 3: The practice of sharing code in research outputs is increasing (2025 – incomplete year)

Posting preprints (non-peer-reviewed manuscripts posted to online servers before journal submission) has become the norm in many disciplines. Preprints improve research quality and increase visibility by enabling rapid dissemination and collaborative, timely feedback. Preprint posting is also a valuable indicator of bridge-building between research stages. Measuring how many publications originate from preprints can provide valuable insight into the adoption of open research practices. However, linking a preprint to its final published version can be challenging; the OSI dataset's methodology promises to overcome this challenge.

35% of Imperial outputs in the PLOS dataset were associated with a preprint. Leaving aside the incomplete year of 2025, preprint adoption has increased in recent years, jumping from 13% in 2018 to nearly 50% in 2024. The preprint match rate for Imperial publications is twice the global average in the PLOS dataset.

Figure 4: Preprint match of Imperial publications is higher than the global average

The latest version of the OSI dataset includes two new open research indicators: protocol sharing and study registration. Although these two concepts have been part of the research process for some time, publishing and sharing them alongside outputs is relatively new. Implementing and sharing them is just as important as sharing data: both are crucial for enhancing transparency and reproducibility, and ultimately for building trust in research.

Compared to other open research practices, the adoption of protocol sharing and study registration remains low. Only 10% of Imperial articles in the dataset included an available protocol and a study registration, similar to the global average. These results should be interpreted with caution, since protocols are more common in some fields than in others.

Figure 5: Protocol sharing and study registration rank lowest among open research practices

Do open research practices reinforce one another? The answer is yes.

Given the comprehensiveness of the PLOS dataset, it is worth looking at the relationships between the open research indicators. Among outputs that shared data, the rates of all other open research indicators increase (e.g. code sharing rises from 25% to 32%, and the preprint match rate from 35% to 38%). The change is even more noticeable for publications that shared code: among the Imperial articles that shared code, data sharing jumps from 83% to 98% and the preprint match rate from 35% to 59%. The same pattern holds for preprints: outputs matched to a preprint show markedly higher adoption of the other open research indicators, especially data and code sharing.
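These conditional rates are simple to reproduce on a matched table like the one built above. The sketch below is illustrative only; it assumes the matched file contains one boolean column per indicator (the column names are hypothetical).

```python
import pandas as pd

# Hypothetical file: the matched Imperial/OSI table from the earlier sketch,
# with one boolean column per open research indicator.
matched = pd.read_csv("imperial_osi_matched.csv")
indicators = ["data_shared", "code_shared", "preprint_matched"]

overall = matched[indicators].mean()                                  # adoption across all matched outputs
given_data = matched.loc[matched["data_shared"], indicators].mean()   # adoption among data-sharing outputs
given_code = matched.loc[matched["code_shared"], indicators].mean()   # adoption among code-sharing outputs

# Each column shows how the other indicators shift when one practice is present.
summary = pd.DataFrame({
    "all outputs": overall,
    "data shared": given_data,
    "code shared": given_code,
}).round(2)
print(summary)
```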

Figure 6: Outputs linked to preprints show higher adoption of other open research practices

Conclusion, with a reminder of limitations and caveats

While the number of publications analysed is significant, it represents only a tiny fraction of the publication universe. The dataset includes open access outputs from a single publisher that predominantly publishes in certain disciplines. Some fields, for instance, do not produce data as it is understood in the sciences, and preprinting may not be common in disciplines where the journal article is not the main output type. It is also important to highlight that the PLOS dataset puts journal articles at the centre and works backwards, meaning we cannot confidently say whether open research practices were followed from the beginning of the research process or only at the time of publication.

Generalising these conclusions to the entire research ecosystem without keeping these nuances in mind would be invalid and would risk creating gamification and perverse incentives, just as journal-based metrics have done for years. Despite its caveats, the PLOS Open Science Indicators dataset is a great starting point for measuring research in a more meaningful way, especially in the presence of unhealthy metrics. The analysis gives us good insight into where we should focus our open research advocacy.

This brief analysis shows that while we seem to be in a strong position when it comes to sharing data, there is still room for improvement in other areas – sharing code, releasing preprints, making research protocols available, and registering studies before they begin – especially where these underpin the research. Everyone in this ecosystem, from researchers and institutions to funders and publishers, has a role to play in making this happen.

Are Imperial publications gaining attention on Bluesky?

This post is authored by Yusuf Ozkan, Research Outputs Analyst, and Dr Hamid Khan, Open Research Manager: Academic Engagement.

Researchers increasingly use social media to communicate their research. They share links to journal articles, but also other types of output like preprints, graphics/figures and lay summaries.  

That enables us to measure alternative indicators of research visibility beyond citations of, and in, journal articles. With many services like X, Mastodon, Threads and LinkedIn, researchers and the public are scattered across the social media world, which makes tracking visibility difficult. Bluesky has joined the club recently and is growing rapidly. In this post, we highlight how research-related conversations and citations of Imperial outputs have increased on Bluesky, emphasising the value of using the Library’s tools to track citations on social media.  

Although Bluesky is a relatively new platform – launched in 2023 as an invitation-only service – it has reached nearly 30 million users at the time of writing. The number of users increased by seven million in just six weeks from November 2024.  

Many people have migrated from X (formerly Twitter) to Bluesky during this period, partly following the US election, but the reasons for migration are not limited to politics. Bluesky also surpassed Threads in website user numbers. The rapid increase in users and growing trend of researchers joining Bluesky is making it an increasingly convenient forum for research conversations. 

Given the increase in users, we would expect to see research outputs being shared more widely on Bluesky. But it is extremely difficult, if not impossible, to manually measure that. This is where Altmetric comes into play, to track mentions of outputs.  

Altmetric is a tool that provides data about online attention to research by identifying mentions of research outputs on social media, blog sites, Wikipedia, news outlets and more. Altmetric donuts and badges display an attention score summarising all of the online engagement with a scholarly publication. Altmetric can be useful for showing societal visibility and impact, though its limitations should be kept in mind. Imperial Library has a subscription to Altmetric, which we can use to see how social media users interact with Imperial's research outputs. It is one of many tools we use to support researchers in moving away from journal-based metrics for evaluating the reach of their work.
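For readers who want a quick look at the attention data for a single paper, Altmetric's public Details Page API can be queried by DOI. The sketch below is only an illustration: the DOI is hypothetical, the free tier is rate-limited, and the per-source field names (particularly for newer sources such as Bluesky) vary and should be checked against the live response rather than taken from this example.

```python
import requests

# Altmetric Details Page API: look up attention counts for one DOI.
doi = "10.1371/journal.pone.0000000"  # hypothetical DOI used for illustration
resp = requests.get(f"https://api.altmetric.com/v1/doi/{doi}", timeout=10)

if resp.status_code == 200:
    data = resp.json()
    print("Altmetric score:", data.get("score"))
    print("X/Twitter accounts citing it:", data.get("cited_by_tweeters_count"))
    # The Bluesky field name below is an assumption; inspect the JSON to confirm it.
    print("Bluesky accounts citing it:", data.get("cited_by_bsky_count", "field not confirmed"))
else:
    print("No Altmetric record found for this DOI (status", resp.status_code, ")")
```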

The migration of researchers away from X prompted Altmetric to start monitoring emerging platforms, leading to the inclusion of Bluesky in Altmetric statistics in December 2024, although Altmetric had been picking up citations on Bluesky since October.

There were nearly 400 thousand Bluesky posts citing a research paper between late October and mid-January – a period of less than three months. This is a significant milestone considering it took Twitter nine years to reach 300 thousand posts linking to a research paper.

Altmetric picked up a dramatic global increase in mentions of research outputs on Bluesky from November 2024

Bluesky is a rising star for research conversations online, but what is the situation when it comes to mentions of Imperial research outputs? Well, the trend is no different from the overall picture.

Altmetric identifies over 11,000 mentions of Imperial-authored publications on Bluesky from November 2024 to January 2025. The number of Imperial output mentions on X is four times higher than on Bluesky for the same period; given that Bluesky only launched recently and has ten times fewer users than X, the Bluesky figure is still substantial.

Bluesky is the second-most-referenced source type after X for research outputs tracked by Altmetric, November 2024 – January 2025.

The mentions of Imperial publications on Bluesky followed a similar trend to the overall mentions of research outputs on the platform. There was a massive uptick in mid-November 2024, taking the number from a few mentions to thousands per week. Although the number of mentions appears to be coming down, the increasing number of overall Bluesky users and posts suggests citations are not likely to return to their pre-November level.

Massive uptick in mentions of Imperial publications on Bluesky from mid-November 2024

Comparing mentions on Bluesky with X for all time gives us another perspective on how sharing practices have changed. The number of X mentions for Imperial outputs has consistently decreased since 2021 from 620K mentions to 270K in 2024. If this trend continues, we expect to see just over 100K mentions in 2025.

Mentions of Imperial research outputs on X peaked in 2021 and have plummeted ever since

Even though Bluesky is just two years old and Altmetric has been including mentions from the platform for only three months, the volume of mentions is impressive.

Bluesky is a new social media platform with a growing user base. The volume of research-related conversation on Bluesky has increased since October 2024, making it the second-largest data source tracked by Altmetric over the past three months. Imperial research outputs are widely shared on the platform too, with over 10K citations in the same period. But there is a note of caution.

Social media is great for increasing visibility and reach. It can be a good way to encourage open and collaborative peer review, and ultimately help improve quality and impact. However, metrics provided by platforms like Altmetric can be misleading, as they don't track everything happening on the internet. For example, Altmetric only includes historical data for LinkedIn; current mentions are not tracked, despite the presence of many researchers on the platform.

Social media platforms have their own biases: vulnerability to manipulation and gaming (just like the Journal Impact Factor), imbalanced user demographics, and the over- or under-representation of particular academic disciplines on a given platform. Counting citations is a risky business, because social media mentions do not necessarily point to positive impact or high quality – someone could be citing your work to critique or rebut it. Despite these limitations, a diversity of platforms for sharing research is good for discoverability, since users of one platform may not use another, and this increases the potential impact of research by reaching diverse audiences. Bluesky is a recent and promising example, demonstrating how emerging platforms can broaden the reach and visibility of research publications.

To see how your research is being seen and cited on social media, you can make use of the Library’s subscription to Altmetric. Get in touch with the Bibliometrics service to discuss ways to measure the visibility and impact of your work other than the flawed Journal Impact Factor.

Note: This post was authored in mid-January. Therefore, some of the figures might have changed by the time of publication.