David Phillips, Author at Open Access and Digital Scholarship Blog

Vanishing Acts: US research data at risk (and how to save them!)

25 June 2025

This post was written by Camille Regnault, Senior Scholarly Communications Assistant in the Research Data Management team.

Overview:

Earlier this year, several reports raised concerns about National Institutes of Health (NIH)-funded data repositories—critical to research and public health—being flagged for potential ‘review,’ prompting widespread unease within the scientific community.

These reports have emerged against a backdrop of sweeping grant terminations and cuts to research considered to be related to DEI programs, following Presidential Executive Orders in the US and a regime change under the NIH.

The NIH is the largest funder of biomedical research in the world and one of eleven divisions that fall under the US Department of Health and Human Services, making important discoveries that improve health and save lives. It is a key provider of resources such as PubMed and PubMed Central (PMC), for medical and health research as well as ClinicalTrials.gov, for clinical trial data.

The NIH is also Imperials’ most commonly applied to US Federal Funding agency and the university holds many awards both as the Lead and as a Partner.

In the wake of these major shakeups however, it has additionally become clear that access to a variety of federally-created and federally-hosted datasets has been limited or removed while access to others remains potentially at risk.

Data rescue efforts and stakeholder responses:

In February 2025, the Data Rescue project emerged as the result of a coordinated effort between three data organisations, including members of IASSIST, RDAP, and the Data Curation Network, to attempt to safeguard threatened research data.

Their stated goal is ‘to serve as a clearinghouse for data rescue-related efforts and data access points for public US governmental data that are currently at risk’ and their efforts include ‘data gathering, data curation and cleaning, data cataloging, and providing sustained access and distribution of data assets’. You can read about their current efforts, here.

In the same month, the Harvard Law School Library Innovation Lab Team released the data.gov archive on Source Cooperative. The 16TB collection is available to access at https://source.coop/repositories/harvard-lil/gov-data/description and includes over 311,000 datasets harvested during 2024 and 2025. This is being updated on a daily basis as new datasets are added to data.gov.

In the UK members of the Chartered Institute of Library and Information Professionals (CILIP) Special Interest Groups, covering health and higher education, have identified concerning examples of removal and reduction of content relating to their work in public health, research, education, and science. Their recent statement invites members and the wider information profession community to share examples of how content, reports, datasets, evidence, and tools are being removed by US authorities.

What you can do:

If you’re an Imperial researcher or staff member affected by these changes, we strongly encourage you to reach out to the Research Data Management team. Your insights could help shape our response and contribute to broader sector-wide documentation efforts.

Stay informed by subscribing to the Data Rescue Project’s newsletter, which provides regular updates on the initiative’s progress. You can also explore opportunities to get involved and support their mission to safeguard vulnerable datasets here.

Additionally, if you’re a CILIP member or an information professional working within the UK sector, you can help assess the impact by completing a short form. Share examples of affected content, datasets, tools, or evidence to aid in understanding the scale and implications of these changes across the UK.

#LoveData24: Interview with Yves-Alexandre from the Computational Privacy Group

David Phillips

16 February 2024

As part of #LoveData24, the Research Data Management team had a chance to catch up with Yves-Alexandre, Associate Professor of Applied Mathematics and Computer Science at Imperial College London, who also heads the Computational Privacy Group (CPG). The CPG are a young research group at Imperial College London studying the privacy risks arising from large scale behavioural datasets. In this short interview we discussed the interests of the group, the challenges of managing sensitive research data and whether we need to reevaluate what we think we know about anonymisation.

How did you become involved with the Computational Privacy Group (CPG)?

Yves-Alexandre de Montjoye (YD)
So, my career as a researcher started when I was actually doing my master’s thesis at the Santa Fe Institute in New Mexico. That was in in 2009. This was pre-A.I. and the beginning of the Big Data era. People were extremely excited about the potential for working with large amounts of data to revolutionise the sciences, ranging from social science to psychology, to urban analytics or urban studies and medicine.

So many things suddenly became possible and people were like, ”this is the microscope”, or any other kind of analogy you can think of in terms of this being a true revolution for the scientific process. Some even went as far as saying, “this is the end of theory, right?” or “this spells the end of hypothesis testing. The data are going to basically speak for themselves”. There was a huge hype of expectation which, as time went on gradually decreased and eventually plateaued to what it is now. It did have a transformative impact on the sciences but to me it became quite obvious working with these data as a student, just how reidentifiable all these types of data potentially were.

Back in the days, we were looking at location data across the country and on the one hand, everyone was talking about how the data were anonymous. As a student, I was working with the data and I could see people moving around on the map, so to speak. And it just blew my mind. It didn’t seem like it would take very much for these data to not be anonymous anymore.

Anonymisation and the way we’ve been using it to protect data have been well documented in the literature. There has also been extensive research on how to properly anonymise data. I think what has taken a bit of time for people to grasp is that anonymisation, in the context of big data, is its own new, different question, and that actually a lot of the techniques that had developed from around 1990 to 2010 were basically not applicable to the world of big data anymore.
This is mostly due to two factors. The first one is just the sheer amount of data that is being collected about every single person in any given dataset that we are interested in, from social science to medicine.
Combining these with social media and the availability of auxiliary data (meaning data from an external source, such as census data) means that not only are there a lot of data about you in those datasets, but there are also a lot of data about you that can be cross-referenced with sources elsewhere to reidentify you. And I think what took us quite a bit of time to get across to people was that this was a novel and unique issue that had to be addressed. It’s really about big data and the availability of auxiliary data. I think that’s really what led a lot of our research into privacy. Regarding anonymisation, we are interested in the conversation around whether there is still a way to make it work as intended given everything we know or do we need to invent something fundamentally new. If that is the case, what should our contribution to a new method look like?

At the end of the day, I think the main message that we have is that anonymisation is a powerful guarantee because it is basically a type of promise that is made to you that the data are going be used as part of statistical models, et cetera, but they’re never going to be linked back to you.

The challenge lies in the way we go about achieving this in practice. Deidentification techniques and principles such as K-anonymity are (unfortunately) often considered a good way of protecting privacy. These techniques which, basically take a given dataset and modify it in one way or another, might have been considered robust enough when they were invented in the 90s and 2000s, but because of the world we live in today and the amount of data available about every single person in those datasets, they basically fall short.

There is a need for a real paradigm shift in terms of what we are using and there are a lot of good techniques out there. Fundamentally, the question comes down to what is necessary for you to make sure that the promise of anonymisation holds true, now and in the future.

Could you talk to me a little bit about the Observatory of Anonymity and what this project set out to achieve? And then as a second part of that question, are there any new projects that you’re currently working on?

YD: The Observatory of Anonymity comes from a research project published by a former postdoc of mine, and the idea is basically to demonstrate with very specific examples to people, how little it takes to potentially reidentify someone.

Fundamentally, you could spend time trying to write down the math to make sense of why you know for a certain number of reasons that a handful of pieces of information are going to be sufficient in linking back to you. The other option is to look at an actual model of the population of the UK. As a starting point, we know that there are roughly 66 million people living across the country. Even if you take London, there are still 10 million of us. And yet, as you start to focus on a handful of characteristics, you begin to realise very quickly, that those characteristics, when put together, are going to make you stand out and a significant fraction of the time that can mean that you will be the only person in all of the country to match those sets of characteristics.

The interesting part is what do we do?

You’re working within Research Data Management and your team are increasingly dealing with sensitive data and the question of how they can be safely shared?
Clearly there are huge benefits to data being shared in science, in terms of verifying research findings and reproducing the results and so on. The question is how do we go about this? What meaningful measures can you put in place to ensure that you sufficiently lower the risks of harmful disclosure in such a way that you know that the benefit of showing these data will clearly outweigh those risks?

I think from our perspective, it’s really about focusing on supporting modern privacy-ensuring approaches that are fit for purpose. We know that there are a range of techniques; from controlled access to query-based systems, to some of the encryption techniques that, depending on the use case, who needs to access your data, and the size of your dataset would allow someone to use your data, run analysis, and replicate your results fully without endangering people’s privacy. For us, it’s about recognising the right combination of those approaches and how we develop some of these tools and test them.

I think there has been a big push towards open data, under the de-identification model, for very good reasons. But this should continue to be informed by considerations around appropriate modern tools, to safeguard data while preserving some utility. Legally at least, you cannot not care about privacy and if you want to care about privacy properly, this will affect the utility. So we need to continue to handle questions around data sharing on a case-by-case basis rather than imply that everything should be fully open all of the time. Otherwise this will be damaging to the sciences and to privacy.

Yes, it is important to acknowledge that that tension between privacy and utility of research data exists and that a careful balance needs to be struck but this may not always be possible to achieve. This is something that we try to communicate in our training and advocacy work within Research Data Management services.
We have adopted a message that can hopefully be helpful (and which originated from Horizon Europe[1]), which states that open science operates on the principle of being ‘as open as possible, as closed as necessary’. In practice this means that results and data may be kept closed if making them open access is against the researcher’s legitimate interests or obligations to personal data protection. This is where a mechanism such as controlled access could play a role.

YD: Just so. I think you guys have quite a unique role to play. A controlled access mechanism that allows a researcher to run some code on someone else’s data without seeing the data on the other hand requires systems of management, authorisation and verification of users, et cetera. This is simply out of the reach of many individual researchers. As a facility or as a form of infrastructure however, this is actually something that isn’t too difficult to provide.
I think France has something called the CASD, which is the Center for Secure Access to Data (or Centre d’Accès Sécurisé aux Données) and this is how the National Institute of Statistics and Economic Studies (INSEE) is able to share a lot of sensitive data. Oxford’s OpenSAFELY in the UK is another great example of this. They are ahead in this regard. We need similar mechanisms when it comes to research data to facilitate replicability, reuse and for validating and verifying results. It is absolutely necessary. But we need proper tools to do this and it’s something that we need to tackle as a collective. No individual researcher can do this alone.

What in your experience are common misconceptions around anonymisation in the context of research data?

YD: I think the most common misunderstanding is a general underestimation of the scale of data already available. Concerns often revolve around a notion of, could someone search another person’s social media and deduce a piece of information to reidentify them in my medical dataset? In the world of big data, I would argue that what we strive to protect against also includes far stronger threat models than this.
We had examples in the US in which you had right wing organisations with significant resources buying access to location data, matching them manually, potentially at scale with the travel record, and other pieces of information they could find about clerics to potentially identify them in this dataset, in an attempt to see if anyone was attending a particular seminar[2].

We had the same with Trump’s tax record. Everyone was searching for the tax record and it turns out that it was available as part of an ’anonymous’ dataset, made available by the IRS and again these were data that were released years and years ago.
They remained online and then suddenly they’re an extremely sensitive set of information that you can no longer meaningfully protect.

This goes back to what you were saying again about anticipating how certain techniques could be used in the future to potentially exploit these data.

YD: Actually, on this precise point, we know from cryptography that good cryptographic solutions are actually fully open and that the cryptographic solution is solid. I can describe to you the entire algorithm. I can give you the exact source code. The secrecy is protected by the process but the process itself is fully open.
If the security depends on the secrecy of your process, often you’re in trouble, right? And so a good solution actually doesn’t rely on you hiding something, something being secret, or you hoping that someone is not going to figure something out. And I think that this is another very important aspect.

And this perhaps goes back again, to the type of general misunderstandings which sometimes arise where someone might assume that because some data have to be kept private, as you were saying, that the documentation behind the process of ensuring that security also has to be kept private, when in fact you need open community standards that can be scrutinised and that people can build upon and improve. This is very relevant to our work in supporting things like data management plans, which require clear documentation.

We have reached our final question: There is arguably a tendency to focus on data horror stories to communicate the limitations of anonymisation (if applied for example without a proportionate risk-based approach for a research project). Are there positive messages we can promote when it comes to engaging with good or sensible practice more broadly?

YD: In addition to being transparent about developing and following best practices as we have just talked about, I think there needs to be more conversations around infrastructure. To me, it is not about someone coming up with and deploying a better algorithm.
We very much need to be part of an infrastructure building community that works together to instill good governance.

There are plenty of examples already in existence. We worked, for example, a lot on a project called Opal which is a great use case of how we can safely share very sensitive data for good. I think OpenSAFELY is another really good case study from Oxford and the CASD in in France as I already mentioned.
These case studies offer very pragmatic solutions, but are an order of magnitude better, both from the privacy and the utility side, than any existing legacy solutions that I know of.

[1] https://rea.ec.europa.eu/open-science_en

[2] https://www.washingtonpost.com/religion/2021/07/21/catholic-official-grindr-reaction/

Useful links:

CASD

https://www.casd.eu/en/le-centre-dacces-securise-aux-donnees-casd/gouvernance-et-missions/

CPG

https://cpg.doc.ic.ac.uk/

Introduction to research data management

https://www.imperial.ac.uk/research-and-innovation/support-for-staff/scholarly-communication/research-data-management/introduction-to-research-data-management/

Opal project

https://www.opalproject.org/

OpenSAFELY

https://www.opensafely.org/about/

Open Science – European Commission

https://rea.ec.europa.eu/open-science_en

Open Access Week 2023: Imperial’s Research Publications Open Access Policy

David Phillips

27 October 2023

This post was written by Ruth Harrison, Head of Scholarly Communications Management at Imperial College London.

After many years of work, the College will soon be able to announce that we are updating our institutional open access policy to allow researchers to make their peer-reviewed journal articles and conference proceedings available on open access under a CC BY licence at the point of publication with no embargo. This will apply to accepted manuscripts, and enable staff and students to retain their right to reuse the content of those outputs in teaching, research and further sharing of their work.

Why?

I don’t think many people would disagree with the moral and ethical case for open access to research, and that the principles of open research should be more widely applied. This is a global endeavour – in 2022, UNESCO published its recommendation on Open Science stating:

“By promoting science that is more accessible, inclusive and transparent, open science furthers the right of everyone to share in scientific advancement and its benefits as stated in Article 27.1 of the Universal Declaration of Human Rights.”

Open access publishing has existed for more than two decades now, and in the past 10 years, funders have increasingly required open access to the published outputs of research which public money, ultimately, has enabled. In the UK (and internationally) this has resulted in various policies which researchers, libraries and publishers have had to keep track of, and there are now many models through which open access can be achieved. But this also means considerable ‘policy stack’ and confusion, with varying workflows and messaging for researchers to keep up with.

Introducing a policy through which author rights to their accepted manuscript are retained is a solution to the policy stack. Based on the lead taken by MIT with their open access policy, introduced over a decade ago, and other institutions around the world, within the UK the case has been made that we should adopt the same approach. At Imperial, this began with the introduction of the concept of the UK-SCL – Scholarly Communications Licence – and has now developed into what will be our Research Publications Open Access Policy (RPOAP). Generally such policies are referred to as rights retention policies or strategies, and we will join over 20 other UK universities who have already implemented similar policies, including the universities of Edinburgh, Cambridge, Oxford and Glasgow, as well as Sheffield Hallam, Swansea, Queen’s University Belfast and the N8 institutions.

How does a rights retention policy work?

There are some key points to make:

Authors will retain copyright over their work
Under the policy, each author grants the College a non‐exclusive, irrevocable, sub-licensable, worldwide licence (effective from acceptance of publication) to make the AAM author accepted manuscript publicly available under the terms of a Creative Commons Attribution (CC BY) licence
The right being granted is that of allowing the College to make the accepted manuscript openly available in Spiral without an embargo
The College does not retain the copyright to research outputs – that is waived in favour of academics
The policy applies to peer-reviewed journal articles and conference proceedings
There is no restriction on choosing where to publish.

For the policy to be effectively implemented:

Publishers need to be informed when an institution is going to implement a rights retention policy
On behalf of all staff and students, the College will notify publishers of the policy
There will be a list available of notified publishers.

What will authors need to do?

Authors should continue to upload their accepted manuscripts to Symplectic Elements which means for many people, there will be no change in their workflow at acceptance. When an accepted manuscript is received, the Library Services open access team will process it including managing any accompanying APC (article processing charge) application.

We would recommend that authors:

familiarise themselves with the RPOAP when it is published
consult the list of notified publishers when they are preparing a manuscript for submission – this will be available in the next few weeks

use our publisher agreements search tool to find out if the Library Services has covered the cost of open access publishing for the version of record
upload their accepted manuscripts (or a link to where a copy is already deposited, such as arXiv or another institutional repository) as soon as they can after acceptance

What’s next?

When the policy implementation date is agreed by University Management Board, there will be further communications across College, contact information and guidance available online at the Scholarly Communication website. This will include the list of notified publishers, and advice on what to do if your intended publisher is not on that list. And it is not only staff who will be able to take advantage of the policy, students are included as well – if you are a student publishing a journal article or conference paper, you will grant and retain the same rights as outlined above.

In the spirit of this year’s International Open Access Week theme, Community over Commercialisation, the ultimate question is: who decides? Should publishers get to decide what research readers see and what they can do with it, or should it be for the research community to decide for itself? RPOAP answers the question in favour of the community.

The changing state of Gold Open Access at Imperial

David Phillips

28 October 2022

Publisher Agreements

As was highlighted by Imperial’s Director of Library Services Chris Banks in her blog post earlier in this International Open Access Week 2022, the past few years have seen a rapid increase in the number of publisher agreements that Imperial College has signed up to. We now have 33 agreements in place that allow for open access (OA) fees to be fully covered for corresponding authors affiliated with imperial College London at no further cost.

This has unsurprisingly led to a significant increase in the number of papers being made OA through such agreements. The below graph shows the number of papers covered over the last year via four of the most used Read & Publish agreements that we currently have:

*Imperial papers made OA through publisher agreements (1 Oct 2021 – 30 Sep 2022)*

This adds up to almost 1000 OA papers from these four agreements alone, which does not include the figures from other publishers we have agreements with such as SAGE, Oxford University Press, Taylor & Francis, and Cambridge University Press.

A shift away from individual APC payments?

As was predicted in an earlier blog post from OA Week 2020, the number of papers now being covered through publisher agreements has now overtaken the number of individual Article Processing Charges (APCs) that we pay for from the OA funds that we administer. For the period from 1 October 2021 to 30 September 2022 we paid for a total of 759 APCs, compared to well over 1000 covered through the agreements.

While we have only seen a slight drop in the total number of individual APCs paid for compared to last year, the most significant change has been an ongoing reduction in the number of APCs we have paid for papers in hybrid journals specifically (i.e. subscription journals that have an OA option) as shown in the below graph:

*Individual APCs paid for from OA funds*

This reduction in individual payments for APCs in hybrid journals should not be attributed to the increase in publisher agreements alone, as changes to funder policies in recent years have also introduced tighter restrictions on hybrid APC payments, and have offered authors alternative routes to compliance via the green OA route through rights retention. However, it is certainly one of the main reasons behind this shift and is a desired outcome in the transition away from a publishing model that allowed for ‘double-dipping’.

Imperial Open Access Fund

As most publisher agreements do not require authors to be funded, they have allowed many papers to be made OA via the gold route that would otherwise not have been eligible. As well as our funder OA block grants, we are also fortunate to be able to offer our authors the Imperial Open Access Fund. This is available for those without alternative funds available, and can be used to pay APCs for original research papers in fully OA journals listed in the Directory of Open Access Journals.

Although some of our publisher agreements do cover fully OA as well as hybrid journals (e.g. Wiley’s), most of them do not, and there are many publishers who exclusively offer fully OA journals with compulsory APCs. This means the Imperial OA Fund continues to have a big part to play in enabling our authors to publish OA and covered 363 APCs in the last year (nearly half of the total amount):

*APCs paid for by each fund (1 Oct 2021 – 30 Sep 2022)*

For details on Imperial’s current publisher agreements, please see our newly revamped Publisher agreements and discounts page, and for details on our OA funds and how Imperial authors can apply for APC funding please see our Applying for funding page.

No double dipping! The rise of transformative publisher agreements in the transition to full Open Access

David Phillips

20 October 2020

The impact of Plan S

In 2018 a group of funders and national research agencies launched Plan S, an initiative with the central aim that by January 2021 “…all scholarly publications on the results from research funded by public or private grants provided by national, regional and international research councils and funding bodies, must be published in Open Access Journals, on Open Access Platforms, or made immediately available through Open Access Repositories without embargo.” Implicit in this goal is the intention of funders to move away from supporting the ‘hybrid’ model of publishing, whereby journals offer a paid open access (OA) option for authors to make their paper freely available upon publication but continue to charge a subscription fee for the rest of their content.

As with many other institutions, at Imperial we are recipients of block grants from certain funders, which authors acknowledging support from those funders can use to pay for individual Article Processing Charges (APCs) in both fully OA and hybrid journals. Although we have already introduced some restrictions on when we will pay for hybrid APCs, due to limited funds, with funders increasingly adopting the Plan S Principles authors may be concerned that they will soon be completely prevented from choosing OA publishing options in hybrid journals.

This is where Plan S Principle 8 comes in, which states that “…as a transitional pathway towards full Open Access within a clearly defined timeframe, and only as part of transformative arrangements, Funders may contribute to financially supporting such arrangements”. So, while Plan S funders will no longer support the payment of individual APCs to hybrid journals, institutions are able to redirect OA funds to pay for arrangements with publishers to transition away from the hybrid model towards being fully OA (until the end of 2024).

Read & Publish agreements

There are several types of transformative arrangements, but perhaps the most common are Read & Publish agreements. Instead of institutions (generally via their libraries) paying separately for subscriptions and OA fees for the same journals (aka ‘double-dipping’), Read & Publish agreements combine the costs. This provides those affiliated with the institution access to journal content that is still paywalled, as well as allowing authors to choose the OA option for their publications at no further cost.

As more of the content in hybrid journals becomes free for all to read in the transition to becoming fully OA, the proportion paid for the ‘Read’ part of the deal will decrease, and the proportion paid for the ‘Publish’ part will increase accordingly. While these kinds of arrangements precede the announcement of Plan S, their uptake has undeniably been accelerated by the initiative. Prior to 2020 Imperial had signed up to one Read & Publish agreement (with Springer in 2016), but we now have 11 In place, all negotiated by Jisc for Imperial and other institutions.

just take one dip and end it, Peyri Herrera, CC BY-ND 2.0, https://www.flickr.com/photos/54552940@N00/2483791713

Read & Publish agreements can offer an alternative route for authors to publish their work OA in cases where we would normally not be able to provide funding for an APC. Unlike our OA block grants from funders, which only authors acknowledging the relevant funding can use, these agreements can be made available to all Imperial staff and students (usually with the requirement that they are the corresponding author). The process should generally be much quicker and easier for authors, as they do not need to request an invoice or make a separate payment for an APC, and publishers have also been encouraged to improve the workflows and dashboards used by authors and the staff who administer the agreements within institutions.

Not a panacea

However, it can be argued that such agreements do not solve all of the problems that are present in the existing hybrid OA model. To the authors that are eligible for these agreements it may feel that they are getting free and unlimited OA for their work, but there are still high costs involved to sign up for the deals in the first place, and often there are limits on how many papers can be made OA in a year. This has recently been seen with the restrictions introduced to the Wiley agreement, whereby only authors supported by certain funders are currently eligible for inclusion in the agreement due to high levels of demand.

During an OA Week with a theme of “Taking Action to Build Structural Equity and Inclusion”, it is also important to highlight that such agreements can be seen as perpetuating global inequalities in access to OA publishing, as is argued by Jefferson Pooley on the LSE Impact Blog. A transition away from the hybrid model towards journals being fully OA should benefit everyone wanting to access the outputs of research as a reader. Nevertheless, it is only those authors who are affiliated with institutions wealthy enough to pay for the agreements (predominantly research intensive and in the global North) who are in a position to directly benefit from the OA publishing aspect.

Others who wish to publish OA will continue needing to find alternative routes, such as applying for APC waivers, submitting to OA journals that do not charge APCs, or self-archiving. This is not to say that these other routes are not valid – the option to self-archive (aka ‘green’ OA) is also a key part of the Plan S principles – but for those authors who do not have ready access to APC funds or publisher agreements there is understandably a sense of inequality.

This diagram by Imperial’s Director of Library Services, Chris Banks, demonstrates the complexity of a transition to full OA when considering the different levels of research intensity across institutions
(https://twitter.com/ChrisBanks/status/1169530088276340736)

A shift in gold OA at Imperial?

At Imperial we are fortunate to be able to offer our authors a range of different ways to make their research outputs OA, via both the green and gold routes. While the majority of our time (and money) in the gold section of the OA Team is still spent on paying individual APC payments from the funds that we administer (totalling 853 payments from 1 Oct 2019 – 30 Sep 2020), an increasing number of articles are now being made OA through our aforementioned Read & Publish agreements.

Imperial papers made OA through Read & Publish agreements (1 Oct 2019 – 30 Sep 2020)

The graph above shows the numbers of papers made OA via our four most used agreements (with Springer, Wiley, the Royal Society of Chemistry and SAGE) totalling 567 papers between 1 Oct 2019 – 30 Sep 2020. We also have agreements in place with the Company of Biologists, European Respiratory Society, IOP, IWA, Microbiology Society, Portland Press and Thieme. As previously mentioned, only the Springer agreement was in place prior to 2020, and we are in the process of signing more agreements. We would therefore expect the figures for next year to be even higher, and to perhaps even overtake the number of APCs we pay for individually.

For details on Imperial’s current Read & Publish agreements, as well as other publisher arrangements and discounts available to Imperial authors, please see our Publisher agreements and discounts page.

Slowing down the Gold Rush: a community resource to keep track of expensive APCs

David Phillips

26 October 2018

This is the fourth of a series of blog posts by Imperial’s Open Access Team for OA Week. Please also see our blog post on Publisher Problems, our blog post on Accepted Manuscript definitions, and our blog post on Publisher Contacts.

This blog post is directed to our Open Access colleagues in Higher Education.

The rising price of Gold OA

A big part of what OA Teams in libraries/research offices do – in those institutions that are fortunate enough to have the funding – is make decisions on which publications can (or need to) be published via the Gold OA route. As we diligently work away to process the scores of article processing charge (APC) applications we receive each month, it can sometimes be easy to lose sight of what we are actually authorising each time we approve an application: namely, the payment of thousands of pounds of taxpayers’/charities’/institutions’ money to (often exceptionally profitable) publishers.

A recent survey of authors around the world found that many had never published OA, but for 27% of them this was because they could not afford the APCs required to do so. The cost of Gold OA has been rising beyond the rate of inflation for many years now (as reported by Jisc in 2016 and in Universities UK in 2017), and although funders have increased the amounts given to institutions to pay for APCs, it is becoming increasingly difficult to meet the demand from authors to publish their work OA.

At Imperial College we are lucky to be the recipients of generous block grants from the Research Councils (RCUK – now UKRI) and the Charity Open Access Fund (COAF) to help our authors meet their OA requirements, as well as having access to an institutional fund to pay for APCs in fully OA journals. However, these funds are not bottomless, and can only stretch so far in the face of rising APCs and increasing demand from authors who are publishing more and more. Indeed, we have very recently realised that our RCUK grant is close to running out, and we will be need to be much more restrictive in how we use that fund to pay for APCs for the foreseeable future. This blog post from the Office of Scholarly Communication at Cambridge clearly demonstrates the issues faced in trying to use OA funds in a sustainable way.

The Gold route is of course not the only way authors can make their work OA (and does not always require an APC). When funds run low we can use this as an opportunity to advise how the Green route can meet funders’ and REF requirements, and to promote the benefits of our institutional repository. However, what we aim to offer is a fair and consistent service to our authors, and this is difficult when we cannot be sure how long our funds will last, and whether or not we will be able to approve APC applications from one month to the next.

With the announcement by a consortium of European funders of Plan S (with a key change that hybrid open-access journals are not compliant with their key principles) and rumours of imminent changes to research funders’ open access policies in the UK (e.g. in the upcoming Wellcome OA Policy Review), there is hope that the unsustainable model of increasingly expensive Gold OA will be curtailed. It is important to recognise that the cost of APCs is not the only thing we should be considering, but also the approach that publishers are taking towards a transition to OA (through their self-archiving embargo policies, for example), as is acknowledged in Cambridge’s new policy.

Other institutions (such as LSHTM and Bath) have also already introduced steps to prolong and distribute their OA funds in different ways, by introducing extra conditions such as caps on APC costs and restricting which types of hybrid journal they will pay for. Although at Imperial we have not yet introduced a cap for the APCs we will pay, this is something that is likely to be rolled out by funders in the near future, so we think it is important to record the APCs we have paid for already that were particularly costly.

Recording expensive APCs

Connected to the work done by my OA Team colleague Danny Smith in his Publisher Problems spreadsheet another sheet was created to record particularly expensive APCs. This sheet has been populated with examples of APCs paid for by the Imperial OA Team in 2018, where the cost was £3,000 or over (before VAT), and is now available at the following link:

Go to Expensive APCs spreadsheet

How APC costs are calculated and justified by publishers is a contentious issue, as argued by recent Imperial alumnus Jon Tennant in his blog post: “Why the term ‘Article Processing Charge’ (APC) is misleading”. The aforementioned potential caps on APCs from funders are yet to be announced, and in the meantime it is difficult to set an exact figure for what is an “expensive” APC. However, for the purposes of the resource being discussed, this figure reflects what we consider to be a significantly higher amount than the average cost of an APC (calculated as £2,269 in the Wellcome’s 2016/17 report).

Screenshot of the Spreadsheet for Most Expensive APCs

This is by no means an exhaustive list of all journals that would fit within the cost criteria, as it only includes APCs we have paid for at Imperial in 2018, and may miss those journals where we have received a discount that reduced the end cost below the threshold. Although we have paid for APCs for multiple articles in many of the journals included, we have included one example article for each to avoid duplication. We would like this to be a shared resource so we would encourage members of the community to add their own examples from different journals. So far the sheet includes examples of articles published in 39 different journals, from 10 publishers, with a total net cost of £137,609.17 (see table below). More detailed data on APC payments is available through the various reports that institutions produce (e.g. for Jisc).

Publisher/Journal	APC Cost (excl. VAT)
American Association for the Advancement of Science (total)	£3,508.70
Science Advances	£3,508.70
American Chemical Society (total)	£32,922.75
ACS Applied Materials and Interfaces	£3,049.24
ACS Chemical Biology	£3,630.05
ACS Nano	£3,049.24
ACS Photonics	£3,005.92
ACS Synthetic Biology	£3,630.05
Chemical Research in Toxicology	£3,787.00
Chemical Reviews	£3,029.60
Chemistry of Materials	£3,634.41
Journal of Chemical Information and Modeling	£3,077.64
Macromolecules	£3,029.60
American Heart Association (total)	£7,090.52
Circulation	£3,616.23
Hypertension	£3,474.29
Elsevier (total)	£34,223.11
Current Opinion in Environmental Science & Health	£3,023.60
Current Opinion in Structural Biology	£3,271.28
European Urology	£3,907.51
Fuel	£3,034.82
International Journal of Biochemistry and Cell Biology	£3,019.27
Journal of Cleaner Production	£3,139.53
Journal of Power Sources	£3,077.64
Lancet Infectious Diseases	£3,907.50
Lancet Public Health	£3,934.46
The Lancet Haematology	£3,907.50
Elsevier (Cell Press) (total)	£24,062.69
Cancer Cell	£4,031.36
Cell Reports	£3,970.34
Cell Systems	£3,934.46
Current Biology	£4,031.36
Molecular Cell	£4,031.36
Structure	£4,063.81
EMBO Press (total)	£4,200.00
The EMBO Journal	£4,200.00
Nature Publishing Group (total)	£3,300.00
Nature Communications	£3,300.00
Oxford University Press (total)	£4,228.53
Journal of the Endocrine Society	£4,228.53
Rockefeller University Press (total)	£3,811.55
Journal of Cell Biology	£3,811.55
Wiley (total)	£20,261.32
Advanced Functional Materials	£3,750.00
Advanced Materials	£3,750.00
American Journal of Transplantation	£3,010.00
Angewandte Chemie	£3,537.32
Clinical and Experimental Allergy	£3,000.00
Diabetes, Obesity and Metabolism	£3,214.00
Total	£137,609.17

As identified in the Publisher’s Problems spreadsheet there are many factors that can make the process of paying for an APC unnecessarily complicated. One issue that the Expensive APCs sheet has further highlighted is the confusion and variations in price that can arise from APCs being advertised, invoiced and paid in different currencies. We have also included a column to identify those publishers who (often confusingly) separate out the cost for a “standard” APC and additional charges for CC BY licenses (including an eye-watering example of this where $3000 was paid just for CC BY). Other potential areas for discussion are the differences between APCs for open access and hybrid journals, and the value and impact of discounts/offsetting.

While we should recognise that much progress has been made by the OA movement in disrupting and reshaping traditional academic publishing models, there is still much work to be done, as is passionately argued in the documentary Paywall: The Business of Scholarship which has received many screenings in OA Week. It is hoped that this spreadsheet will be useful as a way of not only identifying those publishers that are currently charging seemingly excessive amounts, but also monitoring change over time and (hopefully!) a transition away from rising costs. There is also the potential to use the examples to help authors make educated choices about where they publish, and increase their awareness of the charges levied.

We plan to add a link to the sheet (and the other resources we have shared) on the forthcoming UKCORR resources page. Please go ahead and start editing/adding your own examples (checking the notes and instructions first), and we welcome any feedback for how these resources can be improved and best used.