When you come at it for the first time, open access looks pretty complicated. Funder policies, institutional policies, publisher policies, different flavours of OA including ‘green’, ‘gold’, ‘libre’ and ‘gratis’ and a whole new language with mystifying terms like ‘hybrid journal’, ‘article processing charge’ and ‘author accepted manuscript’ await. Even librarians sometimes struggle to understand journal policies, or what certain licensing conditions actually mean.
It was perhaps for this reason that, when we started the College open access project, academics gave us a clear mission: a one button solution to open access.
We haven’t quite achieved that yet, but since May we have been running a new workflow that reduces the complexity to one sentence: ‘When you have a paper accepted, deposit the peer-reviewed manuscript – we do the rest, no matter what type of open access.’
The workflow is based on two ideas:
- Ask authors for the minimum information required.
- Offer authors a single publications workflow that covers green and gold OA as well as the information required for funder reporting.
The frontend for this workflow is Symplectic Elements, the system used by our academics to manage their scholarly outputs. We have worked with the vendor to deliver an OA workflow that kicks in on acceptance for publication, and then we customised the system to interface with ASK OA, our in-house APC management system.
On acceptance for publication, authors add minimal metadata and the manuscript to Elements, link the article to relevant grants and if they want the College to pay an open access charge they simply tick a box. Colleagues in the Library’s open access team then check the submission, set necessary embargoes and make the output available through Spiral, the College repository. If payment is requested, the data is automatically transferred to ASK OA, the cloud-based, workflow-driven system that we launched last year. Through that process, authors receive a purchase order number to send to their publisher. When the College receives the electronic invoice, our finance system matches the PO and the payment process starts. No author interaction needed.
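As a rough illustration of the hand-offs described above, the Python sketch below models only the payment branch: a ticked APC box raises a purchase order, and an incoming invoice that quotes that PO is matched without author involvement. All class names, fields and the PO number format are hypothetical. This is not how Elements, Spiral or ASK OA are actually implemented, and the Library's checks and embargo setting are left out.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Hypothetical model of the APC branch only; not the Elements or ASK OA API.
_po_counter = count(1)


@dataclass
class Submission:
    title: str
    grants: list[str]          # grants the author linked the output to
    apc_requested: bool        # the "please pay an APC" tick box
    po_number: Optional[str] = None


@dataclass
class Invoice:
    po_number: str             # the PO the author forwarded to the publisher
    amount_gbp: float


def request_apc(sub: Submission, open_orders: dict[str, Submission]) -> Submission:
    """Raise a (hypothetical) purchase order if the author ticked the APC box."""
    if sub.apc_requested and sub.po_number is None:
        sub.po_number = f"PO-{next(_po_counter):05d}"
        open_orders[sub.po_number] = sub
    return sub


def match_invoice(inv: Invoice, open_orders: dict[str, Submission]) -> Optional[Submission]:
    """Finance side: close the order the invoice refers to, or return None if unknown."""
    return open_orders.pop(inv.po_number, None)


if __name__ == "__main__":
    orders: dict[str, Submission] = {}
    paper = request_apc(Submission("Paper A", ["GRANT-1"], apc_requested=True), orders)
    matched = match_invoice(Invoice(paper.po_number, 1800.0), orders)
    print(matched.title if matched else "no matching order")
```

An invoice quoting an unknown PO comes back as None, which is the case that still needs a person to sort out.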
Above you see a screenshot of the information we require from authors. In addition, they deposit the manuscript (or share a link if it was already deposited in an external repository) and link the output to relevant grants. That allows us to charge costs for open access publishing to the correct funders and, once funder systems are ready, will enable the College to automate funder reporting on research outputs. If you want to see a demonstration, check out this video guide produced by the College Library:
The feedback we had from academics has been positive so far, and the numbers show that as well:
While the workflow is working well so far, we are still far away from what I would consider the ideal scenario. There are still plenty of journals with difficult and unhelpful policies, and no university workflow will be able to fix that. Publishers being unable to issue correct invoices is another issue. We also struggle to reliably match the metadata entered on acceptance with the metadata for the published output. Publishers could help by issuing authors with a DOI on acceptance.
Even better, publishers could feed publication metadata into systems like CrossRef on the date of acceptance. If the metadata had funder, licence and embargo information attached and a link to the manuscript, then open access would indeed become a one-click problem. Authors enter their data on submission, and following acceptance it automatically travels through all relevant systems, until it ends up in an institutional repository. There would be no additional effort for authors, and admin overhead would be reduced greatly. The components to enable this already exist, for example the author identifier ORCID that was rolled out across the College last year.
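To make the idea concrete, here is a minimal Python sketch of the kind of record that could travel from publisher to repository if it were available at acceptance, and of what a repository could then derive without asking the author anything. The field names, the example DOI and the assumption that the embargo is counted from the publication date are illustrative assumptions; this is not CrossRef's schema or any funder's rule.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AcceptanceRecord:
    # Hypothetical fields; this is not CrossRef's deposit schema.
    doi: str
    accepted_on: date
    funders: tuple[str, ...]
    licence: str                   # e.g. "CC BY"
    embargo_months: int            # 0 means no embargo
    manuscript_url: Optional[str]  # link to the accepted manuscript


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the length of the target month."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def repository_entry(rec: AcceptanceRecord, published_on: date) -> dict:
    """Derive a repository record with no further author input.

    Assumes (for illustration) that the embargo runs from the publication date.
    """
    if rec.manuscript_url is None:
        raise ValueError(f"{rec.doi}: no manuscript attached")
    return {
        "doi": rec.doi,
        "funders": list(rec.funders),
        "licence": rec.licence,
        "file": rec.manuscript_url,
        "open_from": add_months(published_on, rec.embargo_months),
    }
```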
We are still working towards the goal of a “one button” solution for open access with our partners. Until then the message remains: deposit the manuscript on acceptance, we do the rest.
The library has released a new workflow on how to make your publications REF compliant. Authors can now deposit their journal articles and conference proceedings on acceptance in Spiral via Symplectic. At the same time an application can be made for APC funding to pay open access fees.
Monday 9 – Thursday 12 February 2015 saw data management and curation professionals and researchers descend on London for the 10th annual International Digital Curation Conference (IDCC), held at 30 Euston Square. Although IDCC is focussed on “digital curation”, in recent years it has become the main annual conference for the wider research data management community.
This year’s conference theme was “Ten years back, ten years forward: achievements, lessons and the future for digital curation”.
Day 1: Keynotes, demos and panel sessions
Tony Hey opened the conference with an overview of the past 10 years of e-Science activities in the UK, highlighting the many successes along with the lack of recent progress in some areas. Part of the problem is that the funding for data curation tends to be very local, while the value of the shared data is global, leading to a “tragedy of the commons” situation: people want to use others’ data but aren’t willing to invest in sharing their own. He also had some very positive messages for the future, including how a number of disciplines are evolving to include data scientists as an integral part of the research process:
Next up was a panel session comparing international perspectives from the UK (Mark Thorley, NERC), Australia (Clare McLaughlin, Australian Embassy and Mission to the EU) and Finland (Riitta Maijala, Department for HE and Science Policy, Finland). It was interesting to compare the situation in the UK, which is patchy at best, with Australia, which has had a lot of government funding in recent years to invest in research data infrastructure for institutions and the Australian National Data Service. This funding has resulted in excellent support for research data within institutions, fully integrated at a national level for discovery. The panel noted that we’re currently moving from a culture of compliance (with funder/publisher/institutional policies) to one of appreciating the value of sharing data. There was also some discussion about the role of libraries, with the suggestion that it might be time for academic librarians to go back to an earlier role which is more directly involved in the research process.
After lunch was a session of parallel demos. On the data archiving front, Arkivum’s Matthew Addis demonstrated their integration with EPrints (similar workflows for DSpace and others are in the works). There was also a demo of the Islandora framework which integrates the Drupal CMS, the Fedora Core digital repository and Solr for search and discovery: this lets you build a customised repository by putting together “solution packs” for different types of content (e.g. image data, audio, etc.).
The final session of the day was another panel session on the subject of “Why is it taking so long?”, featuring our own Torsten Reimer alongside Laurence Horton (LSE), Constanze Curdt (University of Cologne), Amy Hodge (Stanford University), Tim DiLauro (Johns Hopkins University) and Geoffrey Bilder (CrossRef), moderated by Carly Strasser (DataCite). This produced a lively debate about whether the RDM culture change really is taking a long time, or whether we are in fact making good progress. It certainly isn’t a uniform picture: different disciplines are definitely moving at different speeds. A key problem is that at the moment a lot of the investment in RDM support and infrastructure is happening on a project basis, with very few institutions making a long-term commitment to fund this work. Related to this, research councils are expecting individual research projects to include their own RDM costs in budgets, and expecting this to add up to an infrastructure across a whole institution: this was likened to funding someone to build a bike shed and expecting a national electricity grid as a side effect!
There was some hope expressed as well though. Although researchers are bad at producing metadata right now, for example, we can expect them to get better with practice. In addition, experience from CrossRef shows that it typically takes 3–4 years from delivering an infrastructure to the promised benefits starting to be delivered. In other words, “it’s a journey, not a destination”!
Day 2: research and practice papers
Day 2 of the conference proper was opened by Melissa Terras, Director of UCL Centre for Digital Humanities, with a keynote entitled “The stuff we forget: Digital Humanities, digital data, and the academic cycle”. She described a number of recent digital humanities projects at UCL, highlighting some of the digital preservation problems along the way. The main common problem is that there is usually no budget line for preservation, so any associated costs (including staff time) reduce the resources available for the project itself. In addition, the large reference datasets produced by these projects are often in excess of 1TB. Datasets of this size are difficult to share, and the problem is compounded by the fact that subsets are not useful: users generally want the whole thing.
The bulk of day 2, as is traditional at IDCC, was made up of parallel sessions of research and practice papers. There were a lot of these, and all of the presentations are available on the conference website, but here are a few highlights.
Some were still a long way from implementation, such as Łukasz Bolikowski’s (University of Warsaw) “System for distributed minting and management of persistent identifiers”, based on Bitcoin-like ideas and doing away with the need for a single ultimate authority (like DataCite) for identifiers. In the same session, Bertram Ludäscher (University of Illinois Urbana-Champaign) described YesWorkflow, a tool that allows researchers to mark up their analysis scripts so that the workflow can be extracted and presented graphically (e.g. for publication or documentation).
Daisy Abbott (Glasgow School of Art) presented some interesting conclusions from a survey of PhD students and supervisors:
- 90% saw digital curation as important, though 60% of PhD holders and 80% of students report little or no expertise
- Generally students are seen as having most responsibility for managing their data, but supervisors assign themselves more of the responsibility than the students do
- People are much more likely to use those close to them (friends, colleagues, supervisors) as sources of guidance, rather than publicly available information (e.g. DCC, MANTRA, etc.)
In a packed session on education:
- Liz Lyon (University of Pittsburgh) described a project to send MLIS students into science/engineering labs to learn from the researchers (and pass on some of their own expertise);
- Helen Tibbo (University of North Carolina) gave a potted history of digital curation education and training in the US; and
- Cheryl Thompson (University of Illinois Urbana-Champaign) discussed their project to give MLIS students internships in data science.
To close the conference proper, Helen Hockx-Yu (Head of Web Archiving, British Library) talked about the history of web archiving at the BL and their preparation for non-print legal deposit, which came into force on 6 April 2013 through the Legal Deposit Libraries (Non-Print Works) Regulations 2013. They now have two UK web archives:
- An open archive, which includes only those sites permitted by licenses
- The full legal deposit web archive, which includes everything identified as a “UK” website (including ‘.uk’ domain names and known British organisations), and is only accessible through the reading room of the British Library and a small number of other access points.
Software Carpentry is a community-developed course to improve the software engineering skills and practices of self-taught programmers in the research community, with the aim of improving the quality of research software and hence the reliability and reproducibility of the results. Data Carpentry is an extension of this idea to teaching skills of reproducible data analysis.
One of the main aims of a Data Carpentry course is to move researchers away from ad hoc analysis in Excel and towards programmable tools such as R and Python that create documented, reproducible workflows. Excel is a powerful tool, but the danger when using it is that all manipulations are performed in place and the result is often saved over the original spreadsheet. This potentially destroys the raw data, and provides no record of what was done to arrive at the processed version. Instead, using a scripting language enables the analysis to be done without touching the original data file, while producing a repeatable transcript of the workflow. In addition, using freely available open-source tools means that the analysis can be repeated without the need for potentially expensive licences for commercial software.
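A minimal sketch of the scripted approach, in Python (one of the two languages mentioned above): the raw file is only ever read, the cleaning step is written down as code, and the result goes to a separate file. The file layout and the "drop rows with blank fields" rule are illustrative assumptions, not Data Carpentry material.

```python
import csv
from pathlib import Path


def clean_copy(raw_path: Path, out_path: Path) -> int:
    """Write the rows of raw_path that have no blank fields to out_path; return rows kept.

    The raw file is only read, never overwritten, and rerunning the script
    reproduces the output exactly: the opposite of editing a sheet in place.
    """
    if raw_path.resolve() == out_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")
    with raw_path.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        rows = list(reader)
    # The cleaning rule is documented here, in code, rather than lost in an undo history.
    kept = [r for r in rows if all(isinstance(v, str) and v.strip() for v in r.values())]
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

Usage would be something like `clean_copy(Path("survey_raw.csv"), Path("survey_clean.csv"))`, with hypothetical file names; the script itself is the documentation of how the clean file was produced.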
The Data Carpentry workshop on Wednesday offered the opportunity to experience Data Carpentry from three different perspectives:
- Workshop attendee
- Potential host and instructor
- Training materials contributor
We started out with a very brief idea of what a Data Carpentry workshop attendee might experience. The course would usually be run over two days, and start with some advanced techniques for doing data analysis in Excel, but in the interest of time we went straight into using the R statistical programming language. We went through the process of setting up the R environment, before moving on to accessing a dataset (based on US census data) that enables the probability of a given name being male or female to be estimated.
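The workshop did this in R, with a dataset derived from US census data whose exact format isn't described here. As an illustration of the estimate itself, the Python sketch below assumes records of (name, sex, count) and estimates the probability that a name is female as the female share of all recorded uses of that name. The records are made up.

```python
from collections import defaultdict
from typing import Iterable


def female_probability(records: Iterable[tuple[str, str, int]]) -> dict[str, float]:
    """Estimate P(female | name) as female count / total count for each name.

    records: (name, sex, count) tuples, with sex either "F" or "M".
    """
    totals: defaultdict[str, dict[str, int]] = defaultdict(lambda: {"F": 0, "M": 0})
    for name, sex, n in records:
        totals[name.lower()][sex] += n   # case-insensitive: "Mary" and "mary" are pooled
    return {name: c["F"] / (c["F"] + c["M"])
            for name, c in totals.items() if c["F"] + c["M"] > 0}


# Made-up counts, for illustration only.
sample = [("Jordan", "F", 40), ("Jordan", "M", 60), ("Mary", "F", 99), ("mary", "M", 1)]
probs = female_probability(sample)
print(probs)
```

With small counts this simple ratio is noisy: a name seen only once gets a probability of exactly 0 or 1.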
The next section of the workshop involved a discussion of how the training was delivered, during which we came up with a list of potential improvements to the content. During the final part, we had an introduction to GitHub and the git version control system (which are used by Software/Data Carpentry to manage community development of the learning materials), and then split up into teams to attempt to address some of our suggested improvements by editing and adding content.
I found this last part particularly helpful, as I (in common with several of the other participants) have often wanted to contribute to projects like this but have worried about whether my contribution would be useful. It was therefore very useful to have the opportunity to do so in a controlled environment with guidance from someone intimately involved with the project.
In summary, Data Carpentry and Software Carpentry both appear to be valuable resources, especially given that there is an existing network of volunteers available to deliver the training and the only cost then is the travel and subsistence expenses of the trainers. I would be very interested in working to introduce this here at Imperial.
Jisc Research Data Spring
Research Data Spring is a part of Jisc’s Research at Risk “co-design” programme, and will fund a series of innovative research data management projects led by groups based in research institutions. This funding programme is following a new pattern for Jisc, with three progressive phases. A set of projects will be selected to receive between £5,000 and £20,000 for phase 1, which will last 4 months. After this, a subset of the projects will be chosen to receive a further £5,000 – £40,000 in phase 2, which lasts 5 months. Finally, a subset of the phase 2 projects will receive an additional £5,000 – £60,000 for phase 3, lasting 6 months. You can look at a full list of ideas on the Research At Risk Ideascale site: these will be pitched to a “Dragon’s Den”-style panel at the workshop in Birmingham on 26/27 February.
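Taken together, the phase figures imply the following ranges for a project that is selected for every phase. This short Python calculation uses only the amounts and durations given above, and assumes the phases run back to back with no gaps.

```python
# (phase, minimum £, maximum £, months), as stated in the call summary above
PHASES = [(1, 5_000, 20_000, 4), (2, 5_000, 40_000, 5), (3, 5_000, 60_000, 6)]


def cumulative(phases):
    """Running totals of the funding range and elapsed months after each phase."""
    low = high = months = 0
    totals = []
    for phase, lo, hi, m in phases:
        low, high, months = low + lo, high + hi, months + m
        totals.append((phase, low, high, months))
    return totals


for phase, low, high, months in cumulative(PHASES):
    print(f"after phase {phase}: £{low:,} to £{high:,} over {months} months")
```

So a project that makes it through all three phases would receive between £15,000 and £120,000 over roughly 15 months.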
The Research Data Spring workshop on Thursday 12 February was an opportunity to meet some of the idea owners and for them to give “elevator pitch” presentations to all present. There was then plenty of time for the idea owners and other interested people to mingle, discuss, give feedback and collaborate to further develop the ideas before the Birmingham workshop.
Ideas that seem particularly relevant to us at Imperial include:
- Open Source Database-as-a-Service with Data Publishing: a system to make it easier for researchers who currently use Microsoft Access or Excel to move their data to a robust relational database management system and share that data with collaborators.
- Computational experiments as data objects: packaging up computational experiments (such as weather simulations) into easily verifiable bundles with easy access to the software code, the input parameters and the results.
- Research data as a Unique and Distinctive Collection (UDC): looking at how data can fit into research library collection policies for long-term preservation of and access to key institutional assets.
- Managing sensitive data: looking at how the rules that apply to sensitive data (both general, e.g. Data Protection Act, and specific, e.g. consent forms) can be codified and applied consistently to facilitate inter-institutional collaborations.
- Linked data notebook: development of a work-in-progress notebook that allows individuals and small research groups to capture and create sharable linked data.
- Research Data requirements vocabulary: codifying the requirements for data management (how long must it be held for, how large will it get, etc.) so that managing it can be at least partly automated.
- Standards and Schemas for Digital Research Notebooks: improving the interoperability of electronic lab notebooks/digital research notebooks.
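The requirements-vocabulary idea is easiest to see with an example. The Python sketch below is purely illustrative: the terms (retention period, expected size, start of the retention clock) and the size threshold are hypothetical, not taken from the actual proposal. They show how a machine-readable statement of requirements could let routine decisions be made automatically.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataRequirements:
    # Hypothetical vocabulary terms, for illustration only.
    retention_years: int       # how long the data must be held
    expected_size_gb: float    # how large it will get
    retain_from: date          # when the retention clock starts


def earliest_disposal(req: DataRequirements) -> date:
    """Date before which the data must not be deleted."""
    year = req.retain_from.year + req.retention_years
    try:
        return req.retain_from.replace(year=year)
    except ValueError:         # 29 February landing in a non-leap year
        return date(year, 3, 1)


def storage_tier(req: DataRequirements, bulk_threshold_gb: float = 1000.0) -> str:
    """Pick a (hypothetical) storage tier from the expected size."""
    return "bulk" if req.expected_size_gb >= bulk_threshold_gb else "standard"
```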
Just before the College closes for the Christmas break I have found the time to write my overdue summary of recent developments in the world of open access and scholarly communication. Below are a few of the headlines and developments that caught my eye during the last couple of months.
Cost of Open Access
Commissioned by London Higher and SPARC Europe, Research Consulting have published Counting the Costs of Open Access. Using data provided by universities, including Imperial College, it concludes that achieving compliance with RCUK’s open access policy cost UK research organisations £9.2m in 2013/14. The main conclusions are quoted below; the estimated costs of meeting the REF open access requirements are particularly interesting, given that HEFCE provide no funding for their open access policy, which is in some ways even more ambitious than RCUK’s:
- The time devoted to OA compliance is equivalent to 110 full-time staff members across the UK.
- The cost of meeting the deposit requirements for a post-2014 REF is estimated at £4-5m per annum.
- Gold OA takes 2 hours per article, at a cost of £81.
- Green OA takes just over 45 minutes, at a cost of £33.
Pinfield, Salter and Bath published: The ‘total cost of publication’ in a hybrid open-access environment. The study analyses data from 23 UK institutions, including Imperial College, covering the period 2007 to 2014. It finds that while the mean value of APCs has been relatively stable, ‘hybrid’ subscription/OA journals were consistently more expensive than fully-OA journals. Modelling shows that APCs now constitute 10% of the total cost of ownership for publishing (excluding administrative costs).
EBSCO’s 2015 Serials Price Projection Report assumes price increases of 5-7%, not including a recommended additional 2-4% to allow for currency fluctuations.
John Ulmschneider, Librarian at the Virginia Commonwealth University, estimates that with current price increases the cost for subscription payments would “eat up the entire budget for this entire university in 20 years”. Partly in response to that, VCU has launched its own open access publishing platform.
UK Funder News
Arthritis Research UK, Breast Cancer Campaign, the British Heart Foundation (BHF), Cancer Research UK, Leukaemia & Lymphoma Research, and the Wellcome Trust have joined together to create the Charity Open Access Fund (COAF). COAF operates in essentially the same way as the Wellcome Trust fund it replaces.
An article summarising responses to the RCUK review of open access cites the Wellcome Trust saying that sanctions could accelerate the implementation of open-access.
The Wellcome Trust published a list of journals that do not provide a compliant publishing option.
International Funder News
A new Danish open access strategy sets the goal of open access to 80% of all publicly funded peer-reviewed articles by 2017, rising to 100% by 2022.
The Open Access policy of the Austrian FWF requires CC BY (if Gold OA) and deposit in a sustainable repository on publication. The FWF covers APCs up to a limit of €2500.
Research Information published a summary of international developments around open access: The Research Council of Norway is making funding available to cover up to 50% of OA publishing charges. The Chinese Academy of Sciences and the National Natural Science Foundation of China require deposit of papers in an OA repository within 12 months of publication. The Mexican president has signed an act to provide “Mexicans with free access to scientific and academic production, which has been partially or fully financed by public funds”.
Publishers and Open Access
In November, negotiations between Elsevier and the Dutch universities broke down following an Elsevier proposal that “totally fails to address this inevitable change [to open access]”. The universities have since reached an agreement with Springer; negotiations with Elsevier have resumed.
The launch of Science Advances, a journal of the American Association for the Advancement of Science (AAAS), prompted strong criticism of the AAAS approach to open access. Over a hundred scientists signed an open letter criticising AAAS for charging $1000 for the CC BY license as well as $1500 for papers longer than ten pages – on top of a $3000 base APC. This has been picked up by media including the New Statesman.
The Nature Publishing Group has had two major OA-related headlines. Generally well received was the announcement that NPG would switch the prestigious Nature Communications to full open access. On the other hand, the move to allow limited read access to articles has been widely criticised as “beggar access” and a step back for open access: NPG allow subscribers to give others viewing (but not printing) access to papers through proprietary software.
An open letter signed by nearly 60 open access advocates, publishers, library organisations and civil society bodies warns against model licenses governing copyright on open access articles proposed by the International Association of Scientific, Technical & Medical Publishers (STM). The letter says the STM licences “would limit the use, reuse and exploitation of research” and would “make it difficult, confusing or impossible to combine these research outputs with other public resources”. The STM licenses are seen as incompatible with Creative Commons licences.
Jisc and Wiley have negotiated a deal that provides credits for article processing charges (APCs) to universities that license Wiley journal content and have a Wiley OA account.
This year’s UKSG one day conference focused on how researchers are being supported in the changing scholarly communications landscape. The day brought together academics, librarians, publishers and funders to discuss how we can work together to achieve open access requirements as painlessly as possible. What follows is a summary of the event, and the whole day was filmed so you can catch up on the talks at the UKSG website.
The day began with Ben Johnson from HEFCE who told the story of how open access came to the attention of the UK government when David Willetts was unable to access the journal articles required to write his book. From Willetts to the Finch Report to the new REF policy, universities are now being pushed into action to ensure publications are made open access and impact of research is demonstrated. HEFCE and other UK funders are making it clear that if research is to have an impact on policy people within government need access to it.
Simon Hubbard from the University of Manchester spoke next about the complicated process of making a paper open access, reporting on research to your funder and storing your research data in the appropriate place. Even for a researcher who has an active interest in open access publishing, the burden of bureaucracy can be off-putting, especially when it feels like he’s entering the same information over and over again into different systems. Finally, Simon had a few recommendations to improve the open access workflow: remove academics from the process as they only slow things down; better and more unified systems; and a simpler message from funders and publishers.
A final highlight of the morning came from Ian Carter at the University of Sussex, who spoke from the perspective of university management and strategic planning. Ian started by summarising the pressures that researchers find themselves under, from conducting “world-class” research, to providing value for money to students paying much higher fees than ever before, to compliance with varying funder policies. To achieve all of this there must be behavioural change from researchers, for example making their work more accessible through open access, and additional support from institutions to ensure these goals align with their overall strategy. Dissemination, communication and impact were identified as some of the most important aims for both researchers and institutions.
The second half of the day saw the librarian’s perspective from Martin Wolf at the University of Liverpool, who believes librarians have a better understanding of the overall picture and of how different stakeholders interact. Librarians often find themselves interpreting both funders’ policies and publishers’ open access options for researchers. However, in addition to this advocacy work, librarians seem to be getting increasingly stuck on detail, for example the minutiae of a publisher’s copyright policy, and are too risk-averse when it comes to promoting open access. Comments from publishers after this session implied that early career researchers are still asking very basic questions about open access, so there is a lot of work to be done.
The last few sessions were lightning talks from providers of altmetrics tools: Digital Science, Kudos and Plum Analytics. These are just three of the many new products designed to capitalise on the impact agenda, and aim to help researchers increase and measure the impact of their publications.
Overall, the day was very useful and demonstrated the various perspectives on research and publication, including changing expectations from all stakeholders involved in the process. It’s clear that while the post-REF2014 policy has been a disruptive force, change was already beginning in the areas of open access, alternative metrics and demonstrating the impact of research.
Last night saw the launch of the Open Access Button to coincide with worldwide Open Access Week. The team behind the Open Access Button aim to help researchers, students and the general public access research papers that are behind paywalls and beyond their means.
The idea came from two medical students who were frustrated at not being able to access all the research they wanted to read, and who found that the average cost of reading a paywalled article was $30. Although the team has expanded to include partnerships with Cottage Labs, Jisc and others, a large number of students still donate their time to the project. Work began on the Button last year with a beta version that saw 5,000 people hit almost 10,000 paywalls or other denials of access.
The new version of the Open Access Button is a plug-in for your browser that works as a button you click any time you cannot access an article due to a paywall. The system registers information about the article and your location to create a map of researchers who need access to information.
The Open Access Button will try to find a free-to-access version of the article, for example a pre-print deposited in an institutional or subject repository. If an alternative version cannot be found, the Button will then email the author to let them know that someone wants to access their research but can’t, and suggest that the author deposit a copy in a repository.
Upon clicking the button, users are asked to enter a few sentences about why they want to read the article and what they could do if the research was available open access. The creators hope to use this information for open access advocacy, and to create stories that connect researchers, their work and readers around the world.
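The decision flow described in the last few paragraphs can be sketched in a few lines of Python. The repository lookup and the email step are passed in as functions because the Button's real search and messaging services are not described here; every name below is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BlockedRequest:
    doi: str        # the article the reader could not access
    location: str   # feeds the map of readers who need access
    story: str      # why the reader wants the paper


def handle_paywall(req: BlockedRequest,
                   find_free_copy: Callable[[str], Optional[str]],
                   email_author: Callable[[str, str], None],
                   log: list) -> Optional[str]:
    """Return a free copy's URL if one is found; otherwise contact the author."""
    log.append(req)                  # every click is recorded, whatever happens next
    url = find_free_copy(req.doi)    # e.g. search institutional/subject repositories
    if url is not None:
        return url
    email_author(req.doi, req.story) # ask the author to deposit a copy
    return None
```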
Keep up to date with the project on Twitter @OA_Button