In the current ecosystem of scholarly communication, more and more research results are disseminated via the web. As the temporary web addresses at which these resources are published can easily become unstable, various persistent identifier systems have been developed to ensure that publications, data sets and other forms of output can be referenced consistently, sustainably and reliably. PIDapalooza is an international conference dedicated fully to the history, the present and the future of such open persistent identifier systems. Following the successful first edition in 2016 in Reykjavik, the second PIDapalooza conference took place on 23 and 24 January 2018 in Girona, Spain. The conference was organised by some of the major institutions in this field, including CrossRef, ORCID, DataCite and the California Digital Library, and brought together a diverse crowd of more than 150 researchers, archivists, librarians and software developers.
Many of the speakers on PIDapalooza’s line-up were engaged in projects that aimed to expand the range of objects or concepts that can be identified via PIDs. While previous efforts have concentrated mostly on the identification of publications, data sets or researchers, many organisations and working groups are currently in the process of developing dependable PID infrastructures for other types of entities, including scientific equipment, grant proposals, archaeological findings or software applications. Ed Pentz (CrossRef) and Alice Meadows (ORCID), for instance, spoke about an interesting new initiative to assign DOIs to peer review activities. Academic journals increasingly publish the results that emerge from peer review rounds, such as the reports, the decision letters, and the author responses. By assigning DOIs, these resources also become citable. Interestingly, organisations such as Publons, PeerJ, and BioMedCentral have developed a workflow in which the open resources that result from peer review activities can be added automatically to the ORCID profile of the researchers who were responsible for these documents. This system for assigning DOIs to peer review activities became fully operational in November 2017.
Representatives from ORCID, DataCite and CrossRef gave a presentation about the ongoing work on a new identifier system for organisations. This new system is provisionally called OrgID. As became clear from the talk, developing accurate identifiers for organisations is an extremely complicated task. Organisations may change their names, they may merge with other organisations, and they may consist of many smaller organisational units. The OrgID working group published a document with product principles in October 2017, together with a document setting out its ideas about the eventual governance system of OrgID.
David Shotton (Oxford University) elaborated on the main results of the Initiative for Open Citations, a collaboration between scholarly publishers, researchers, and other interested parties. Aided by CrossRef, the Initiative for Open Citations has managed to publish more than 13 million citation links in the form of linked open data via its new Open Citation Corpus. The date, the author and the type (e.g. ‘self-citation’) of these citations have been described using the Citation Typing Ontology, and the citations themselves have been assigned identifiers as well. These citation identifiers essentially describe a relation between existing objects: a connection between the article that cites and the article that is cited.
Outreach and promotion
In addition to these discussions of initiatives to develop new identifier systems, a large number of the talks at PIDapalooza 2018 focused on outreach activities and on efforts to promote the actual adoption of persistent identifiers among researchers. Various speakers emphasised that persistent identifiers can help to make research data FAIR, ensuring that researchers can be credited for their data sets. Much work appears to be needed in this area, however. Angelina Kraft (Technische Informationsbibliothek) discussed the results of a survey conducted among 1400 researchers in Germany in 2016, which revealed that fewer than 10% of all respondents were aware that DOIs can also be used to identify research data. Mark Hahnel, founder and CEO of figShare, stressed that data repositories should make it easier for researchers to use PIDs when they cite data sets. FigShare’s interface currently includes a “Cite” button, which leads to a page that clearly displays the DOI of the submitted data set, together with the metadata for these data in a citation style that can be changed as needed.
My own talk at PIDapalooza 2018 focused in a similar vein on our efforts to promote the use of ORCID at Leiden. In 2017, a university-wide project was conducted to stimulate all researchers affiliated with Leiden University to create an ORCID, and to associate this identifier with as many of their results as possible. In January 2018, about 30% of all researchers based at Leiden had claimed an ORCID. An important characteristic of ORCID is that researchers are fully responsible for their own profile in the ORCID registry. Researchers can specify the results that are linked to their ORCID themselves, and they remain in full control of all the privacy settings. For this reason, the ORCID project mainly consisted of activities in the field of communication and outreach, aimed at raising awareness of the various advantages associated with ORCID.
This notion that outreach activities are of crucial importance formed a common thread in many of the talks at PIDapalooza 2018. As vital building blocks in the international infrastructure for open science, PIDs can improve the visibility and the impact of academic resources. They can only reach their full potential, however, when they are actively used by researchers and by academic institutions.
Why do PIDs matter for you?
As was emphasised in many of the talks, PIDs can make an enormous difference in the visibility and the impact of publications. Additionally, they can strongly improve the findability and the accessibility of data sets. As such, they can make it easier for researchers to comply with the FAIR data management principles, which are increasingly supported by funders and by publishers. When Leiden researchers publish their research data in a repository that assigns PIDs, as is also recommended by Leiden University’s research data management policy, these data can be cited much more effectively. The website of the LCRDM, incidentally, offers a good overview of the data repositories that mint PIDs for all the data sets they receive. All the efforts of those who are active in the PID community ultimately aim to ensure that researchers can properly be credited for their results, not only for their publications, but also for their peer review activities, their data sets and their conference talks.
Fittingly, most of the slides that were shown during the conference have now been assigned persistent identifiers themselves via PIDapalooza’s repository at figShare: https://pidapalooza.figshare.com/