Citation of Preprints

EarthArXiv allows an author to upload updated versions of their preprint. For instance, an author may submit an initial preprint, receive feedback, and choose to upload a revised version of that preprint. The way EarthArXiv is implemented, all versions of a preprint remain on the server and are assigned the same DOI. This is a design choice by the Center for Open Science and not something that can be easily changed. One implication is that citing a preprint is not straightforward: the DOI alone does not uniquely identify a version, because other versions with the same DOI may exist in the future. A simple workaround would be to recommend a citation style that includes the DOI and a "last accessed" date, similar to how we cite web pages. We may also want to recommend that "preprint" be included, so that it is obvious from a citation that the work has not been peer reviewed. I think it would be good if we got out ahead of this issue and started thinking collectively about how preprints should be cited.
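To make the workaround concrete, here is a minimal sketch of what such a citation string might look like. The layout, the `[Preprint]` label, the function name `format_preprint_citation`, and the DOI `10.0000/example` are all illustrative assumptions, not an agreed EarthArXiv style.

```python
from datetime import date


def format_preprint_citation(authors, year, title, server, doi, accessed):
    """Build a citation that flags the work as a preprint and records the access date.

    The field order and punctuation are a hypothetical style, not a standard.
    """
    return (
        f"{authors} ({year}). {title} [Preprint]. {server}. "
        f"https://doi.org/{doi} (last accessed {accessed.isoformat()})"
    )


example = format_preprint_citation(
    authors="Doe, J.",
    year=2018,
    title="An example title",
    server="EarthArXiv",
    doi="10.0000/example",  # placeholder, not a real DOI
    accessed=date(2018, 1, 15),
)
print(example)
```

The access date stands in for the missing version number: since every version shares one DOI, the date is the only part of the citation that hints at which version the citing author read.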


A similar issue comes up with data. EarthArXiv allows a manuscript to be connected to the data and software used in the research, so we may want to think about data and software citations as well. There are scenarios in which the data and software linked to a preprint could change in the time between preprint submission and a user reading the paper. It would be good to think about best practices and recommendations for data and software citation to ensure reproducibility. There are already several groups, such as ESIP and RDA, who are thinking along these lines and have work we could build on.


Yes. This is a major issue, which has come up several times in discussions I've had. I think we really need very, very clear guidance from OSF on what they think best practice is, and we should also seek guidance from other preprint services, maybe on the next monthly call.


In arXiv there is an identifier linked to the manuscript, e.g. arxiv gen-phys: 1704.00005v1. When citing the paper one uses this identifier. In EarthArXiv, apparently a DOI is used. So a version suffix (v1, v2, ...) can be added to it. Like:

"In EarthArxiv....v1 Dansgaard claims .... however in EarthArxiv....v2 this is changed into ..... According to Oeshger EarthArxiv....v2 this does not change much. However according to ...."

This could be the way to cite versions and to discuss the changes and what they mean in the scientific debate.


Han's suggestion of using the arXiv system (a 'v1' or 'v2' appended to the ID#) is also how PeerJ Preprints and Figshare both handle 'versioning' of the same preprint/dataset ('v1' or 'v2' appended to a DOI)... I think an issue is that some users may (inadvertently) leave the version number off of the ID#/DOI, and it will still resolve (sending a user to the most recent version). So people who cite preprints should make sure to include the version #, even though the identifier may still work without it.
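As a rough illustration of how a missing version number could be caught before a citation goes out, here is a small sketch. It assumes the version is written as a trailing 'v' plus digits directly after a digit or a dot; the helper names and the placeholder DOI are hypothetical, and real services may format their identifiers differently.

```python
import re

# Assumed convention: the identifier ends in 'v<N>', preceded by a digit or a dot.
VERSION_SUFFIX = re.compile(r"(?<=[0-9.])v(\d+)$")


def version_of(identifier):
    """Return the version number if the identifier ends in 'v<N>', else None."""
    match = VERSION_SUFFIX.search(identifier.strip())
    return int(match.group(1)) if match else None


def check_citation_id(identifier):
    """Describe whether an identifier is pinned to a specific version."""
    version = version_of(identifier)
    if version is None:
        return f"{identifier}: no version suffix, may resolve to the latest version"
    return f"{identifier}: pinned to version {version}"


for identifier in ["1704.00005v1", "1704.00005", "10.0000/example.1234.v2"]:
    print(check_citation_id(identifier))
```

A check like this only inspects the string; it does not confirm that the version exists or that the identifier resolves.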


In arXiv, when one writes a document with LaTeX (think e.g. Authorea), the identifier is visible in the document itself.


Thank you @ebgoldstein for pinpointing the version issue as well. In INArxiv, we also recommend that citation technique to our users.


Thanks for the feedback everyone. This is very helpful. There are some upcoming meetings where we'll have an opportunity to chat with other preprint systems and the citation topic is on the agenda. It would be great if the various groups could recommend the same, or very similar, citation styles.


A related question: what are your thoughts on data and software citations? There are a number of groups working on establishing standard means for citing each. Should EarthArXiv have policies and recommendations on how to cite data and software? If we don't make recommendations, should we at least make such resources available and try to educate our community?


Yes, it would be nice. Moreover, we can try to get support from other portals such as Pangaea to provide this facility, rather than hosting the content ourselves. There are multiple advantages to associating with servers like Pangaea. For example, since it is established, readers already trust the authenticity of its content.


Yes, it would be nice to recommend standards of data and software citations but I don't have any expertise in that matter.


Same reply as Stephanie. It's a good idea to get ahead of things if, in future, submissions of this type are likely to increase.


We could loosely base them on Pangaea recommendations? https://wiki.pangaea.de/wiki/Citation


+1 @sabinelengger. Also, FORCE11 have recommendations for software/code, and there was a group in ESIP looking into this as well (Soren was leading, if memory serves), but I haven't followed up with that effort in some time.