Thu 14 Dec 2017

Data papers and Data Management

Tom Narock

A topic that's come up a lot at this year's AGU meeting is data management. Many U.S. funding agencies, in particular NSF, are taking additional steps to ensure open data and effective data management policies. The suggestions that follow are based on my conversations at AGU and are U.S.-centric. However, my impression is that these issues are relevant internationally as well.

There are two ways in which EarthArXiv may be beneficial in the emerging focus on data management. First, we could support a "data" preprint, which would be a paper detailing a new dataset, its availability and policies, and potentially an overview of the methodology used to generate it. As a result, EarthArXiv would offer a citable reference paper for emerging datasets. I know that some journals already support data papers. My impression is that they are underutilized and that we may benefit by also moving data papers to preprints, which are immediately available and flexible in formatting.

Another possible scenario that emerged is for EarthArXiv to also act as a data repository reference. Initially, this could be as simple as a web-based table suggesting permanent open repositories for scientists who have data needing a permanent home. Long term, maybe this evolves into a more interactive feature and automated submission of data to the repository.

At this point, I wanted to start the discussion and ask two things in particular.
1.) Are there any counterarguments and/or opposition to supporting "data" papers?
2.) The National Center for Atmospheric Research (NCAR, Colorado, USA) will be hosting a workshop on related topics August 7-9, 2018. I hope to attend. Would anyone else be interested in going as well? I'm in contact with the organizers and will be happy to add anyone interested to the email list.



Jeroen Bosman Sat 16 Dec 2017

Generally I think EarthArXiv should be accommodating all kinds of papers: theoretical, empirical, explorative, hypothesis-testing, data papers or other types still. Data papers are very valuable for researchers wanting to work with the data set described in the paper. That could include the authors of the data paper of course. AFAIK technically there is nothing different about a data paper (just did one myself at F1000) that would need consideration.


Tom Narock Mon 18 Dec 2017

Thanks, @jeroenbosman. I completely agree. There is nothing different about a data paper and I would love to see EarthArXiv accommodate all kinds of papers. I am just hesitant to actively advocate for such papers before running the idea by everyone in this group.


Han Geurdes Tue 19 Dec 2017

What exactly is a "data paper" ...


Jeroen Bosman Tue 19 Dec 2017


Allison Enright Tue 19 Dec 2017

I'm also in complete agreement. One other consideration: many people (myself included) have data sets that may not meet the requirements for a specific relevant archive. I've found myself in this position because when designing experiments and collecting data, I hadn't really planned out an archiving strategy, mostly due to my inexperience. In this case, EarthArXiv acting as a repository for this still-useful data and specifically allowing authors to include accompanying documentation might be incredibly useful.


Angelo Pio Rossi Tue 19 Dec 2017

How would it compare/relate to non-peer-reviewed data archives (which I guess could cover the case above), such as Zenodo, Figshare and the like? A paper describing data can still be some sort of PDF, but the data themselves could require much more effort/volume. Is that something EarthArXiv wants / can afford to do? Wouldn't it be easier to just use an existing data repository with a DOI (say, Zenodo) and have the accompanying documentation either there or on EarthArXiv? In the latter case, what would be the added value (two DOIs, two separate places for data and description)?


Jeroen Bosman Tue 19 Dec 2017

For now I would suggest sticking to what people probably expect of a preprint archive: early versions of papers, chapters, books. So mainly text publications. For all other stuff (data, posters, slides, video) there are more dedicated options (EarthChem, Zenodo, Dryad, Dataverse, Figshare, Pangaea, Vimeo and many more). At least for now I think it is wise to build some reputation as an archive for early paper versions, but not rule out discussion of other document/publication types at a later stage. It is especially interesting to follow what happens at ESSOAr because their explicit intention is to host posters from the start. Posters are perhaps the one exception for which I would explore the hosting/sharing opportunities at this stage. It would be interesting to hear from OSF/COS whether they have any intention to support poster sharing. I'm not sure what that would demand from their metadata scheme and preview options.


Sabine Lengger Wed 20 Dec 2017

Me too, I think it's a great idea.


Christopher Jackson Fri 22 Dec 2017

Hi All. On balance, I think we should be open to this. Data are critical, and making them accessible is becoming increasingly important. We already encourage people to upload data supporting the pre-/postprint, so why not straight-up data files? I guess it could cloud the water a bit, as hinted at by @jeroenbosman, and maybe it'll lead us to challenge people like Figshare (who do this type of stuff very well), but opening up to this couldn't have any negatives, could it?


Domenico Chiarella Sat 23 Dec 2017


Jamie Farquharson Sat 23 Dec 2017

My gut instinct would be to promote data sharing through existing dedicated platforms such as Figshare or Vimeo, with any supporting documentation being hosted on EarthArXiv. Especially in the early days of EarthArXiv, I think it is very important to have a clear and communicable focus: what exactly do we host and why? This is a typical question I encounter about the project. I feel that hosting data papers that are neither pre- nor post-print would muddy the waters somewhat.


brandon Thu 4 Jan 2018

As mentioned in another thread, there's always room for a synthesis paper, or an extended abstract further describing a "data" artefact. These can be full of links to the relevant resources on figshare, dryad, etc.

I agree that EarthArXiv is intended for a certain type of resource and the community should not encourage... ramming a square peg through a round hole. :)