Sister blog of Physicists of the Caribbean. Shorter, more focused posts specialising in astronomy and data visualisation.

Tuesday 16 July 2019

Thou Shalt Make An Accessible Data Archive

There's very little I can add to this so I'm just going to quote it. We all know data archives are important, and here's the data to prove it. By all means give people a proprietary period on their data, but all major facilities should make every effort to make as much data as possible public as soon as possible. As a standard, I suggest making science-level data (as opposed to the raw, unprocessed data) public upon first publication.

A significant challenge for the very near future is going to be sheer data volume, with the SKA, crazily, predicted to be producing exabytes of data per second. That's terrifying. So there will be cases where data access is going to have to be limited, but that doesn't mean we can't minimise this. Simulations already produce vast amounts of data, and it's simply wildly impractical to expect a small institute to host all of it publicly (I've deleted terabytes of my own data because it's just too friggin' large). But I'm reminded of a nice talk I saw last week where the speaker described a procedure to easily include the full details of the software used, including the exact versions of every library and dependency. A move towards more of this kind of approach, where a paper describes - as we're taught in high school to do - exactly how to reproduce a result, would be a very good thing. Alternatively, there could be journals specialising in contemporary methods of analysis, so that one could simply cite a paper describing the method and then add very brief notes about any modifications used. The point is that if you can't provide the actual data, at least provide the tools to exactly reproduce it.
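I don't know exactly what procedure the speaker used, so this is only a minimal sketch of the general idea, assuming a Python-based analysis. It records the interpreter version, the platform, and the version of every installed package using the standard library's `importlib.metadata`. The function name `environment_snapshot` is my own invention, not part of any established tool.

```python
import json
import platform
from importlib import metadata


def environment_snapshot():
    """Return a dict describing the interpreter, platform and installed package versions.

    Hypothetical helper for illustration: it captures what is installed in the
    current environment, not necessarily everything a given analysis imported.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no name in their metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        # Sort case-insensitively so snapshots from different machines diff cleanly.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    # Print as JSON; redirect to a file to archive it alongside the paper's results.
    print(json.dumps(environment_snapshot(), indent=2))
```

Running something like `python snapshot.py > environment.json` at the end of an analysis gives a small text file that can be published with the paper even when the data itself is far too large to host.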
We present a bibliographic analysis of Chandra, Hubble, and Spitzer publications. We find (a) archival data are used in >60% of the publication output and (b) archives for these missions enable a much broader set of institutions and countries to scientifically use data from these missions. Specifically, we find that authors from institutions that have published few papers from a given mission publish 2/3 archival publications, while those with many publications typically have 1/3 archival publications. We also show that countries with lower GDP per capita overwhelmingly produce archival publications, while countries with higher GDP per capita produce guest observer and archival publications in equal amounts. We argue that robust archives are thus not only critical for the scientific productivity of mission data, but also the scientific accessibility of mission data. We argue that the astronomical community should support archives to maximize the overall scientific societal impact of astronomy, and represent an excellent investment in astronomy's future.
The scientific accessibility of astronomical data is critical to maintain a rich, flourishing, and growing discourse in astronomy. If the astronomy conversation is dominated by only a few voices, institutions, or countries, the entire scientific process, where old ideas are constantly challenged and new ideas are constantly proposed, can wither and die. Further, by expanding the community working on these missions and in astronomy we sow the seeds for the future success of the discipline. We note that engagement of the lay community through public outreach and citizen science is also critical to the success of astronomy and is similarly enhanced by access to archival data, but in this work we explicitly address scientific engagement with astronomical data. 

Robust Archives Maximize Scientific Accessibility

