A significant challenge for the very near future is going to be sheer data volume, with the SKA, crazily, predicted to be producing exabytes of data per second. That's terrifying. So there will be cases where data access is going to have to be limited, but that doesn't mean we can't keep those cases to a minimum. And simulations already tend to produce vast amounts of data, which it's simply wildly impractical to expect a small institute to host publicly (I've deleted terabytes of my own data because it's just too friggin' large). But I'm reminded of a nice talk I saw last week where the speaker described a procedure to easily include the full details of the software used, including the exact versions of every library and dependency. A move towards more of this kind of approach, where a paper describes, as we're taught to do in high school, exactly how to reproduce a result, would be a very good thing. Alternatively, there could be journals specialising in contemporary methods of analysis, so that one could simply cite a paper describing the method and then just add very brief notes about any modifications used. The point is that if you can't provide the actual data, at least provide the tools to exactly reproduce it.
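To make the "exact versions of every library and dependency" idea a bit more concrete, here's a rough sketch in Python. To be clear, this is not the procedure from the talk (I'm not reproducing that here), just my own illustration using only the standard library. The function name `snapshot_environment` and the layout of the output are made up for the example.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Collect the Python version, platform, and installed package versions.

    Hypothetical helper for illustration; the dictionary layout is an assumption.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that carry no package name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        # Sort case-insensitively so two snapshots are easy to compare by eye.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON; redirect it to a file to keep it with the results.
    json.dump(snapshot_environment(), sys.stdout, indent=2)
    print()
```

Running this in the same environment as the analysis and attaching the output to the paper (or its repository) at least records what was installed at the time, though it doesn't capture compilers, system libraries, or the data themselves.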
We present a bibliographic analysis of Chandra, Hubble, and Spitzer publications. We find (a) archival data are used in >60% of the publication output and (b) archives for these missions enable a much broader set of institutions and countries to scientifically use data from these missions. Specifically, we find that authors from institutions that have published few papers from a given mission publish 2/3 archival publications, while those with many publications typically have 1/3 archival publications. We also show that countries with lower GDP per capita overwhelmingly produce archival publications, while countries with higher GDP per capita produce guest observer and archival publications in equal amounts. We argue that robust archives are thus not only critical for the scientific productivity of mission data, but also the scientific accessibility of mission data. We argue that the astronomical community should support archives to maximize the overall scientific societal impact of astronomy, and that archives represent an excellent investment in astronomy's future.
The scientific accessibility of astronomical data is critical to maintain a rich, flourishing, and growing discourse in astronomy. If the astronomy conversation is dominated by only a few voices, institutions, or countries, the entire scientific process, where old ideas are constantly challenged and new ideas are constantly proposed, can wither and die. Further, by expanding the community working on these missions and in astronomy we sow the seeds for the future success of the discipline. We note that engagement of the lay community through public outreach and citizen science is also critical to the success of astronomy and is similarly enhanced by access to archival data, but in this work we explicitly address scientific engagement with astronomical data.
Robust Archives Maximize Scientific Accessibility