Increasing the equitability of data citation in paleontology: capacity building for the big data future

Jansen A. Smith, Nussaibah B. Raja, Thomas Clements, Danijela Dimitrijevic, Elizabeth M. Dowding, Emma M. Dunne, Bryan M. Gee, Pedro L. Godoy, Elizabeth M. Lombardi, Laura P. A. Mulvey, Paulina S. Naetscher, Carl J. Reddin, Bryan Shirley, Rachel C. M. Warnock, Adam T. Kocsis

Research output: Contribution to journal; Review article; peer-review

Abstract

Data compilations expand the scope of research; however, data citation practice lags behind advances in data use. It remains uncommon for data users to credit data producers in professionally meaningful ways. In paleontology, databases like the Paleobiology Database (PBDB) enable assessment of patterns and processes spanning millions of years, up to global scale. The status quo for data citation creates an imbalance wherein publications drawing data from the PBDB receive significantly more citations (median: 4.3 ± 3.5 citations/year) than the publications producing the data (1.4 ± 1.3 citations/year). By accounting for data reuse where citations were neglected, the projected citation rate for data-provisioning publications approached parity (4.2 ± 2.2 citations/year) and the impact factor of paleontological journals (n = 55) increased by an average of 13.4% (maximum increase = 57.8%) in 2019. Without rebalancing the distribution of scientific credit, emerging "big data" research in paleontology, and science in general, is at risk of undercutting itself through a systematic devaluation of the work that is foundational to the discipline.
Original language: English
Pages (from-to): 165-176
Number of pages: 12
Journal: Paleobiology
Volume: 50
Issue number: 2
Early online date: Dec 2023
DOIs
Publication status: Published - May 2024

Bibliographical note

Publisher Copyright:
© 2023 The Author(s). Published by Cambridge University Press on behalf of Paleontological Society.

Funding

Improved data sharing requires buy-in from individuals, who may themselves benefit from the practice and enhance the quality of science in paleontology. As reviewed by Marwick and Birch, there are many reasons to share data (e.g., reciprocal data sharing by others; reproducibility of research; enabling others to ask new questions) and some associated costs (e.g., time required to clean data; data use without citation). One of the incentives is that data sharing is associated with increased citation of the publication where the data were initially published (Sears; Piwowar and Vision; Tomaszewski; Colavizza et al.; Dorta-González et al.). For example, Colavizza et al. reported that when publications included data availability statements with the associated data publicly accessible, those publications saw a 25% increase in their citations compared with publications without available data. As demonstrated here, the potential citation benefit may be even larger in a discipline like paleontology, where publications on data compilations have become mainstream. Changes to the format of funding proposals, for example, the inclusion of a “research outcomes” section that includes datasets by the Deutsche Forschungsgemeinschaft (i.e., German Research Foundation) and a non-publication section in National Science Foundation grant reports, can further encourage data sharing. Of perhaps greater importance, data sharing ensures the reproducibility of scientific results (Piwowar and Vision; Altman et al.; Marwick and Birch). As has been demonstrated to the detriment of many fields of study (e.g., behavioral ecology [Viglione], food science [van der Zee et al.], paleontology [Price], psychology [John et al.]), some researchers have been guilty of misrepresenting their data. Data sharing provides a means to uphold academic integrity and establishes an ethical and practical standard that encourages scientific advancement (Marwick and Birch; Raja and Dunne).
We thank the many authors of the official PBDB papers who shared their raw data with us, and those responsible for maintaining the PBDB as the excellent community resource that it is. We also thank M. Patzkowsky, G. Jones, M. Hopkins (editor), and P. Monarrez and P. Novack-Gottshall (reviewers) for their comments that improved an earlier version of this article. This work was supported in part by the Paleosynthesis Project, with funding from the Volkswagen Stiftung, and by the TERSANE project, with funding from the Deutsche Forschungsgemeinschaft (FOR 2332; grant nos. KI 806/17-1 [N.B.R., D.D.], BA 5148/1-2 to K. De Baets [P.S.N.], AB 109/11-1 to M. Aberhan [C.J.R.], and Ko 5382/2-1 [Á.T.K.]). P.L.G. was supported by the São Paulo Research Foundation (FAPESP 2022/05697-9). B.M.G. was supported by the National Science Foundation (ANT-1947094 to C. Sidor). B.S. was supported by the Deutsche Forschungsgemeinschaft (JA 2718/3-1) and the Netherlands Earth System Science Centre (NESSC).

Funders and funder numbers
National Science Foundation: ANT-1947094
Deutsche Forschungsgemeinschaft: FOR 2332, KI 806/17-1, BA 5148/1-2, AB 109/11-1, Ko 5382/2-1, JA 2718/3-1
Volkswagen Foundation
Fundação de Amparo à Pesquisa do Estado de São Paulo: 2022/05697-9
Netherlands Earth System Science Centre

    Keywords

    • Biodiversity
    • Open science
    • Paleobiology Database
    • Specimen-based
    • Taxonomy
