Abstract
Accurate identification of fossils is instrumental to palaeontological research, but it requires expert knowledge, is time-consuming, and is subject to human bias. AI-assisted identification through citizen science platforms and apps can mitigate these challenges, as successful applications in biodiversity research have shown. Fossil data is relatively scarce, but large and growing fossil datasets are becoming available in open data repositories through collection digitisation efforts. Furthermore, data is collected and validated by fossil enthusiasts on citizen science platforms. These datasets can serve as training data for deep learning classification models that provide both experts and citizen scientists with accurate, quick and easy-to-use tools to collect, validate and analyse palaeontological data. However, AI model performance may be limited by the size and quality of the training dataset. We present and compare a set of convolutional neural networks (CNNs) trained and tested on standardised images from museum and private collections (>46,000 images) and on images from the online citizen science platform Oervondstchecker.nl (>74,000 images). Both datasets consist of Quaternary vertebrate fossils and artefacts from the Netherlands and the southern North Sea Basin. Moreover, we compare model performance with identifications by 10 domain experts and active citizen scientists to gain a measure of data quality. The CNNs perform better when trained on standardised images (~85% top-1 accuracy) than when trained on citizen science data (~65% top-1 accuracy). Agreement among the fossil experts on identifications is variable.
Based on these insights we make recommendations on how to account for variable validator input to optimise AI model training and performance. The synergy between AI model predictions and domain expert identifications can rapidly increase the amount of high-quality identifications of fossils and flag potential rare finds. Finally, to further increase data acquisition and public engagement, the models with the best overall performance have been made publicly available online (https://museum.identify.biodiversityanalysis.nl/model/beach_fossils_species) for use by professional experts, citizen scientists and the general public alike.
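The two quantities the abstract reports, top-1 accuracy and agreement among validators, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the labels, and the use of simple mean pairwise agreement as the agreement measure are all assumptions made for illustration.

```python
from itertools import combinations


def top1_accuracy(predicted, reference):
    """Fraction of items whose top-ranked predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("need at least one item")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def mean_pairwise_agreement(labels_by_validator):
    """Share of items on which two validators agree, averaged over all validator pairs."""
    pairs = list(combinations(labels_by_validator.values(), 2))
    if not pairs:
        raise ValueError("need at least two validators")
    return sum(top1_accuracy(a, b) for a, b in pairs) / len(pairs)


# Hypothetical labels for four fossil images (illustrative only, not study data).
reference = ["mammoth", "horse", "bison", "horse"]
model = ["mammoth", "horse", "horse", "horse"]
validators = {
    "expert_a": ["mammoth", "horse", "bison", "horse"],
    "expert_b": ["mammoth", "horse", "horse", "horse"],
    "expert_c": ["mammoth", "deer", "bison", "horse"],
}

print(top1_accuracy(model, reference))
print(mean_pairwise_agreement(validators))
```

Raw pairwise agreement does not correct for agreement expected by chance; a chance-corrected statistic could be substituted without changing the overall structure.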
| Original language | English |
|---|---|
| Pages | 126-127 |
| Number of pages | 2 |
| Publication status | Published - 2025 |
| Event | XXII Annual Meeting of the European Association of Vertebrate Palaeontologists, Kraków, Poland. Duration: 30 Jun 2025 → 5 Jul 2025. Conference number: 22. https://eavp2025.wixsite.com/eavp2025 |
Conference
| Conference | XXII Annual Meeting of the European Association of Vertebrate Palaeontologists |
|---|---|
| Abbreviated title | EAVP 2025 |
| Country/Territory | Poland |
| City | Kraków |
| Period | 30/06/25 → 5/07/25 |
| Internet address | https://eavp2025.wixsite.com/eavp2025 |
Funding
Funding provided by NWO (Dutch Research Council) through an "Open Competition ENW-M" grant (dossier number: OCENW.M20.360)