Direct and Indirect Annotation with Generative AI: A Case Study into Finding Animals and Plants in Historical Text

Arjan van Dalfsen*, Folgert Karsdorp, Ayoub Bagheri, Els Stronks, Dieuwertje Mentink, Thirza van Engelen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

This study explores the use of generative AI (GenAI) for annotation in the humanities, comparing direct and indirect annotation approaches against human annotations. In direct annotation, GenAI annotates the entire corpus; in indirect annotation, GenAI creates training data for a specialized model. The research investigates zero-shot and few-shot methods for direct annotation, alongside an indirect approach incorporating active learning, few-shot prompting, and k-NN example retrieval. The task is to identify words (also referred to as entities) related to plants and animals in Early Modern Dutch texts. Results show that indirect annotation outperforms zero-shot direct annotation in mimicking human annotations. With just a few examples, however, direct annotation catches up and achieves performance similar to that of indirect annotation. Analysis of confusion matrices reveals that the GenAI annotators make similar types of mistakes, such as confusing parts and products or failing to identify entities, and that these mistakes are broader than those made by humans. Manual error analysis indicates that each annotation method (human, direct, and indirect) has some unique errors. Given the limited scale of this study, the relative affordances of direct and indirect GenAI annotation methods merit further exploration.
Original language: English
Title of host publication: Computational Humanities Research 2024
Subtitle of host publication: Proceedings of the Computational Humanities Research Conference 2024 Aarhus, Denmark, December 4-6, 2024.
Publisher: CEUR WS
Pages: 1053-1074
Publication status: Published - 18 Nov 2024
