Harnessing Textbooks for High-Quality Labeled Data: An Approach to Automatic Keyword Extraction

Research output: Contribution to journal › Conference article › Academic › peer-review

Abstract

As textbooks evolve into digital platforms, they open up new opportunities for Artificial Intelligence in Education (AIED) research. This paper explores the novel use of textbooks as a source of high-quality labeled data for automatic keyword extraction, demonstrating an affordable and efficient alternative to traditional annotation methods. Drawing on the structured information that textbooks provide, we propose a methodology for annotating corpora across diverse domains that avoids the costly and time-consuming process of manual data annotation. We then fine-tune a deep learning model based on Bidirectional Encoder Representations from Transformers (BERT) on this newly labeled dataset. Applied to keyword extraction, the model outperforms established baselines. We further analyze how BERT’s embedding space changes from before to after fine-tuning, illuminating how the model adapts to specific domain goals. Our findings support textbooks as a rich, largely untapped source of high-quality labeled data, underscoring their significant role in the AIED research landscape.
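The abstract does not specify which textbook structures supply the labels. The following is a minimal sketch of textbook-driven distant labeling under two assumptions. First, the keyword list comes from a textbook element such as a back-of-book index or glossary. Second, keyword extraction is framed as BIO token classification, a common setup for fine-tuning BERT. The names `tokenize` and `bio_label` are hypothetical. A real pipeline would align labels to BERT's WordPiece sub-tokens rather than to the simple regex tokens used here.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Simple word/punctuation tokenizer; a stand-in (assumption) for BERT's
    # WordPiece tokenizer, used only to keep the sketch self-contained.
    return re.findall(r"\w+|[^\w\s]", text)


def bio_label(text: str, keywords: List[str]) -> List[Tuple[str, str]]:
    """Tag each token as B-KEY, I-KEY or O by matching keyword phrases.

    `keywords` is assumed to come from a textbook index or glossary
    (hypothetical source); matching is case-insensitive.
    """
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    # Tokenize keywords the same way and try longer phrases first,
    # so "neural network" wins over the shorter "network".
    phrases = sorted(
        {tuple(tokenize(k.lower())) for k in keywords if k.strip()},
        key=len,
        reverse=True,
    )
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for phrase in phrases:
            n = len(phrase)
            if tuple(lowered[i:i + n]) == phrase:
                labels[i] = "B-KEY"  # first token of a keyword
                for j in range(i + 1, i + n):
                    labels[j] = "I-KEY"  # continuation tokens
                i += n
                break
        else:
            i += 1  # no keyword starts here
    return list(zip(tokens, labels))


if __name__ == "__main__":
    sentence = "A neural network is trained by gradient descent on the network weights."
    index_terms = ["neural network", "gradient descent", "network"]
    for token, label in bio_label(sentence, index_terms):
        print(f"{token}\t{label}")
```

Longest-match-first labeling is one simple design choice; the token/label pairs it produces are the kind of input a BERT token-classification head could be fine-tuned on.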

Original language: English
Pages (from-to): 66-77
Number of pages: 12
Journal: CEUR Workshop Proceedings
Volume: 3444
Publication status: Published - Jul 2023
Event: 5th International Workshop on Intelligent Textbooks, iTextbooks 2023 - Tokyo, Japan
Duration: 3 Jul 2023 → …

Bibliographical note

Publisher Copyright:
© 2023 Copyright for this paper by its authors.

Keywords

  • automatic keyword extraction
  • BERT fine-tuning
  • labeled data
  • textbooks
