Abstract
As textbooks evolve into digital platforms, they open a world of opportunities for Artificial Intelligence in Education (AIED) research. This paper delves into the novel use of textbooks as a source of high-quality labeled data for automatic keyword extraction, demonstrating an affordable and efficient alternative to traditional methods. By utilizing the wealth of structured information provided in textbooks, we propose a methodology for annotating corpora across diverse domains, circumventing the costly and time-consuming process of manual data annotation. Our research presents a deep learning model based on Bidirectional Encoder Representations from Transformers (BERT) fine-tuned on this newly labeled dataset. This model is applied to keyword extraction tasks, with the model’s performance surpassing established baselines. We further analyze the transformation of BERT’s embedding space before and after the fine-tuning phase, illuminating how the model adapts to specific domain goals. Our findings substantiate textbooks as a resource-rich, untapped well of high-quality labeled data, underpinning their significant role in the AIED research landscape.
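The abstract does not say which structured textbook information is used or how labels are derived from it. As a minimal sketch of the general idea, assume a textbook supplies a list of index terms and that sentences are labeled by matching those terms against the text, producing B/I/O tags of the kind a BERT token-classification model could be fine-tuned on. The helper names `tokenize` and `bio_label` and the example terms are hypothetical and are not taken from the paper.

```python
import re


def tokenize(text):
    """Split text into word and punctuation tokens (a simple stand-in tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def bio_label(tokens, keyphrases):
    """Tag each token as B (begins a keyphrase), I (inside one), or O (outside).

    Assumption: `keyphrases` comes from a textbook's index or glossary.
    Matching is case-insensitive and exact on whole tokens.
    """
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)
    # Try longer phrases first so they take priority over shorter overlapping ones.
    phrases = sorted((p.lower().split() for p in keyphrases), key=len, reverse=True)
    for phrase in phrases:
        n = len(phrase)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            span_is_free = all(label == "O" for label in labels[i:i + n])
            if lowered[i:i + n] == phrase and span_is_free:
                labels[i] = "B"
                labels[i + 1:i + n] = ["I"] * (n - 1)
    return labels


# Hypothetical index terms and sentence, for illustration only.
index_terms = ["gradient descent", "loss function", "loss"]
sentence = "Gradient descent minimizes the loss function."
tokens = tokenize(sentence)
labels = bio_label(tokens, index_terms)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Exact string matching of this kind would miss inflected or paraphrased terms, so a real pipeline would likely need normalization or lemmatization on top of it.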
Original language | English |
---|---|
Pages (from-to) | 66-77 |
Number of pages | 12 |
Journal | CEUR Workshop Proceedings |
Volume | 3444 |
Publication status | Published - Jul 2023 |
Event | 5th International Workshop on Intelligent Textbooks, iTextbooks 2023, Tokyo, Japan. Duration: 3 Jul 2023 → … |
Bibliographical note
Publisher Copyright: © 2023 Copyright for this paper by its authors.
Keywords
- automatic keyword extraction
- BERT fine-tuning
- labeled data
- textbooks