
Order out of Chaos: Construction of Knowledge Models from PDF Textbooks

    Research output: Contribution to conference › Paper › Academic

    Abstract

    Textbooks are educational documents created, structured and formatted by domain experts with the main purpose of explaining the knowledge in the domain to a novice. Authors use their understanding of the domain when structuring and formatting the content of a textbook to facilitate this explanation. As a result, the formatting and structural elements of textbooks carry the elements of domain knowledge implicitly encoded by their authors. Our paper presents an extendable approach towards automated extraction of this knowledge from textbooks, taking into account their formatting rules and internal structure. We focus on PDF as the most common textbook representation format; however, the overall method is applicable to other formats as well. The evaluation experiments examine the accuracy of the approach, as well as the pragmatic quality of the obtained knowledge models using one of their possible applications: semantic linking of textbooks in the same domain. The results indicate high accuracy of model construction on symbolic, syntactic and structural levels across textbooks and domains, and demonstrate the added value of the extracted models on the semantic level.

    Original language: English
    Pages: 1-10
    Number of pages: 10
    Publication status: Published - 29 Sept 2020
    Event: 20th ACM Symposium on Document Engineering, DocEng 2020 - Virtual, Online, United States
    Duration: 29 Sept 2020 to 1 Oct 2020

    Conference

    Conference: 20th ACM Symposium on Document Engineering, DocEng 2020
    Country/Territory: United States
    City: Virtual, Online
    Period: 29/09/20 to 1/10/20

    Funding

    This work was partially supported by the Ministry of Science, Technology and Telecommunication of Costa Rica (grant 2-1-4-17-1-021) and the INTERREG-IVA-GR program (grant 138 GR DeLux 32274).

    Keywords

    • knowledge modeling
    • model extraction
    • PDF processing
    • textbook
