A Quantitative Comparison of Semantic Web Page Segmentation Approaches

Robert Kreuzer, J. Hage, A.J. Feelders

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    Abstract

    We compare three known semantic web page segmentation
    algorithms, each serving as an example of a particular approach to the
    problem, and one self-developed algorithm, WebTerrain, that combines
    two of the approaches. We compare the performance of the four algorithms
    on a large benchmark of modern websites that we have constructed,
    examining each algorithm for a total of eight configurations. We find
    that all algorithms perform better, on average, on random pages than
    on popular pages, and that results are better when the algorithms are run
    on the HTML obtained from the DOM rather than on the plain HTML.
    Overall, there is much room for improvement: the best average F-score
    we find is 0.49, indicating that currently available algorithms are
    not yet of practical use on modern websites.
    Original language: English
    Title of host publication: Proceedings of ICWE 2015
    Publisher: Springer
    Pages: 374-391
    Volume: 9114
    Publication status: Published - 2015

    Publication series

    Name: LNCS
    Publisher: Springer
