Tailored Graph Embeddings for Entity Alignment on Historical Data

Research output: Contribution to conference › Paper › Academic

Abstract

In the domain of Dutch cultural heritage, various data sets describe different aspects of life during the Dutch Golden Age. These data sets, in the form of RDF graphs, use different standards and contain noise in the values of literal nodes, such as misspelled names and uncertainty in dates. The Golden Agents project aims at answering queries about the Dutch Golden Age using these distributed and independently maintained data sets. One problem in this project, among many others, is the identification of persons who occur in multiple data sets but under different URIs. This paper aims to solve this specific problem and generate a linkset, i.e. a set of pairs of URIs which are judged to represent the same person. We use domain knowledge in the application of an existing node context generation algorithm, whose output serves as input for GloVe, an algorithm originally designed for embedding words. This embedding is then used to train a classifier on pairs of URIs which are known duplicates and non-duplicates. Using just the cosine similarity between URI pairs in embedding space for prediction, we obtain a simple classifier with an F½-score of around 0.85, even when very few training examples are provided. On larger training sets, more complex classifiers are shown to reach an F½-score of up to 0.88.
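The simplest classifier mentioned in the abstract, which predicts from the cosine similarity of a URI pair alone, can be illustrated with a minimal sketch. This is not the authors' implementation: the decision rule (a single similarity threshold chosen on labelled pairs), the function names, the toy URIs and the two-dimensional vectors are all assumptions made for illustration, and the F-score variant is left as a `beta` parameter.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def predict_duplicates(embeddings, pairs, threshold):
    """Label a URI pair as a duplicate when its similarity reaches the threshold."""
    return [
        cosine_similarity(embeddings[a], embeddings[b]) >= threshold
        for a, b in pairs
    ]


def f_beta(predicted, actual, beta=1.0):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def fit_threshold(embeddings, pairs, labels, beta=1.0):
    """Assumed training step: try each observed similarity as the threshold
    and keep the one with the best F-beta score on the labelled pairs."""
    candidates = sorted(
        {cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs}
    )
    return max(
        candidates,
        key=lambda t: f_beta(predict_duplicates(embeddings, pairs, t), labels, beta),
    )


# Hypothetical toy data: made-up URIs with made-up 2-d embeddings.
embeddings = {
    "ex:a1": [1.0, 0.0],
    "ex:a2": [0.9, 0.1],
    "ex:b1": [0.0, 1.0],
    "ex:b2": [0.1, 0.9],
}
pairs = [("ex:a1", "ex:a2"), ("ex:b1", "ex:b2"),
         ("ex:a1", "ex:b1"), ("ex:a2", "ex:b2")]
labels = [True, True, False, False]  # known duplicates / non-duplicates

threshold = fit_threshold(embeddings, pairs, labels, beta=0.5)
linkset = [pair for pair, is_dup
           in zip(pairs, predict_duplicates(embeddings, pairs, threshold))
           if is_dup]
```

Here `linkset` holds the pairs judged to denote the same person. A one-parameter rule like this has very little to fit, which is in keeping with the abstract's remark that the simple classifier works even with few training examples.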
Original language: English
Pages: 125-133
Number of pages: 9
Publication status: Published - 30 Nov 2020
Event: International Conference on Information Integration and Web-based Applications & Services - Online due to Covid 19
Duration: 30 Nov 2020 to 2 Dec 2020
Conference number: 22
http://www.iiwas.org/conferences/iiwas2020/

Conference

Conference: International Conference on Information Integration and Web-based Applications & Services
Abbreviated title: iiWAS 2020
Period: 30/11/20 to 2/12/20

Keywords

  • RDF
  • Cultural Heritage
  • Entity Alignment
  • Embedding
