Finding Related Forum Posts through Content Similarity over Intention-Based Segmentation

Dimitra Papadimitriou, Georgia Koutrika, Yannis Velegrakis, John Mylopoulos

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

We study the problem of finding forum posts related to a given post. Traditional approaches for finding related documents compare the content of the posts as a whole. In contrast, we treat each post as a set of segments, each written with a different goal in mind. We argue that the relatedness between two posts should be based on the similarity of their respective segments that serve the same goal, i.e., that convey the same intention. As a result, the same term can carry a different weight in the relatedness score depending on the intention of the segment in which it appears. We have developed a segmentation method that monitors a number of text features and identifies the points in a post where significant jumps occur, indicating that a segment boundary should be placed there. The segments generated from all posts are clustered into intention clusters, and the similarity between two posts is then computed from the similarities between their segments that share the same intention. We experimentally demonstrate the effectiveness and efficiency of both our segmentation method and our overall approach to finding related forum posts.
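The pipeline described in the abstract (segment at feature jumps, group segments by intention, compare only same-intention segments) can be illustrated with a minimal sketch. This is not the authors' implementation. The two text features (a question flag and a first-person pronoun ratio), the jump threshold, the per-intention weights, and all function names (`segment`, `intention`, `relatedness`) are hypothetical choices made for illustration. The intention clustering step is replaced by a crude rule that labels a segment "ask" if it contains a question mark and "context" otherwise.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical lexicon for one illustrative text feature (not from the paper).
FIRST_PERSON = {"i", "me", "my", "we", "our"}


def tokens(text):
    """Lowercased word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def features(sentence):
    """Hypothetical per-sentence features: question flag and first-person ratio."""
    toks = tokens(sentence)
    is_question = 1.0 if sentence.rstrip().endswith("?") else 0.0
    first_person = sum(t in FIRST_PERSON for t in toks) / len(toks) if toks else 0.0
    return (is_question, first_person)


def segment(sentences, jump=0.5):
    """Start a new segment wherever the feature vector jumps by more than `jump`."""
    if not sentences:
        return []
    groups, current = [], [sentences[0]]
    for prev, nxt in zip(sentences, sentences[1:]):
        if math.dist(features(prev), features(nxt)) > jump:  # significant jump
            groups.append(current)
            current = []
        current.append(nxt)
    groups.append(current)
    return [" ".join(g) for g in groups]


def intention(segment_text):
    """Crude stand-in for intention clustering: label segments 'ask' or 'context'."""
    return "ask" if "?" in segment_text else "context"


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters (missing terms count as 0)."""
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def by_intention(sentences):
    """Pool the words of a post's segments into one bag per intention label."""
    bags = defaultdict(Counter)
    for seg in segment(sentences):
        bags[intention(seg)].update(tokens(seg))
    return bags


def relatedness(post_a, post_b, weights=None):
    """Weighted mean of per-intention similarities; compares like with like only."""
    weights = weights or {"ask": 2.0, "context": 1.0}  # hypothetical weights
    a, b = by_intention(post_a), by_intention(post_b)
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    total = sum(weights.get(k, 1.0) for k in shared)
    return sum(weights.get(k, 1.0) * cosine(a[k], b[k]) for k in shared) / total


# Usage example with made-up posts (each post is a list of sentences).
post_a = ["I installed the driver yesterday.", "My screen stays black.",
          "How do I roll back the driver?"]
post_b = ["We updated the graphics driver.", "Now the display flickers.",
          "How can I roll back the update?"]
post_c = ["Great recipe for bread.", "How long should dough rise?"]

print(segment(post_a))
print(round(relatedness(post_a, post_b), 3), round(relatedness(post_a, post_c), 3))
```

In this toy run, `post_a` splits into a "context" segment (the first two sentences) and an "ask" segment (the question). The driver-related `post_b` scores about 0.58 against it, while the bread post scores about 0.11. Because each term is counted only within its intention's bag and the hypothetical weights favour the "ask" intention, the same word contributes differently depending on which kind of segment it occurs in. This mirrors, in a simplified way, the effect the abstract describes.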

Original language: English
Article number: 7915736
Pages (from-to): 1860-1873
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 29
Issue number: 9
Publication status: Published, 1 Sept 2017

Keywords

  • clustering
  • communication means
  • diversity
  • forums
  • goals
  • intention
  • posts
  • ranking
  • relatedness
  • retrieval
  • similarity
  • text comparison
  • text segmentation
  • user messages
