Paraphrasing headlines by machine translation: Sentential paraphrase acquisition and generation using Google News

S. Wubben, A. Van Den Bosch, E. Krahmer

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

In this paper we investigate the automatic collection, generation and evaluation of sentential paraphrases. Valuable sources of paraphrases are news article headlines; they tend to describe the same event in various different ways, and can easily be obtained from the web. We describe a method for generating paraphrases by using a large aligned monolingual corpus of news headlines acquired automatically from Google News and a standard Phrase-Based Machine Translation (PBMT) framework. The output of this system is compared to a word substitution baseline. Human judges prefer the PBMT paraphrasing system over the word substitution system. We compare human judgements to automatic judgement measures and demonstrate that the BLEU metric correlates well with human judgements provided that the generated paraphrase is sufficiently different from the source sentence.

Original language: English
Title of host publication: Computational Linguistics in the Netherlands 2010
Subtitle of host publication: Selected Papers from the Twentieth CLIN Meeting
Publisher: LOT
Pages: 169-183
Publication status: Published - 2011
Externally published: Yes
