Searching and Finding Strikes in the New York Times

Iris Hendrickx, Marten Düring, Kalliopi Zervanou, Antal van den Bosch

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

The large-scale digitization that archives, publishers, and libraries are currently undertaking gives historians access to a vast amount of information. Yet this does not necessarily make life easier for historians, as the main problem remains how to find relevant sources in this sea of information. We present a case study demonstrating how automatic text analysis can aid historians in finding relevant primary sources. We focus on strike events in the USA in the 1980s. In earlier work on strikes, researchers did not have a full and comprehensive list of major strike events at their disposal. Existing databases of this kind (e.g., [19, 22]) are the result of intensive manual work and took years to build. Natural language processing (NLP) tools allow for faster assembly of such datasets from collections of free text that contain the information the database should hold. We aim to construct a database of events using a digital newspaper archive and unsupervised NLP methods, such as Latent Dirichlet Allocation (LDA) and clustering techniques, to group together newspaper articles that describe the same strike. We study the effect of different feature representations, such as simple bag-of-words features, named entities, and time stamp information. We evaluate our results on a manually labeled sample of news articles describing a small set of strikes.
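
To make the idea of grouping articles by strike more concrete, the following is a minimal illustrative sketch, not the authors' method. It does not use LDA or named entities: it swaps in a greedy single-pass clustering over bag-of-words cosine similarity, combined with a time-stamp window. The function names, the similarity threshold `min_sim`, the window `max_days`, and the sample headlines are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter
from datetime import date


def bag_of_words(text):
    """Lowercased word counts: a deliberately simple feature representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_articles(articles, min_sim=0.3, max_days=30):
    """Greedy single-pass clustering of (date, text) pairs.

    Articles are visited in date order. Each one joins the most similar
    existing cluster whose latest article is at most `max_days` older and
    whose summed word counts have cosine similarity >= `min_sim`;
    otherwise it starts a new cluster. Both thresholds are hypothetical.
    Returns a list of clusters, each a list of indices into `articles`.
    """
    clusters = []
    order = sorted(range(len(articles)), key=lambda i: articles[i][0])
    for i in order:
        day, text = articles[i]
        vec = bag_of_words(text)
        best, best_sim = None, min_sim
        for c in clusters:
            if (day - c["last"]).days > max_days:
                continue  # time-stamp feature: too far apart to be one strike
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "last": day, "members": [i]})
        else:
            best["centroid"].update(vec)  # add the new article's word counts
            best["last"] = day
            best["members"].append(i)
    return [c["members"] for c in clusters]


# Invented sample headlines, for illustration only.
articles = [
    (date(1981, 8, 3), "Air traffic controllers strike begins as union walks out"),
    (date(1981, 8, 5), "Air traffic controllers strike continues, union defiant"),
    (date(1983, 8, 7), "Telephone workers strike against phone company"),
    (date(1983, 8, 10), "Phone company and telephone workers resume strike talks"),
]
groups = cluster_articles(articles)
print(groups)
```

In this sketch the time window acts as a hard filter, so two articles with identical wording but published years apart end up in different clusters. Treating the time stamp as one feature among several, rather than as a filter, would be an equally plausible design.
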
Original language: English
Title of host publication: Proceedings of the 3rd Workshop on Annotation of Corpora for Research in the Humanities (ACRH-3)
Pages: 25-36
Publication status: Published - 2013
