Abstract
We describe a system that estimates when an event will take place from a stream of Twitter microtexts referring to that event. Using a Twitter archive and 60 known football events, we train machine learning classifiers to map unseen tweets onto discrete time segments. The period before the event is segmented automatically; the accuracy with which tweets can be classified into these segments determines the error (RMSE) of the time-to-event prediction. In a cross-validation experiment, support vector machines with χ² feature selection attain the lowest prediction error, 52.3 hours. In a comparison with human subjects, humans produce a larger error but recognize more tweets as posted before the event; the machine-learning approach more often misclassifies a 'before' tweet as posted during or after the event.
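
To illustrate how segment classification relates to an error in hours, the following is a minimal sketch, not the authors' implementation. It assumes that each predicted segment is converted to a time-to-event estimate by taking the segment midpoint; the segment names, their boundaries, and the helper functions `segment_midpoint` and `rmse_hours` are hypothetical and chosen by hand here, whereas the abstract states that the actual segmentation is automatic.

```python
import math

# Hypothetical segments (assumption, not from the paper):
# label -> (nearest, farthest) boundary in hours before the event.
SEGMENTS = {
    "near": (0.0, 12.0),
    "mid": (12.0, 48.0),
    "far": (48.0, 96.0),
}


def segment_midpoint(label):
    """Return the midpoint of a segment, in hours before the event."""
    start, end = SEGMENTS[label]
    return (start + end) / 2.0


def rmse_hours(predicted_labels, true_hours):
    """Root mean squared error, in hours, between the midpoint of each
    predicted segment and the true time-to-event of the tweet."""
    if len(predicted_labels) != len(true_hours):
        raise ValueError("inputs must have the same length")
    if not predicted_labels:
        raise ValueError("inputs must not be empty")
    squared_errors = [
        (segment_midpoint(label) - hours) ** 2
        for label, hours in zip(predicted_labels, true_hours)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example: one tweet posted 6 hours before the event,
# misclassified into the "far" segment (midpoint 72 hours).
print(rmse_hours(["far"], [6.0]))
```

Under this assumption, a tweet placed in the wrong segment contributes an error roughly as large as the distance between segments, which is why classification accuracy bounds the time-to-event error.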
Original language | English |
---|---|
Title of host publication | BNAIC 2013 |
Publication status | Published - 1 Jan 2013 |