TY - JOUR
T1 - On the use of distributed semantics of tweet metadata for user age prediction
AU - Pandya, Abhinay
AU - Oussalah, Mourad
AU - Monachesi, P.
AU - Kostakos, Panos
PY - 2019
Y1 - 2019
N2 - Social media data represent an important resource for behavioral analysis of the aging population. This paper addresses the problem of age prediction from a Twitter dataset, where the prediction issue is viewed as a classification task. For this purpose, an innovative model based on a Convolutional Neural Network is devised. To this end, we rely on language-related features and social-media-specific metadata. More specifically, we introduce two features that have not been previously considered in the literature: the content of URLs and hashtags appearing in tweets. We also employ distributed representations of words and phrases present in tweets, hashtags and URLs, pre-trained on appropriate corpora, in order to exploit their semantic information in age prediction. We show that our CNN-based classifier, when compared with baseline models, yields an improvement of up to 12.3% for the Dutch dataset, 9.8% for the English1 dataset, and 6.6% for the English2 dataset in the micro-averaged F1 score.
AB - Social media data represent an important resource for behavioral analysis of the aging population. This paper addresses the problem of age prediction from a Twitter dataset, where the prediction issue is viewed as a classification task. For this purpose, an innovative model based on a Convolutional Neural Network is devised. To this end, we rely on language-related features and social-media-specific metadata. More specifically, we introduce two features that have not been previously considered in the literature: the content of URLs and hashtags appearing in tweets. We also employ distributed representations of words and phrases present in tweets, hashtags and URLs, pre-trained on appropriate corpora, in order to exploit their semantic information in age prediction. We show that our CNN-based classifier, when compared with baseline models, yields an improvement of up to 12.3% for the Dutch dataset, 9.8% for the English1 dataset, and 6.6% for the English2 dataset in the micro-averaged F1 score.
KW - Social media mining
KW - Twitter
KW - Convolutional neural networks
KW - Age prediction
U2 - 10.1016/j.future.2019.08.018
DO - 10.1016/j.future.2019.08.018
M3 - Article
SN - 0167-739X
VL - 102
SP - 437
EP - 452
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
ER -