TY - GEN
T1 - Top-κ Item identification on dynamic and distributed datasets
AU - Guerrieri, Alessio
AU - Montresor, Alberto
AU - Velegrakis, Yannis
PY - 2014/1/1
Y1 - 2014/1/1
N2 - The problem of identifying the most frequent items across multiple datasets has received considerable attention over the last few years. When storage is a scarce resource, the topic is already a challenge; yet, its complexity may be further exacerbated not only by the many independent data sources, but also by the dynamism of the data, i.e., the fact that new items may appear and old ones disappear at any time. In this work, we provide a novel approach to the problem by using an existing gossip-based algorithm for identifying the k most frequent items over a distributed collection of datasets, in ways that deal with the dynamic nature of the data. The algorithm has been thoroughly analyzed through trace-based simulations and compared to state-of-the-art decentralized solutions, showing better precision at reduced communication overhead.
AB - The problem of identifying the most frequent items across multiple datasets has received considerable attention over the last few years. When storage is a scarce resource, the topic is already a challenge; yet, its complexity may be further exacerbated not only by the many independent data sources, but also by the dynamism of the data, i.e., the fact that new items may appear and old ones disappear at any time. In this work, we provide a novel approach to the problem by using an existing gossip-based algorithm for identifying the k most frequent items over a distributed collection of datasets, in ways that deal with the dynamic nature of the data. The algorithm has been thoroughly analyzed through trace-based simulations and compared to state-of-the-art decentralized solutions, showing better precision at reduced communication overhead.
UR - https://www.scopus.com/pages/publications/84906346219
U2 - 10.1007/978-3-319-09873-9_23
DO - 10.1007/978-3-319-09873-9_23
M3 - Conference contribution
AN - SCOPUS:84906346219
SN - 9783319098722
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 270
EP - 281
BT - Euro-Par 2014
A2 - Silva, Fernando
A2 - Dutra, Inês
A2 - Costa, Vítor Santos
PB - Springer
T2 - 20th International Conference on Parallel Processing, Euro-Par 2014
Y2 - 25 August 2014 through 29 August 2014
ER -