TY - GEN
T1 - Crowdsourcing high-quality structured data
AU - Halpin, Harry
AU - Lykourentzou, Ioanna
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
AB - One of the most difficult problems faced by consumers of semi-structured and structured data on the Web is how to discover or create the data they need. On the other hand, the producers of Web data do not have any (semi)automated way to align their data production with consumer needs. In this paper we formalize the problem of a data marketplace, hypothesize that one can quantify the value of semi-structured and structured data given a set of consumers, and that this quantification can be applied to both existing data-sets and data-sets that need to be created. Furthermore, we provide an algorithm showing how the production of this data can be crowd-sourced while assuring the consumer a certain level of quality. Using real-world empirical data collected via data producers and consumers, we simulate a crowd-sourced data marketplace with quality guarantees.
KW - Crowdsourcing
KW - Human computation
KW - Resource allocation
KW - Structured data
UR - http://www.scopus.com/inward/record.url?scp=85063529748&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-11680-4_29
DO - 10.1007/978-3-030-11680-4_29
M3 - Conference contribution
AN - SCOPUS:85063529748
SN - 9783030116798
T3 - Communications in Computer and Information Science
SP - 304
EP - 319
BT - Information Management and Big Data - 5th International Conference, SIMBig 2018, Proceedings
A2 - Muñante, Denisse
A2 - Alatrista-Salas, Hugo
A2 - Lossio-Ventura, Juan Antonio
PB - Springer
T2 - 5th International Conference on Information Management and Big Data, SIMBig 2018
Y2 - 3 September 2018 through 5 September 2018
ER -