Abstract
Web data is the most prominent source of information for deciding where
to go and what to do. Exploiting this source for geographic analysis, however,
does not come without difficulties. First, in recent years, the amount and
diversity of available Web information about urban space have exploded, and
it is therefore increasingly difficult to survey and exploit. Second, the bulk
of information is in an unstructured form that is difficult for computers to
process and interpret. Third, semi-structured sources, such as Web rankings,
geolocated tags, check-ins, or mobile sensor data, do not fully reflect the
more subtle qualities of a place, including the particular functions that make
it attractive. In this article, we explore a method to capture leisure activity
potentials from Web data on urban space using semantic topic models. We
test three supervised multi-label machine learning strategies exploiting geolocated web texts and place tags to estimate whether a given type of leisure
activity is afforded or not. We train and validate these models on a manually
curated dataset labeled with leisure ontology classes for the city of Zwolle,
and discuss their potential for urban leisure and tourism research and related
city policies and planning. We found that multi-label affordance estimation
is not straightforward but can be made to work using both official web texts
and user-generated content on a medium semantic level. This opens up new
opportunities for data-driven approaches to urban leisure and tourism studies.
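
To make the idea of multi-label affordance estimation concrete, the following is a minimal sketch of binary relevance, one common multi-label strategy that trains a separate yes/no classifier per label. The abstract does not name the three strategies tested, so this is not presented as one of them, and the naive Bayes word model used here stands in for the semantic topic models of the article. The place descriptions, the label names (`walking`, `eating`, `culture`) and all function names are hypothetical illustrations, not data or ontology classes from the study.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


class BinaryNaiveBayes:
    """Multinomial naive Bayes for a single afforded / not-afforded label."""

    def fit(self, docs, targets):
        self.vocab = {w for d in docs for w in d}
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for doc, target in zip(docs, targets):
            self.counts[target].update(doc)
            self.n_docs[target] += 1
        return self

    def log_score(self, doc, cls):
        total = sum(self.counts[cls].values())
        # Laplace-smoothed class prior and word likelihoods.
        score = math.log((self.n_docs[cls] + 1) / (sum(self.n_docs.values()) + 2))
        for w in doc:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.counts[cls][w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc):
        return self.log_score(doc, True) > self.log_score(doc, False)


def train_binary_relevance(texts, label_sets):
    """Train one independent binary classifier per label."""
    docs = [tokenize(t) for t in texts]
    labels = sorted(set().union(*label_sets))
    return {
        lab: BinaryNaiveBayes().fit(docs, [lab in s for s in label_sets])
        for lab in labels
    }


def predict_labels(models, text):
    """Return every label whose classifier votes 'afforded'."""
    doc = tokenize(text)
    return {lab for lab, model in models.items() if model.predict(doc)}


# Hypothetical toy training data: place descriptions with activity labels.
places = [
    ("quiet park with walking paths and a pond", {"walking"}),
    ("restaurant terrace serving dinner and drinks", {"eating"}),
    ("park cafe with terrace lunch and walking routes", {"walking", "eating"}),
    ("museum with exhibitions and guided tours", {"culture"}),
    ("theatre with concerts and exhibitions and a restaurant", {"culture", "eating"}),
]
models = train_binary_relevance([t for t, _ in places], [s for _, s in places])

print(predict_labels(models, "walking paths pond"))
print(predict_labels(models, "restaurant terrace in a park with walking routes"))
```

Because each label is decided independently, a single place can receive several labels or none, which is what distinguishes the multi-label setting from ordinary single-label classification. Binary relevance ignores correlations between labels; other multi-label strategies address that at the cost of extra complexity.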
Original language | English |
---|---|
Pages (from-to) | 143-156 |
Number of pages | 14 |
Journal | Computers, Environment and Urban Systems |
Volume | 73 |
Publication status | Published - Jan 2019 |
Keywords
- Place affordance
- Urban space
- Knowledge extraction
- City planning
- Latent semantics
- Multi-label classification