Linking Streets in OpenStreetMap to Persons in Wikidata
Daria Gurtovoy, Universität Bonn, Germany
Simon Gottschalk, L3S Research Center, Leibniz Universität Hannover, Germany (gottschalk@L3S.de)
Motivation
Streets are often named after famous or distinguished individuals who may have a direct connection to the specific location. While Wikidata has information about persons, OpenStreetMap has information about streets.
➔ Can we connect OpenStreetMap and Wikidata by linking streets in OpenStreetMap to persons in Wikidata?
Example
The Wilhelmstraße in Berlin could potentially be named after several different Wilhelms. Because of his popularity and because he was born in Berlin, we assume that the Wilhelmstraße was named after Friedrich Wilhelm I.
We can confirm our decision by checking Wilhelmstraße on Wikidata and its "named after" property or by checking Wilhelmstraße on OpenStreetMap and its "name:etymology:wikidata" key. However, these properties are rarely used and Wikidata does not cover all streets in OpenStreetMap.
Problem Statement
Approach
Our method StreetToPerson is based on the following pipeline:
Overview of StreetToPerson. Yellow boxes show example values.
Knowledge Graph Preprocessing
We use four datasets created from Wikidata:
- Person index: given a term as input, returns the Wikidata IDs of all people that match it.
- Person occupation index: all occupations of a person (e.g., monarch or writer).
- Person location index: relevant locations such as a person’s birthplace.
- Spatial dependencies: containment relations between locations (e.g., Berlin is located in Germany).
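As a rough illustration of how these four datasets could be held in memory, the sketch below uses plain Python dictionaries. The identifiers (`person:A`, `person:B`), the toy entries, the single-parent containment map, and the helper `containing_places` are assumptions made for this example; they are not the actual data layout used by StreetToPerson.

```python
from collections import defaultdict

# Toy stand-ins for the four datasets. All IDs and entries are hypothetical
# placeholders, not real Wikidata identifiers.
person_index = defaultdict(set)            # term -> IDs of matching persons
person_index["wilhelm"].update({"person:A", "person:B"})

person_occupations = {"person:A": {"monarch"}, "person:B": {"writer"}}
person_locations = {"person:A": {"born": "Berlin"}}

# Spatial dependencies, simplified to one "located in" parent per place.
located_in = {"Wilhelmstraße": "Berlin", "Berlin": "Germany"}


def containing_places(place, located_in):
    """Follow "located in" links upwards and return every containing place."""
    chain, seen = [], {place}
    while place in located_in:
        place = located_in[place]
        if place in seen:  # guard against cyclic data
            break
        seen.add(place)
        chain.append(place)
    return chain


print(containing_places("Wilhelmstraße", located_in))
```

Following the containment chain upwards is one simple way to decide later whether a person's location and a street share a spatial context.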
Friedrich Wilhelm I. in the person occupation index and the person location index.
Street Name Truncation
We remove affixes (i.e., prefixes and suffixes) from the street name that are not part of a person's name (e.g., "street", "road", and "avenue"). Our list of suffixes is available on GitHub.
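The truncation step can be sketched as follows. The affix lists here are short, illustrative assumptions and do not reproduce the published list; the function name `truncate_street_name` is likewise hypothetical.

```python
# Illustrative affix lists (assumed for this sketch, not the authors' list).
PREFIXES = ("avenue ", "rue ", "via ")
SUFFIXES = ("straße", "strasse", "platz", "allee", "weg",
            "street", "road", "avenue")


def truncate_street_name(name: str) -> str:
    """Strip at most one known prefix and one known suffix from a street name."""
    text = name.strip()
    lowered = text.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            lowered = text.lower()
            break
    for suffix in SUFFIXES:
        # Keep the name if it consists of the suffix only.
        if lowered.endswith(suffix) and len(text) > len(suffix):
            text = text[: -len(suffix)]
            break
    # Remove separators left over from names such as "Goethe-Straße".
    return text.strip(" -")


print(truncate_street_name("Wilhelmstraße"))
print(truncate_street_name("Konrad-Adenauer-Platz"))
```

Matching is done on a lowercased copy so that "Straße" and "straße" are treated alike, while the returned term keeps its original capitalisation.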
Candidate Retrieval
We query the person index with the truncated street name to retrieve a set of candidates.
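A minimal sketch of this lookup is shown below, assuming the person index maps a normalised term to pairs of person ID and matched name part. The person IDs, the name-part labels, and both function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_person_index(persons):
    """Map each normalised name term to the (person ID, name part) pairs using it."""
    index = defaultdict(set)
    for person_id, names in persons.items():
        for part, value in names.items():
            index[value.casefold()].add((person_id, part))
    return index


def retrieve_candidates(truncated_name, index):
    """Return all persons whose indexed terms match the truncated street name."""
    return sorted(index.get(truncated_name.casefold(), set()))


# Hypothetical toy persons; the IDs are placeholders, not Wikidata IDs.
persons = {
    "person:A": {"first_name": "Wilhelm", "last_name": "Busch"},
    "person:B": {"first_name": "Paul", "last_name": "Wilhelm"},
}
index = build_person_index(persons)
print(retrieve_candidates("Wilhelm", index))
```

Recording which name part produced the match is convenient because the same information can later feed the name features of the classifier.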
Feature Extraction
We extract 30 features for a street-to-person pair (s,p):
- Link count: The number of links pointing to p in Wikipedia.
- Name: Four binary features about which part of the person index was used (e.g., first name or last name).
- Occupations: 20 frequent occupations and whether they were found in the person occupation index.
- Spatial relations: Five features representing whether there is a spatial relation between p and s: "born", "died", "buried", "educated at", and "work location".
Example of the spatial dependencies between the street "Wilhelmstraße" and the city Berlin. Arrows denote "located in" relations (e.g., Berlin is located in Germany).
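The feature groups listed above (1 link count, 4 name features, 20 occupation features, 5 spatial features) add up to 30. The sketch below assembles such a vector under several assumptions: only "first name", "last name", "monarch", and "writer" come from the text, the remaining name parts and occupations are placeholders, and a spatial feature is set to 1 when the person's location is among the places that contain the street. The actual definitions used by StreetToPerson may differ.

```python
SPATIAL_RELATIONS = ("born", "died", "buried", "educated at", "work location")

# Placeholders: only the first two entries of each tuple appear in the text.
NAME_PARTS = ("first_name", "last_name", "part_3", "part_4")
OCCUPATIONS = ("monarch", "writer") + tuple(f"occupation_{i}" for i in range(3, 21))


def extract_features(street_places, person):
    """Build the 30-dimensional feature vector for one street-to-person pair."""
    features = [person["link_count"]]
    features += [int(part == person["matched_part"]) for part in NAME_PARTS]
    features += [int(occ in person["occupations"]) for occ in OCCUPATIONS]
    for relation in SPATIAL_RELATIONS:
        place = person["locations"].get(relation)
        # Assumption: a relation holds if the place contains the street.
        features.append(int(place is not None and place in street_places))
    return features


# Hypothetical example values, including the link count.
street_places = {"Berlin", "Germany"}
person = {
    "link_count": 120,
    "matched_part": "first_name",
    "occupations": {"monarch"},
    "locations": {"born": "Berlin"},
}
features = extract_features(street_places, person)
print(len(features), features)
```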
Street-to-person Classifier
Using the 30 features, we train a random forest model that classifies a street-to-person pair as positive or negative.
Evaluation
Using Wikidata's "named after" property, we extract 4,799 pairs of German streets and persons they were named after. We use these pairs together with negative examples as training and test datasets.
Baselines
- TagMe, a traditional entity linking approach on short text fragments [Ferragina, 2010].
- Popularity Ranking (PopRank): Simpler version of StreetToPerson where we take the person with the highest link count.
- Relevance Ranking (RelRank): The only existing approach for street-to-person linking [Almeida, 2016].
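The PopRank baseline can be sketched in a few lines: among the candidates retrieved for a street, it returns the one with the highest link count. The function name, the candidate IDs, and the link counts below are hypothetical.

```python
def pop_rank(candidates, link_counts):
    """Return the candidate with the highest link count, or None if there is none."""
    if not candidates:
        return None
    return max(candidates, key=lambda person_id: link_counts.get(person_id, 0))


# Hypothetical link counts for two candidate persons.
link_counts = {"person:A": 120, "person:B": 15}
print(pop_rank(["person:A", "person:B"], link_counts))
```

Unlike the full classifier, this baseline always selects a person whenever candidates exist and ignores occupations and spatial relations.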
Evaluation of the Classification on Wikidata
StreetToPerson clearly outperforms the baselines and achieves a precision and recall of more than 0.9.
Method | Precision | Recall | F1 Score |
---|---|---|---|
TagMe | 0.49 | 0.45 | 0.47 |
PopRank | 0.69 | 0.66 | 0.67 |
RelRank (all entities) | 0.08 | 0.08 | 0.08 |
RelRank (person entities) | 0.35 | 0.11 | 0.17 |
StreetToPerson | 0.95 | 0.91 | 0.93 |
Evaluation of the classification for StreetToPerson and the selected baselines using 10-fold cross validation.
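The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick consistency check, the short script below recomputes F1 from the rounded precision and recall values in the table; the helper name `f1_score` is our own.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in the table above.
reported = {
    "TagMe": (0.49, 0.45, 0.47),
    "PopRank": (0.69, 0.66, 0.67),
    "RelRank (all entities)": (0.08, 0.08, 0.08),
    "RelRank (person entities)": (0.35, 0.11, 0.17),
    "StreetToPerson": (0.95, 0.91, 0.93),
}

for method, (precision, recall, f1) in reported.items():
    recomputed = round(f1_score(precision, recall), 2)
    print(f"{method}: recomputed F1 = {recomputed:.2f}, reported F1 = {f1:.2f}")
```

Because the inputs are already rounded to two decimals, such a recomputation can in general deviate from a reported value in the last digit.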
Application of StreetToPerson on OpenStreetMap
We apply StreetToPerson to German streets in OpenStreetMap. For 669,304 streets, we find at least one candidate person. For 183,022 of those streets, one person is classified positively.
Number of | Bremen | NRW | Germany |
---|---|---|---|
Streets | 6,733 | 219,768 | 1,321,464 |
Streets with candidate persons | 2,504 | 110,968 | 669,304 |
Candidate persons | 47,659 | 2,675,761 | 16,165,454 |
Street-to-person pairs | 896 | 28,857 | 183,022 |
- Bremen, NRW: the German states Bremen and Nordrhein-Westfalen.
- Streets: the number of streets in the region.
- Streets with candidate persons: the number of streets for which we find at least one candidate person in the person index.
- Candidate persons: the total number of candidate persons found for the streets.
- Street-to-person pairs: the total number of street-to-person pairs returned by the classifier.
Number of street-to-person relations identified for OSM streets in Germany and two of its states.
Some streets in OpenStreetMap denote the person they were named after using the "name:etymology:wikidata" key. We use these streets for estimating precision and recall of StreetToPerson on OpenStreetMap.
StreetToPerson achieves a precision of more than 0.9 and a recall of more than 0.6 on OpenStreetMap.
Metric | Bremen | NRW |
---|---|---|
Precision | 0.94 | 0.90 |
Recall | 0.64 | 0.61 |
F1 Score | 0.76 | 0.73 |
Evaluation of StreetToPerson on the OSM ground truth in two German states.
Citation
title={{Linking Streets in OpenStreetMap to Persons in Wikidata}},
author={Gurtovoy, Daria and Gottschalk, Simon},
year={2022},
booktitle={Proceedings of The Web Conference}
}
References
- [Almeida, 2016] Paulo Dias Almeida, Jorge Rocha, Andrea Ballatore, and Alexander Zipf. 2016. Where the Streets Have Known Names. In International Conference on Computational Science and Its Applications (ICCSA ’16), Vol. 9789. 1–12. https://doi.org/10.1007/978-3-319-42089-9_1
- [Ferragina, 2010] Paolo Ferragina and Ugo Scaiella. 2010. TAGME: On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities). In Conference on Information and Knowledge Management (CIKM). 1625–1628.
Notes
- Source: Image of Paul Wilhelm
- Source: Image of Friedrich Wilhelm I.
- Source: Image of Wilhelm Busch
- OpenStreetMap, the magnifying glass logo and State of the Map are registered trademarks of the OpenStreetMap Foundation
Acknowledgements
This work was partially funded by the Federal Ministry of Education and Research (BMBF), Germany under "Simple-ML" (01IS18054) and the DFG, German Research Foundation, under "WorldKG" (424985896).