Map data is an interesting beast. Having worked with it for the last couple of years, I can say that many of the general data processing challenges apply to it. However, map data also comes with a number of particular features that set it apart. I have split this series into five parts:
- It’s All About Relations.
- The Role of Geopolitics.
- Editing And Processing. (Coming soon)
- Tracking Changes. (Coming soon)
- False Assumptions Programmers Make. (Coming soon)
In this article, we will look at one of the features that distinguish a map from a simple geospatial dataset: Relations.
What Is a Map In The 21st Century?
Before diving deeper, let us first consider what the word “map” means. This is harder to answer than it might appear. The traditional understanding of a paper-based map would be a graphical representation of an area of the world for the purposes of navigation or planning.
However, with the rise of software, things have become more complex. We now consider as a “map” both such a graphical representation and the underlying data which enables geospatial applications. Typical applications are the following:
- Display, meaning drawing the geographic data for a human eye.
- Geocoding, meaning converting a text description of a place into a geographical location.
- Reverse geocoding, meaning turning a geographic location into a text description.
- Routing, meaning finding an optimal way of getting from one geographic location to another under a given set of constraints, such as the mode of transport.
There is more you can do with maps, for example finding all the addresses in an area prone to flooding, or all the supermarkets that can be reached from a location within a 30-minute drive. A typical extension of routing is to generate maneuvers and guidance instructions: “Turn half-right and then continue straight”.
Nevertheless, the four basic building blocks described above turn out to be a very representative sample of common uses of map data.
In the following, I will use examples from OpenStreetMap. To follow, you will need a basic understanding of the OSM data model.
You can find a summary and more explanation in the excellent OpenStreetMap wiki.
What Does Map Data Look Like?
Let us have a look at two example features from OpenStreetMap, a footpath and a castle ruin, and at how this data is encoded in the .osm format:
<node id="1717478170" visible="true" version="1" changeset="11306798" timestamp="2012-04-15T07:33:36Z" user="angelx" uid="595316" lat="45.6684436" lon="8.3304538"/>
<!-- other nodes… -->
<way id="939730625" visible="true" version="1" changeset="104233854" timestamp="2021-05-06T08:22:36Z" user="Francoerbi41" uid="4982687">
  <nd ref="1717478170"/>
  <!-- other node references… -->
  <nd ref="8696557396"/>
  <tag k="highway" v="footway"/>
  <tag k="surface" v="ground"/>
</way>
<way id="255403888" visible="true" version="6" changeset="108351951" timestamp="2021-07-21T07:39:03Z" user="Francoerbi41" uid="4982687">
  <nd ref="2610953746"/>
  <!-- other node references… -->
  <nd ref="2610953746"/>
  <tag k="access" v="yes"/>
  <tag k="building" v="ruins"/>
  <tag k="castle_type" v="defensive"/>
  <tag k="historic" v="castle"/>
  <tag k="historic:civilization" v="medieval"/>
  <tag k="lit" v="yes"/>
  <tag k="name" v="Castello di Vintebbio"/>
  <tag k="wikimedia_commons" v="Category:Castello di Vintebbio"/>
</way>
One obvious feature that sets map data apart from other data is that most of the objects have a location component. In the above example, the location of the two features is encoded through references to nodes, each of which has a geographic location. The example also shows that things are more complex than that: Modern map data contains many detailed attributes which may or may not be useful depending on the application.
The footpath object contains, among other things, information about the surface material. The castle ruins object records that it is lit at night, that it is accessible, and that it dates back to the medieval era.
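To make the node references concrete, here is a minimal sketch of how a consumer might resolve a way's geometry and tags using Python's standard XML parser. The sample is an abridged version of the footpath above; the coordinates of the second node are invented for illustration, and `load_ways` is a hypothetical helper, not part of any OSM tooling.

```python
import xml.etree.ElementTree as ET

# Abridged sample modeled on the footpath above. The coordinates of
# node 8696557396 are made up for illustration.
OSM_XML = """
<osm version="0.6">
  <node id="1717478170" lat="45.6684436" lon="8.3304538"/>
  <node id="8696557396" lat="45.6690000" lon="8.3310000"/>
  <way id="939730625">
    <nd ref="1717478170"/>
    <nd ref="8696557396"/>
    <tag k="highway" v="footway"/>
    <tag k="surface" v="ground"/>
  </way>
</osm>
"""


def load_ways(xml_text):
    """Return {way_id: (list of (lat, lon) tuples, dict of tags)}."""
    root = ET.fromstring(xml_text)
    # Index node coordinates by ID so that <nd ref="..."/> can be resolved.
    coords = {
        node.get("id"): (float(node.get("lat")), float(node.get("lon")))
        for node in root.iter("node")
    }
    ways = {}
    for way in root.iter("way"):
        geometry = [coords[nd.get("ref")] for nd in way.iter("nd")]
        tags = {tag.get("k"): tag.get("v") for tag in way.iter("tag")}
        ways[way.get("id")] = (geometry, tags)
    return ways


ways = load_ways(OSM_XML)
geometry, tags = ways["939730625"]
print(geometry[0], tags["surface"])
```

The way itself carries no coordinates: its shape only exists once the node references are resolved.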
There is still one thing missing from our survey of map data: I would argue that a central feature of a map is the set of relationships between its objects. This is one of the factors distinguishing a “map” from a simple “geospatial dataset”. For most map applications we need to know how map objects are connected to one another.
An obvious instance of this observation is the road network: To enable routing applications, the connection between road elements needs to be known. Just because two objects are geometrically overlapping does not mean that a motorized vehicle can go from one to the other. Obvious counterexamples are bridges, one-ways, and restricted turns.
OpenStreetMap solves this problem in different ways. First of all, road elements that are connected share a common node, highlighted in yellow in the following example:
What about bridges and tunnels? A possible way of modeling such a constellation would be to have the two crossing roads share a node at the location and annotate that node with the information that it is a bridge and that the crossing roads do not actually connect. That is not how OpenStreetMap handles it: OSM takes the slightly more intuitive but also less rigorous approach of letting roads that are on different layers cross without sharing a node, using the layer tag in conjunction with a tunnel or bridge tag to indicate their ordering.
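The shared-node rule can be sketched in a few lines. The way names and node IDs below are invented; the point is only that connectivity is read from shared node IDs, so a bridge that crosses a road geometrically but shares no node with it is never reported as connected.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical ways: names and node IDs are invented for illustration.
WAYS = {
    "main_street": [1, 2, 3],
    "side_street": [3, 4],  # shares node 3 with main_street
    "bridge": [5, 6],       # crosses main_street on the map, shares no node
}


def connected_pairs(ways):
    """Return the set of way pairs that share at least one node."""
    ways_by_node = defaultdict(set)
    for way_id, node_ids in ways.items():
        for node_id in node_ids:
            ways_by_node[node_id].add(way_id)

    pairs = set()
    for way_ids in ways_by_node.values():
        # Every pair of ways meeting at this node is connected.
        for a, b in combinations(sorted(way_ids), 2):
            pairs.add((a, b))
    return pairs


print(connected_pairs(WAYS))
```

No coordinates are involved at any point, which is why the bridge needs no special handling in this sketch.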
Another point to note is that just because two roads are physically connected, that does not mean that a vehicle can legally navigate from one to the other. Turn restrictions are one such limitation. A turn restriction is another way in which pieces of map data relate to one another: “You must not go from road A to road B if you are driving a truck”. This is almost exactly how OSM models turn restrictions: They are defined as relations on top of the road graph with a member corresponding to the element the maneuver starts on and a member for the element where it ends.
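As a rough illustration, the sketch below parses a restriction relation and checks a maneuver against it. The relation uses the `from`, `via`, and `to` member roles and the `restriction` tag of the OSM convention, but all IDs are invented, `parse_restriction` and `is_turn_allowed` are hypothetical helpers, and only prohibitive (`no_*`) restrictions are handled.

```python
import xml.etree.ElementTree as ET

# Invented example relation; the IDs do not refer to real OSM objects.
RESTRICTION_XML = """
<relation id="1">
  <member type="way" ref="100" role="from"/>
  <member type="node" ref="200" role="via"/>
  <member type="way" ref="300" role="to"/>
  <tag k="type" v="restriction"/>
  <tag k="restriction" v="no_left_turn"/>
</relation>
"""


def parse_restriction(xml_text):
    """Return a dict describing the restriction, or None if it is not one."""
    relation = ET.fromstring(xml_text)
    tags = {t.get("k"): t.get("v") for t in relation.iter("tag")}
    if tags.get("type") != "restriction":
        return None
    members = {m.get("role"): m.get("ref") for m in relation.iter("member")}
    return {
        "kind": tags.get("restriction", ""),
        "from": members.get("from"),
        "via": members.get("via"),
        "to": members.get("to"),
    }


def is_turn_allowed(restriction, from_way, to_way):
    """Check a maneuver against one prohibitive restriction."""
    if restriction["kind"].startswith("no_"):
        forbidden = (restriction["from"], restriction["to"])
        return (from_way, to_way) != forbidden
    # Mandatory ("only_*") restrictions are out of scope for this sketch.
    return True


restriction = parse_restriction(RESTRICTION_XML)
print(is_turn_allowed(restriction, "100", "300"))
print(is_turn_allowed(restriction, "100", "400"))
```

A real router would also have to handle mandatory restrictions and conditions such as vehicle type, which this sketch leaves out.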
Let us consider another example of map data relations: To navigate towards a POI we would ideally like to know towards which road the entrance of a building is oriented. Otherwise, the user might end up at the back of a huge complex. You could say that an entrance relates a POI or building to the road network. As we will see, the OpenStreetMap data is currently limited in this regard.
OpenStreetMap does have a convention for mapping entrances. The entrance nodes need to be nodes of the building in question. This creates the link that identifies the building or complex the entrance is a part of.
Linking the entrance to the road network is less evident. You could in theory map out the footpaths connected to the entrance and connect them with the road network. In practice, however, this is not often done. A routing application would instead likely determine the road closest to the entrance as a heuristic. This is good enough for most situations but may result in unexpected behavior in certain edge cases. Take for example a building next to multiple layers of roads (bridges/tunnels). How would you know which layer the entrance links to? An explicit relation between a building entrance and the road it is associated with could be useful in such situations.
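The closest-road heuristic could look roughly like the sketch below. It is an assumption about how such a heuristic might be written, not a description of any particular routing engine: coordinates are projected to metres with a simple equirectangular approximation, the sample coordinates are invented, and layers are ignored entirely, which is exactly the edge case described above.

```python
import math

EARTH_RADIUS_M = 6371000.0


def to_local_xy(lat, lon, ref_lat, ref_lon):
    """Project to metres around a reference point (equirectangular approximation)."""
    x = math.radians(lon - ref_lon) * math.cos(math.radians(ref_lat)) * EARTH_RADIUS_M
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b in planar coordinates."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_road(entrance, roads):
    """Return (road_id, distance in metres) of the road closest to the entrance."""
    ref_lat, ref_lon = entrance
    best_id, best_dist = None, math.inf
    for road_id, points in roads.items():
        xy = [to_local_xy(lat, lon, ref_lat, ref_lon) for lat, lon in points]
        for a, b in zip(xy, xy[1:]):
            # The entrance is the projection origin, hence (0, 0).
            dist = point_segment_distance((0.0, 0.0), a, b)
            if dist < best_dist:
                best_id, best_dist = road_id, dist
    return best_id, best_dist


# Invented coordinates: one road north of the entrance, one further south.
ENTRANCE = (48.8530, 2.3499)
ROADS = {
    "north": [(48.8535, 2.3490), (48.8535, 2.3510)],
    "south": [(48.8520, 2.3490), (48.8520, 2.3510)],
}

road_id, distance_m = nearest_road(ENTRANCE, ROADS)
print(road_id, round(distance_m))
```

If a tunnel ran directly beneath the entrance, this function would happily pick it, because nothing in the geometry says which layer the entrance belongs to.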
The image below shows a snapshot of OSM data around Notre Dame in Paris with all entrances highlighted in yellow. You can see that some of them are connected to ways while others are not.
What is a country? Simply a polygon describing the shape of its borders? It is more complicated than that. And once again the challenge revolves around relationships between data points.
When drawing a map for display purposes, for example, would you want to label a stretch of border with the respective countries on either side?
When routing across this border, an application might want to display “You are now traversing the border between Argentina and Chile”.
For both applications, you would want to know that a stretch of border is part of two neighboring countries. In other words, you want to know the topology of the administrative structure.
OpenStreetMap achieves this by representing countries as relations defined on top of border geometries. A country is the set of all lines that form its outer (or inner) borders. Some of these lines can at the same time be outer (or inner) borders of other countries.
Therefore, in the example above, an application rendering the stretch of border could query what relations the way is part of. Just like a node shared by multiple highway elements in the road graph determines that the two elements are connected, a shared boundary element indicates that two countries border each other.
Consider how the example above is modeled in the OpenStreetMap data:
<osm version="0.6" generator="CGImap 0.8.6 (1384222 spike-06.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
  <relation id="167454" visible="true" version="779" changeset="119882038" timestamp="2022-04-19T00:13:08Z" user="Arctic gnome" uid="6511217">
    <member type="way" ref="40931658" role="outer"/>
    <!-- A lot of other members and tags -->
    <tag k="type" v="boundary"/>
    <tag k="admin_level" v="2"/>
    <tag k="boundary" v="administrative"/>
    <tag k="default_language" v="es"/>
    <tag k="name" v="Chile"/>
  </relation>
  <relation id="286393" visible="true" version="1062" changeset="119274519" timestamp="2022-04-03T18:21:32Z" user="hanzlan" uid="629849">
    <member type="way" ref="40931658" role="outer"/>
    <!-- A lot of other members and tags -->
    <tag k="type" v="boundary"/>
    <tag k="admin_level" v="2"/>
    <tag k="boundary" v="administrative"/>
    <tag k="name" v="Argentina"/>
  </relation>
  <way id="40931658" visible="true" version="11" changeset="65980373" timestamp="2019-01-03T07:44:36Z" user="jptolosa" uid="1975220">
    <nd ref="307344798"/>
    <!-- a lot more nodes ... -->
    <tag k="admin_level" v="2"/>
    <tag k="boundary" v="administrative"/>
    <tag k="description" v="Divisoria de aguas continental"/>
    <tag k="natural" v="ridge"/>
  </way>
</osm>
In this example, the way with ID 40931658 describes a piece of border geometry (which also happens to be a ridge). To a user of the data, its role in the administrative topology of the map is clear because it is a member of both the Argentina relation and the Chile relation.
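A consumer could answer the question "which countries meet along this way?" with a lookup like the one sketched here. The XML is an abridged copy of the relations above, and `countries_sharing_way` is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

# Abridged from the example above: only the shared member and the name tags.
BORDER_XML = """
<osm version="0.6">
  <relation id="167454">
    <member type="way" ref="40931658" role="outer"/>
    <tag k="name" v="Chile"/>
  </relation>
  <relation id="286393">
    <member type="way" ref="40931658" role="outer"/>
    <tag k="name" v="Argentina"/>
  </relation>
</osm>
"""


def countries_sharing_way(xml_text, way_id):
    """Return the sorted names of all relations that have way_id as a member."""
    root = ET.fromstring(xml_text)
    names = []
    for relation in root.iter("relation"):
        member_ways = {
            m.get("ref") for m in relation.iter("member") if m.get("type") == "way"
        }
        if way_id not in member_ways:
            continue
        tags = {t.get("k"): t.get("v") for t in relation.iter("tag")}
        if "name" in tags:
            names.append(tags["name"])
    return sorted(names)


print(countries_sharing_way(BORDER_XML, "40931658"))
```

The lookup relies on membership alone and never compares the border geometry with the country polygons.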
Topology vs Geometry
All of the examples cited above highlight a more general point about map data: Most map data has both a geometry and a topology.
In fact, it turns out that existing map data standards that predate OpenStreetMap recognize this distinction and make the topology of a map the central component of their designs. The GDF standard for example distinguishes between the level 0 structure of the map (topology), the level 1 structure (features), and a level 2 structure (complex features).
The level 0 topology is a planar graph consisting of nodes, edges, and faces. A combination of nodes can form an edge, and a combination of edges can form a face.
Level 1 assigns semantic meaning to the objects defined in level 0. Level 2 can group several level 1 features together to form more complex, higher-level constructs.
We can see a certain resemblance with the OSM data model: Nodes in OSM roughly correspond to nodes in GDF. Ways in OSM resemble edges in GDF. GDF faces can be emulated with tags on ways or with relations in OSM.
Where GDF uses level 1 features to assign semantic meaning to the topology objects, OSM uses tags on these objects.
Where GDF defines a level 2 structure to group features together into more complex patterns, OSM allows recursively adding relations on top of relations to express the same.
That being said, OSM is far less strict about the topological structure of the data and does not enforce planarity. It is easy to see why this would be an important factor for the success of the OSM project: Being a project carried by the contributions of enthusiastic volunteers, it is important to make the editing of the map as simple as possible. Enforcing strict rules concerning the shape of the data can be a big turn-off for new editors joining the OSM project.
Nevertheless, having the topology of the map available is a major advantage for consumers of this data. We have seen above how the topology provides answers to common queries. There is, however, a second, less intuitive aspect:
Even though maps are about geospatial data, it is good to avoid geospatial queries and operations wherever possible.
Let me explain this rather counterintuitive statement:
A geospatial query or operation is any computation that works directly on the geometry of map data. A typical example would be calculating the point at which two lines intersect or the union of two polygons.
When operating on the geometry of the map, these queries rely on a numerical model and frequently run into numerical problems. Anyone who has used the JTS library to run geospatial operations in Java, or PostGIS to do the same on a PostgreSQL database, will have run into the dreaded “found non-noded intersection” error, which JTS raises as a TopologyException. This error indicates a numerical instability that frequently occurs when two geometries barely touch one another or when the result of a computation is very small, below the range in which numerical computations are stable.
If the map followed a topological structure as defined by the GDF specification, the same queries could be expressed without relying on numerical computations. More technically, because the graph is already fully noded (a different word for planar), there can be no non-noded intersections.
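The difference can be illustrated with a deliberately tiny sketch. The first part uses a well-known floating-point artifact as a stand-in for a computed intersection coordinate; the second part asks the same "do these touch?" question on a noded graph, where it reduces to comparing node IDs. The edge names and node IDs are invented, and `shared_nodes` is a hypothetical helper.

```python
# Geometric view: a coordinate obtained by arithmetic (for example an
# interpolated intersection point) compared with a stored coordinate.
computed_x = 0.1 + 0.2
stored_x = 0.3
print(computed_x == stored_x)  # False: the two values differ in the last bits

# Topological view: edges are defined by the IDs of their end nodes.
# Names and IDs are invented for illustration.
EDGES = {
    "e1": (1, 2),
    "e2": (2, 3),
    "e3": (4, 5),
}


def shared_nodes(edges, a, b):
    """Return the node IDs at which edges a and b meet (exact, no arithmetic)."""
    return set(edges[a]) & set(edges[b])


print(shared_nodes(EDGES, "e1", "e2"))
print(shared_nodes(EDGES, "e1", "e3"))
```

Comparing integers is exact, so the topological query cannot fail for reasons of precision.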
In this article, I hope to have given an introduction to the solution space of digital maps: what makes map data special and how it differs from a simple collection of data points with a location. In particular, I pointed out that a lot of the value that maps provide comes in the form of relationships between objects.
We also saw how OpenStreetMap allows modeling some of these relationships.
If you want to learn more about maps, the best way I found is to go and improve OSM in your neighborhood. Also, make sure to check out the follow-up post which is about map data and geopolitics.