public class TarjanIslandPruner
extends Object
Prune islands from a graph using Tarjan's strong-components algorithm, described in
Tarjan, R. “Depth-First Search and Linear Graph Algorithms.” SIAM Journal on Computing 1, no. 2 (1972): 146–60. doi:10.1137/0201010.
and summarized at
https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm
We prune islands by mode; what is an island for pedestrians is probably not an island for cars.
Note that the order in which you remove islands for different modes is important. Island removal for cycling should be
done after island removal for walking, because we allow walking bikes. Consider the following situation, where *
represents a vertex or small island, -B- represents a bikeable edge, and -W- represents a walkable edge:
* -B- * -W- * -B- * -W- * -B- *
There is no island here for bikes, because they can simply be walked down the walking street. However, there are walking
islands here, and the walk permissions on edges will be removed because they are part of islands. At that point there will
be biking islands. Thus the walk island removal must be run before the bike island removal.
This way we are not simply removing islands that are in weak components, but removing islands that are in strong
components (a weak component is one where for every pair of vertices i and j, there is a path i -> j or a path
j -> i but not necessarily both, whereas a strong component has both). This is most relevant for cars; there are
few one-way pedestrian streets (and I don't believe r5 even supports them), and it is legal to walk a bike the wrong
way down a one way street. One-way cul-de-sacs are clearly a data error but do exist, for instance
http://www.openstreetmap.org/way/207971576 which is a ramp into an unmapped parking garage in Arlington, VA, USA.
One concern with this island remover is that it does not consider turn restrictions, so it theoretically would leave
a situation like this in the graph:
B - C
|
A
Suppose all edges can be traversed by cars in both directions, but there is a no-right-turn restriction from AB to BC.
C is not part of a larger strong component because it cannot be reached due to the turn restriction. However, this
case is believed to be sufficiently rare not to worry about.
The algorithm used here is a modified version of the one Tarjan described; Tarjan's algorithm used recursion, but with
real world graphs a naïve implementation using recursion quickly caused a StackOverflow in the JVM. The modified
algorithm works as follows.
Tarjan's algorithm uses a depth-first search. We loop over all vertices, and when we find one that hasn't been explored, we
add it to the toExplore stack.
We then pop a vertex off the toExplore stack. If it has not previously been explored, we
give it a consecutive discovery index, set the lowest-discovery-index vertex that is both reachable from this vertex and a
is also a predecessor vertex to the itself, and we add it to the tarjan stack to indicate it is a precessor of any
vertices explored later in the search. We push this vertex back on the toExplore stack to make sure it is updated once all successor
nodes have been explored, and then loop over all the edges usable by the chosen mode originating from the selected vertex.
For each edge, we check if the target vertex has previously been explored. If it has not, we put the source vertex
and then the target vertex back on the toExplore stack. This way, the target vertex will
be popped from the toExplore stack and explored, and then the source vertex will be popped and updated. If it has,
we check if it is on the tarjan stack; if it has, we know that it is a predecessor to the source vertex, and update
the lowest-discovery-index predecessor based on the discovery index of the target vertex.
If the vertex popped off the toExplore stack has already been explored, we first check all successors; if they have a higher discovery
index than the vertex popped off the stack, we know they were explored downstream of the source vertex, and their
lowest-discovery-index predecessor was on the Tarjan stack when they were explored. If the lowest-discovery-index predecessor
is less than the lowest-discovery-index predecessor of the popped vertex, we update the lowest-discovery-index predecessor of the
popped vertex. This looping over edges is potentially less efficient than the original implementation which used the call
stack to keep track of which successors could have been updated, but it is simple, eliminates recursion, and is quick
even in relatively large street networks.
If the vertex popped off the toExplore stack has itself as the lowest-discovery-index predecessor, we have found a
strong component. We pop vertices off the Tarjan stack until we reach the vertex popped from the toExplore stack.
These vertices form a strong component, which we record.
Once all strong components have been found, we loop through all of them and if they have a size less than the minimum
component size, we remove permissions for the relevant mode from all edges connected to their vertices.
We previously used a flood-fill algorithm designed for undirected graphs. This worked okay for walking and biking because
the graph is effectively undirected; for every edge there is a corresponding back edge (we don't support one-way streets
for walking, and walking a bike the wrong way down a one-way street is always an option). However, for cars this is an
issue; there can be components due to bad data that are one-way cul-de-sacs (or small islands of streets connected to the
outside world only by a one-way street). Obviously this is bad data or we'd have a lot of cars piled up in one place,
but in some sense needing to do island removal at all is a fix for bad data; islands should not exist in OSM (except
for physical islands that must be reached by boat, etc).
(cf. Sachar, Louis. Wayside School Gets a Little Stranger. New York, HarperCollins. 1995, in which one-way elevators are installed).
- Author:
- mattwigway