Graph traversal, part four

Let’s return now to our modified Zork graph fragment from a couple episodes back:

It is pretty obvious just from looking at this graph that if you start from the Troll Room and follow the arrows, there are four rooms that you must enter no matter what combination of edges you take: the East-West Passage, the Dam, the Dam Lobby and the Maintenance Room. And moreover, we’ll encounter them in that order. Let’s call these nodes “the inevitable nodes of the Troll Room”.

Actually, do we know for sure that we’ll encounter them in that order? It seems plausible that the inevitable nodes come in a particular order, but naybe that’s true on this graph but not true in general. I propose the following theorem: suppose a node X has two distinct inevitable nodes, Y and Z. Then either Y is an inevitable node of Z, or Z is an inevitable node of Y, but not both. And therefore, all the inevitable nodes of a particular node can be put into the unique order in which they must inevitably be visited.

It’s not hard to prove this fact by considering the cases: if neither Y nor Z are inevitable nodes of each other then there must be a path that goes from A through Y and never goes through Z, which contradicts the supposition that Z is an inevitable node of A. If Y and Z are both inevitable nodes of each other then there is a cycle path, contradicting the supposition that we’re in a DAG. The only remaining possibility is that exactly one is an inevitable node of the other.

Today I am interested in DAGs where there is one node that is an inevitable node of every other node in the graph. (Naturally no node can be an inevitable node of itself, as that would imply that there is a cycle in the graph.) Our little graph here is such a graph: no matter where you start, you inevitably end up in the Maintenance Room and then can go no further. Let’s call such a node a “final inevitable node”.

My question is: suppose we have a DAG that contains a final inevitable node. I want an algorithm that takes as its input any other node, and outputs the closest inevitable node to that node; that is, the first inevitable node that will be encountered. We know that there must be a unique first inevitable node because given any two inevitable nodes, one is after the other.

For example, the Troll Room has four inevitable nodes, the closest obviously being the East West Passage. This is not very interesting because there is only one edge out of the Troll Room! But consider the Round Room; now it is not so clear that the closest inevitable node of the Round Room is the Dam. And so on.

There are a number of ways to solve this problem; we’ll consider a few ideas over the next few episodes.

19 thoughts on “Graph traversal, part four”

1. If I’m understanding your definition of an inevitable node correctly, then the closest inevitable node to the Loud Room is the Deep Canyon, not the Dam. Given your definition at the top of the article, the inevitable nodes of the Loud Room are the Deep Canyon, the Dam, the Dam Lobby and the Maintenance Room (in that order). Obviously the Deep Canyon is not an inevitable node of the Troll Room, but that doesn’t seem to factor into your definition. (As defined, inevitable nodes are a property of a node/graph pair, not of the graph itself, unlike the “final inevitable node”, which is a property of the graph itself.)

Perhaps you meant to say “Consider the Deep Canyon…” ?

• I meant to say the Round Room, not the Loud Room, silly me. I’ll fix it.

• If the Deep Canyon meets your intended definition of an inevitable node of the Loud Room, then I would suggest that you make clear that if Y is reachable from X, but is not an inevitable node, each inevitable nodes of Y may independently be or not be an inevitable node of X.

2. I wonder if there’s some way to analyze the graph to find an inevitable node (of A) without running through every traversal (from A).

If you must run through every traversal, you could turn the first traversal into a collection of node/count pairs. Then, for every other traversal, as you see each node, increment its count. At the end, the earliest node in the list that has the maximal count (which is definitely “inevitable”, as the final inevitable node will have the maximal count) will be the first inevitable node.

I’m going to see if I can find a more functional approach, as I’m not satisfied with this one.

• I’ll be discussing this subject over the next couple of episodes; I actually don’t know what the best solution is. It sounds like you’ve got a good idea going here.

• Consider the following algorithm:

define a method TraverseFor(Node n, int distance) that finds all possible traversals from node N (to any node) that have “distance” (or less) nodes in them. This should be fairly easy to adapt based on the existing traversal algorithm but just having a short circuit check based on distance.

next we have a `for` loop that goes from 1 to infinity, calling `TraverseFor` on the loop variable. If the intersection of all of the paths returned contains one item (the starting node) then you have not yet reached the next inevitable node. One some iteration you will eventually have exactly two nodes returned in the intersection. The second node is the next inevitable node.

(Computing the intersection of all of these sets would be able to be more efficient if we represented our paths in a different type of data structure designed for computing set based operations. Making that change would likely be necessary to make this approach efficient.)

This algorithm does (potentially) traverse more than it needs to, but it doesn’t traverse the entire tree just to find the next inevitable node. To be more precise, it will compute all possible path from the starting node of a distance equal to the longest path from the current node to the next inevitable node.

• Something like this?
private IEnumerable<ImmutableStack<N>> TraverseForNext(IEnumerable<ImmutableStack<N>> innerSet)
{
foreach (var s in innerSet)
{
var edges = Edges(s.Peek());
if (edges.Count == 0)
{
yield return s;
}
else
{
foreach (var kp in Edges(s.Peek()))
yield return s.Push(kp.Value);
}
}
}

private IEnumerable<N> Common(IEnumerable<ImmutableStack<N>> allSets)
{
if (allSets.Count() == 0)
return Enumerable.Empty<N>();
IEnumerable<N> result = allSets.First();
foreach (var s in allSets.Skip(1))
{
result = result.Intersect(s);
}
return result;
}
public N ClosestInevitableNodeV4(N start)
{
IEnumerable<ImmutableStack<N>> s = new ImmutableStack<N>[] { ImmutableStack<N>.Empty.Push(start)};
while (true)
{
s = TraverseForNext(s);
var r = Common(s);
if (r.Count() == 2)
{
return r.First();
}
}
}

• Looks like you’ve got the general idea. I made an implementation myself and spent a bit of time trying to clean it up, doing things like keeping the start note out of the path and such helps clean up the code:

```public static IEnumerable<ImmutableStack>
MovePaths(
IEnumerable<ImmutableStack> paths,
Func<N, ICollection> getEdges)
{
foreach (var path in paths)
{
var edges = getEdges(path.Peek());
if (!edges.Any())
yield return path;
else
foreach (var edge in edges)
yield return path.Push(edge);
}
}

public static T GetNextInevitableNode(
T start,
Func<T, ICollection> getEdges)
{
var paths = getEdges(start)
.Select(node => ImmutableStack.Create(node));
while (true)
{
paths = MovePaths(paths, getEdges)
.ToArray();
var intersection = paths
.Cast<IEnumerable>()
.Aggregate((a, b) => a.Intersect(b))
.ToList();
if (intersection.Any())
return intersection.First();
}
}
```
• This seems like it would re-traverse quite a bit if the first inevitable node is far from the start. Consider a graph where the only inevitable node is the final inevitable node. Suppose the longest path from start to finish is n. You will have to loop a full n times! It’s duplicated because in order to find those of distance 2 away, you have to re-visit every length 1 traversal.

• In a naive implementation, you are right, but in the proposed one, we build the paths of distance 2 from the paths of distance 1.

3. A brute force solution:

```public N ClosestInevitableNode(N start)
{
var traversals = AllEdgeTraversals(start);
// Find the set of InevitableNode
ImmutableList<N> inevitableNodes =
ImmutableList<N>.Empty.AddRange(
from path in traversals.First() select path.Value);
// Remove the not inevitable nodes
foreach (var traversal in traversals.Skip(1))
inevitableNodes = ImmutableList<N>.Empty.AddRange(
inevitableNodes.Intersect(from path in traversal select path.Value));
// Find the first one
return inevitableNodes.First();
}
```
• Maybe this one is better

```public N ClosestInevitableNodeV2(N start)
{
// Sort and try the smallest first.
var traversals = AllEdgeTraversals(start).OrderBy(t => t.Count());
// Check each node to find the first inevitable
foreach(var path in traversals.First())
{
// Is inevitable ?
if (traversals.Skip(1).All(s => (from p in s select p.Value).Contains(path.Value)))
return path.Value;
}
throw new ArgumentException("start");
}
```
• // A recursive one
private ImmutableList&lt:N> InevitableNodes(N start)
{
var edges = Edges(start);
if (edges.Count == 0)
{
return ImmutableList<N>.Empty;
}
else
{
var n = edges.First().Value;
var r = InevitableNodes(n).Add(n);
foreach (var pair in edges.Skip(1))
r = ImmutableList<N>.Empty.AddRange( r.Intersect(InevitableNodes(pair.Value).Add(pair.Value)));
return r;
}
}
public N ClosestInevitableNodeV3(N start)
{
return InevitableNodes(start).Last();
}

4. I’d like to put forward the notion that we add the word “Naybe” to the English language. Clearly, it is to mean almost the same thing as “Maybe” but weighted more towards the probability of the negative.

5. I guess this is against the spirit of the blog series, but here’s an imperative algorithm that relies on assigning and mutating state associated with each node, a ‘colour’ value.

1. Paint the start node white.
2. Visit any successor. Paint it blue. This is our first candidate “inevitable node”. (If it has no successors, there is no inevitable node. Stop.)
3. From the blue node, continue to travel an arbitrary path, painting each visited node red. These are further candidates for inevitable node.

If a node is inevitable, it must appear in every traversal from our start node. Therefore, it must appear in this traversal. Therefore, *all* inevitable nodes *must* be coloured blue or red now. (But of course, we do not yet know whether each blue or red node is inevitable.)

4. Repeatedly pick a white node X. Paint it black.
4.1. Consider each successor Y of X:
4.1.1. If Y is unpainted, paint it white.
4.1.2. If Y is black, ignore it and continue.
4.1.3. If Y is blue, ignore it and continue.
4.1.4. If Y is red, we have found a route that circumvents our blue node, and possibly one or more other red nodes, proving that they are not inevitable. Recursively paint all of Y’s red and blue predecessors white, and paint Y blue.

An inevitable node will never be painted black. We know that only the nodes painted red or blue originally could be inevitable nodes, and we only ever change their colour when we have proved that they are not inevitable.

Eventually, there will be no nodes left painted white. (Every iteration reduces the sum total of red and white nodes, eventually we must run out of white nodes.) At this point, all paths from the start node must consist of a series of black nodes followed by the blue node. (A black node cannot be followed by an uncoloured node, since step 4.1.1 would have coloured it. It cannot be followed by a red node, because step 4.1.4 would have painted the red node blue. It cannot be followed by a white node, because we’ve repeated step 4 until there are none left. Additionally, we know that the start node will be black since it started off white and will have been painted the first time round step 4.) Since all of the paths pass through the blue node, it is inevitable. Since none of the black nodes can be inevitable, the blue node must be the *nearest* inevitable node. (I probably also need to prove that every black node *has* a successor. I think it’s necessarily true if the graph has a final inevitable node, but I should be a bit more rigorous here.)

This appears to be time complexity O(E+P) where E is the number of edges that fall on at least one path between the start node and the nearest inevitable node, and P is the maximum path length from the nearest inevitable node to a terminal node. (We don’t care which white node we pick in step 4, so they can be stored in a stack for O(1) retrieval.) I think the E cost is inevitable in any algorithm. If your graph is huge and the nearest inevitable node is guaranteed to be close, you might find that P is a problem. I haven’t yet thought of a way to avoid this cost without retracing nodes multiple times.

Is that correct? Is it convincing? Is there a nice equivalent to this algorithm in a functional style? (Clearly you could do the same thing with an immutable dictionary for colours, a couple of immutable stacks for red nodes and white nodes, and a recursively applied ‘get-next-step’ function, but I’m not sure it would really count.)