Bayes network (BN) topology: a directed acyclic graph G. Removing the edge directions from G we get the underlying undirected graph of the BN. We distinguish between directed paths, defined as usual; and undirected paths, which are simple (loopless) paths in the underlying undirected graph of G. Note that in these undirected paths we sometimes still refer to the direction of the edges in the original graph, e.g. in defining d-separation.
Let X, Y, Z be disjoint sets of nodes in a directed graph G. Then X is d-separated from Y given Z, denoted d-sep(X,Y|Z), just when every node x in X is d-separated from every node y in Y given Z. (Definition of latter follows).
Node x is d-separated from node y given Z, denoted d-sep(x,y|Z), just when every undirected path from x to y is blocked by Z.
An undirected path p from x to y is blocked by Z just when there exists a node w on p that is blocked by Z. Note that w must be neither x, nor y, and the blocking is defined (below) relative to the path p.
A node w is blocked by Z on path p if either of the following conditions hold:
The topological statements on the role of w in path p is w.r.t. the
edge directions in the directed graph: "converging" means that
both of the edges in p are directed towards w; "diverging" means
they are both directed outwards from w, and "passthrough" means
one edge towards w and one outwards from w.
Since probabilisitic inference in BNs is NP-hard, and even computing
d-separation needs to consider a potentially exponential number of paths,
pre-processing of the BN to remove nodes if possible is important.
First, we define the concept of "nodes of interest". For probabilistic
reaoning, these are the evidence nodes and the query nodes (i.e. node
for which we need to compute the marginal probability). For d-separation
queries d-sep(X,Y|Z),
the nodes in X, Y, Z are all nodes of intetest
(and in particular Z is "the evidence").
We consider 2 important special cases:
Prior to checking a query d-sep(x,y|Z), one can remove all barren nodes,
and also all evidence root nodes. This can be repeated until no more nodes
can be removed. The reason for this is that any path p through a barren
node must be converging, and since w is not in Z, then w blocks p.
Likewise, any path p through a root node must be diverging, and
if w is in Z then it blocks p.
Likewise, for probabilistic reasoning query P(X|E=e),
all barren nodes can be removed
prior to reasoning. Any evidence root node v can be removed as well,
but care must be taken to remove the irrelevant table entries from
all children of v.
In addition, any evidence node v that is d-separated from X given
the rest of the nodes in E can be deleted from the evidence list.
That is because d-separation entails independence.
These pre-processing operations may be repeated until no
additional nodes can be removed.
In general, probabilistic reaoning in BNs is NP-hard. Special case topologies
allow for polynomial-time algorithms:
Pre-processing of BNs for reasoning tasks
Pre-processing for d-separation
Pre-processing for probabilistic reasoning
Special topologies: polytree and directed-path singly connected graphs