
  • When one particular species of event has always … been conjoined with another,
    we make no longer any scruple of foretelling one upon the appearance of the other,
    and of employing that reasoning, which can alone assure us of any matter of fact or existence.
    We then call the one object, Cause; the other, Effect.
    - David Hume (1748, §7), An Enquiry Concerning Human Understanding

    Pearl Causal Model




    Graph Theory


    Judea Pearl


    Peter Spirtes






    1. The Pearl Causal Model relies on a graphical approach to causality (Spirtes et al., 2000; Pearl, 2000, 2009)
    2. The intellectual background of the Pearl Causal Model includes the graph-theoretical work of Lauritzen, Speed, & Vijayan (1978), Darroch, Lauritzen, & Speed (1980), and Wermuth & Lauritzen (1983)


    1. In graph theory, a graph G = 〈 V, E 〉, where V is a set of vertices and E is a set of edges
    2. The members of E are pairs of vertices: these pairs are ordered in a directed graph and unordered in an undirected graph
    3. The directed edge A → B is represented by the ordered pair 〈 A, B 〉
    4. A path is any consecutive sequence of edges, regardless of directionality
    5. For a directed path from A to B, A is the source of the path and B is the sink of the path

    6. If there is a directed edge from A to B, then:
      1. A is the parent of B
      2. B is the child of A

    7. Parents(V) denotes the set of all parents of vertices in V
    8. Children(V) denotes the set of all children of vertices in V
    9. The indegree of a vertex V1 is the number of parents of V1
    10. The outdegree of a vertex V1 is the number of children of V1
    11. The degree of a vertex V1 is the number of vertices adjacent to V1
    12. The ancestor of a vertex V1 is any vertex Vi such that there is a directed path from Vi to V1
    13. The descendant of a vertex V1 is any vertex Vi such that there is a directed path from V1 to Vi

    GRAPH 1

    Relative to GRAPH 1:
    The parents of X are A and B
    The ancestors of X are D, C, A, and B
    The children of X are Y and Z
    The descendants of X are Y, Z, F, G, and H
    The indegree of X is 2
    The outdegree of X is 2
    The degree of X is 4
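
    The relations in items 6-13 can be read directly off an edge list. Below is a minimal Python sketch; the edge list is an assumption, one DAG consistent with the description of GRAPH 1 above (the figure itself is not reproduced here):

```python
# One DAG consistent with the GRAPH 1 description above (an assumption,
# since the original figure is not reproduced here).
edges = [("D", "C"), ("C", "A"), ("A", "X"), ("B", "X"),
         ("X", "Y"), ("X", "Z"), ("Y", "F"), ("Z", "G"), ("G", "H")]

def parents(v):
    return {a for a, b in edges if b == v}

def children(v):
    return {b for a, b in edges if a == v}

def ancestors(v):
    # All vertices with a directed path into v.
    result, frontier = set(), parents(v)
    while frontier:
        result |= frontier
        frontier = {p for u in frontier for p in parents(u)} - result
    return result

def descendants(v):
    # All vertices reachable from v along directed edges.
    result, frontier = set(), children(v)
    while frontier:
        result |= frontier
        frontier = {c for u in frontier for c in children(u)} - result
    return result

print(sorted(parents("X")))                    # ['A', 'B']
print(sorted(ancestors("X")))                  # ['A', 'B', 'C', 'D']
print(sorted(children("X")))                   # ['Y', 'Z']
print(sorted(descendants("X")))                # ['F', 'G', 'H', 'Y', 'Z']
print(len(parents("X")), len(children("X")))   # indegree 2, outdegree 2
print(len(parents("X") | children("X")))       # degree 4
```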



    Types of Paths


    Direct causality: A → B
    A ⫫̸ B


    Indirect causality (chain): A → C → B
    A ⫫̸ B
    A ⫫ B|C

    There may be overcontrol bias when one intercepts or blocks a causal pathway


    Fork with C as the common cause or confounder: A ← C → B
    A ⫫̸ B
    A ⫫ B|C

    There may be confounding bias when one fails to condition on a common cause or confounder


    Inverted fork with C as the common effect or collider: A → C ← B
    A ⫫ B
    A ⫫̸ B|C

    There may be endogenous selection bias when one mistakenly conditions on a common effect or collider
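
    These patterns can be checked by simulation. Below is a minimal sketch (assuming Python with numpy; the linear-Gaussian equations and coefficients are invented for illustration, not taken from the source), comparing the marginal correlation of A and B with their correlation after adjusting for C in a chain, a fork, and an inverted fork:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def partial_corr(a, b, c):
    # Correlation of A and B after linearly regressing out C.
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return corr(ra, rb)

# Chain: A -> C -> B
a = rng.normal(size=n); c = a + rng.normal(size=n); b = c + rng.normal(size=n)
print("chain   ", round(corr(a, b), 2), round(partial_corr(a, b, c), 2))  # dependent, then ~0

# Fork: A <- C -> B
c = rng.normal(size=n); a = c + rng.normal(size=n); b = c + rng.normal(size=n)
print("fork    ", round(corr(a, b), 2), round(partial_corr(a, b, c), 2))  # dependent, then ~0

# Collider: A -> C <- B
a = rng.normal(size=n); b = rng.normal(size=n); c = a + b + rng.normal(size=n)
print("collider", round(corr(a, b), 2), round(partial_corr(a, b, c), 2))  # ~0, then dependent
```

    Note that the chain and the fork produce the same dependence pattern, so observational data alone cannot distinguish them; only the collider reverses the pattern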



    D-separation





    The rules for d-separation (Pearl, 1988):
    RULE 1 for d-separation ('d' stands for 'directional'): X and Y are d-connected if there is an unblocked path between X and Y
    An unblocked path is a path that can be traced without meeting any colliders


    EXAMPLE 1

    In EXAMPLE 1:
    There is a collider at T
    The path X-R-S-T is unblocked
    ∴ X and R, X and S, X and T, R and S, R and T, and S and T are d-connected
    The path T-U-V-Y is also unblocked
    ∴ T and U, T and V, T and Y, U and V, U and Y, and V and Y are also d-connected
    However, X and Y, X and V, S and U, etc. are d-separated: no path can be traced between them without meeting the collider at T

    RULE 2 for d-separation: X and Y are d-connected, conditioned on a set Z of nodes, if there is a collider-free path between X and Y that does not traverse any members of Z


    EXAMPLE 2

    In EXAMPLE 2:
    Let Z be the set { R, V }
    According to RULE 2:
    X and S are d-separated: the path X-R-S is blocked by Z
    U and Y are d-separated: the path U-V-Y is blocked by Z
    S and U are d-separated: the path S-T-U is not collider-free
    Although T is not in Z, the path S-T-U is still blocked since T is a collider (RULE 1)

    RULE 3 for d-separation: If a collider is a member of the conditioning set Z or the collider has a descendant in Z, then it no longer blocks any path that traces this collider

    EXAMPLE 3

    In EXAMPLE 3:
    Let Z be the set { T }
    According to RULE 3:
    X and Y, X and V, S and U, etc. are now d-connected, since the collider T is a member of the conditioning set Z (compare with EXAMPLE 1)
    EXAMPLE 4

    In EXAMPLE 4:
    Let Z be the set { S1, S2 }
    According to RULE 3:
    The collider at U has a descendant S1 in the conditioning set Z, so U no longer blocks the path
    The collider at W has a descendant S2 in the conditioning set Z, so W no longer blocks the path
    ∴ X and Y are now d-connected: the path X-U-V-W-Y is unblocked, since its colliders U and W are unblocked by RULE 3 and the path does not traverse any members of the conditioning set Z


    There are 3 blocking criteria:
    1. CRITERION 1: Conditioning on a non-collider blocks a path
    2. CRITERION 2: Conditioning on a collider or a descendant of a collider unblocks a path
    3. CRITERION 3: Not conditioning on a collider leaves a path naturally blocked


    A d-separated (or blocked) path does not transmit association
    A d-connected (or unblocked) path may transmit association
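
    These rules and criteria can be implemented directly. The sketch below (a minimal illustration in Python; the function name and the example edge list, chosen to mirror the X-R-S-T-U-V-Y chain with a collider at T from EXAMPLES 1-3, are assumptions) uses the moralization criterion, which is equivalent to d-separation: keep only the ancestors of X ∪ Y ∪ Z, 'marry' the parents of every common child, drop edge directions, delete Z, and test whether X and Y are still connected:

```python
from itertools import combinations

def moral_d_separated(edges, xs, ys, zs):
    """Test whether xs and ys are d-separated given zs in the DAG `edges`,
    using the equivalent ancestral-moral-graph criterion."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
        parents.setdefault(a, set())

    # 1. Keep only the ancestors of xs | ys | zs (including those nodes).
    keep, frontier = set(), set(xs) | set(ys) | set(zs)
    while frontier:
        keep |= frontier
        frontier = {p for v in frontier for p in parents.get(v, set())} - keep

    # 2. Moralize: connect parents of a common child, then drop directions.
    undirected = {v: set() for v in keep}
    for a, b in edges:
        if a in keep and b in keep:
            undirected[a].add(b); undirected[b].add(a)
    for v in keep:
        for p, q in combinations(parents.get(v, set()) & keep, 2):
            undirected[p].add(q); undirected[q].add(p)

    # 3. Delete the conditioning set and check connectivity.
    blocked = set(zs)
    frontier, seen = set(xs) - blocked, set()
    while frontier:
        seen |= frontier
        frontier = {w for v in frontier for w in undirected[v]} - seen - blocked
    return not (seen & set(ys))

# Example graph assumed to match EXAMPLES 1-3: a collider at T.
g = [("X", "R"), ("R", "S"), ("S", "T"), ("U", "T"), ("V", "U"), ("Y", "V")]
print(moral_d_separated(g, {"X"}, {"Y"}, set()))   # True: T blocks the path (RULE 1)
print(moral_d_separated(g, {"X"}, {"Y"}, {"T"}))   # False: conditioning on T unblocks it (RULE 3)
print(moral_d_separated(g, {"X"}, {"S"}, {"R"}))   # True: R blocks X-R-S (RULE 2)
```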

    DAGs & Bayesian Networks


    Massive Bayesian network



    The Pearl Causal Model takes a set of data and produces a directed acyclic graph (DAG)
    A DAG is:
    1. Directed — Each directed edge is single-headed, expressing a causal statement
      'A → B' denotes that A directly causes B
    2. Acyclic — There are no directed cycles
      'A → B → C → A' is impossible (a check for acyclicity is sketched after this list)
    3. Graphical — DAGs employ the mathematical structure of graphs:
      1. Vertices or nodes stand for variables;
      2. Directed edges or arrows stand for possible direct causal effects;
      3. Missing arrows encode assumptions about absent direct causal effects
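
    The acyclicity requirement in item 2 can be checked mechanically. Below is a minimal Python sketch using Kahn's topological-sort algorithm (the helper name is an assumption): a graph is acyclic exactly when every vertex can be scheduled after its parents:

```python
# Minimal acyclicity check (Kahn's algorithm): a DAG admits a topological
# order; a graph with a directed cycle such as A -> B -> C -> A does not.
def is_acyclic(edges):
    nodes = {v for e in edges for v in e}
    indeg = {v: 0 for v in nodes}
    for _, b in edges:
        indeg[b] += 1
    frontier = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while frontier:
        v = frontier.pop()
        seen += 1
        for a, b in edges:
            if a == v:
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return seen == len(nodes)

print(is_acyclic([("A", "B"), ("B", "C")]))              # True
print(is_acyclic([("A", "B"), ("B", "C"), ("C", "A")]))  # False: contains a cycle
```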


    A DAG is also called a Bayesian network (BN)
    The DAG represents the causal structure of the system
    Conditional dependence relations between variables are represented as edges between vertices in the graph
    The main idea behind the Pearl Causal Model is to find a graph or set of graphs that best explain the data


    There are two primary METHODS for inferring BNs from the data:
    METHOD 1: Assign scores to graphs and search over the set of possible graphs, while attempting to maximize a particular scoring function
    An initial graph is generated in METHOD 1 and the search space is explored by altering this graph
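
    As a minimal illustration of METHOD 1 (Python; the Gaussian BIC-style score, the simulated chain data, and the small list of candidate graphs are all assumptions, and the search over graphs is reduced here to comparing a handful of candidates rather than exploring the whole space):

```python
import numpy as np

def bic_score(data, dag, names):
    """Gaussian BIC-style score: per-node log-likelihood of a linear
    regression of each node on its parents, minus a complexity penalty."""
    n = len(data)
    idx = {v: i for i, v in enumerate(names)}
    score = 0.0
    for v in names:
        pa = [idx[p] for p, c in dag if c == v]
        y = data[:, idx[v]]
        X = np.column_stack([data[:, pa], np.ones(n)]) if pa else np.ones((n, 1))
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        score += -0.5 * n * np.log(resid.var()) - 0.5 * (len(pa) + 1) * np.log(n)
    return score

# Simulated chain Z -> X -> Y (an assumed data-generating process).
rng = np.random.default_rng(2)
z = rng.normal(size=5000); x = z + rng.normal(size=5000); y = x + rng.normal(size=5000)
data = np.column_stack([z, x, y]); names = ["Z", "X", "Y"]

candidates = {
    "Z -> X -> Y": [("Z", "X"), ("X", "Y")],
    "Z -> X, Z -> Y": [("Z", "X"), ("Z", "Y")],
    "no edges": [],
}
for label, dag in candidates.items():
    print(label, round(bic_score(data, dag, names), 1))
# Among these candidates, the graph matching the data-generating chain scores highest.
```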

    METHOD 2: Start with an undirected, fully connected graph and use repeated conditional independence tests to remove and orient edges in the graph
    After the edges are removed, the remaining ones are directed from cause to effect
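
    As a minimal illustration of METHOD 2 (Python; the Fisher-z partial-correlation test, the significance threshold, and the simulated fork data are assumptions, not from the source), the sketch below starts from a fully connected undirected graph and removes an edge whenever some conditional independence test accepts independence:

```python
import numpy as np
from itertools import combinations
from scipy import stats

def fisher_z_independent(data, i, j, cond, alpha=0.01):
    # Fisher-z test of the partial correlation between columns i and j given `cond`.
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(len(data) - len(cond) - 3)
    return 2 * (1 - stats.norm.cdf(abs(z))) > alpha  # True = independence accepted

def skeleton(data, names):
    # METHOD 2, pruning phase: start fully connected, remove edges by CI tests.
    edges = {frozenset(p) for p in combinations(range(len(names)), 2)}
    for i, j in [tuple(e) for e in edges.copy()]:
        others = [k for k in range(len(names)) if k not in (i, j)]
        for size in range(len(others) + 1):
            if any(fisher_z_independent(data, i, j, c) for c in combinations(others, size)):
                edges.discard(frozenset((i, j)))
                break
    return {tuple(sorted(names[k] for k in e)) for e in edges}

# Simulated fork: Z -> X and Z -> Y, with no X-Y edge.
rng = np.random.default_rng(1)
z = rng.normal(size=5000)
x = z + rng.normal(size=5000)
y = z + rng.normal(size=5000)
data = np.column_stack([x, y, z])
print(skeleton(data, ["X", "Y", "Z"]))   # expected: {('X', 'Z'), ('Y', 'Z')}
```

    Orienting the surviving edges from cause to effect (colliders first, then propagating orientations) is the second phase of this method and is omitted from the sketch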

    Example of a DBN

    Image source: Wikipedia

    Dynamic Bayesian networks (DBNs) use BNs to show how variables influence each other across time (Friedman et al., 1998; Murphy, 2002)
    1. STEP 1: There is generally an initial Bayesian network (BN) depicting connections between variables at some time t;
    2. STEP 2: There is then a set of BNs showing the system at t + 1, t + 2, etc
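
    As a minimal sketch of the two steps (Python; the single binary variable and its probabilities are invented for illustration, and a one-variable DBN reduces to a simple Markov chain):

```python
import numpy as np

# STEP 1: initial BN, P(X_0).
p_x0 = np.array([0.8, 0.2])            # P(X_0 = 0), P(X_0 = 1)

# STEP 2: transition BN reused at every slice, P(X_{t+1} | X_t).
p_trans = np.array([[0.9, 0.1],        # row: X_t = 0
                    [0.3, 0.7]])       # row: X_t = 1

# Unroll the network: the marginal at t + 1 is the previous marginal
# pushed through the same transition model.
p_t = p_x0
for t in range(5):
    print(f"t={t}  P(X_t = 1) = {p_t[1]:.3f}")
    p_t = p_t @ p_trans
```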


    Axioms





    AXIOM 1: Causal Markov Condition (CMC)
    A directed acyclic graph (DAG) G over V and a probability distribution P(V) satisfy the CMC iff for every W in V, W is independent of its non-effects (i.e. its non-descendants), given its parents

    (Alternatively) Given any disjoint sets A, B, and C of variables, if A is d-separated from B conditional on C, then A is statistically independent of B given C
    See Pearl (1988) for proof

    Formally:
    W ⫫ ( V \ ( Descendants(W) ∪ Parents(W) ∪ { W } ) ) | Parents(W)

    EXAMPLE 5

    In EXAMPLE 5:
    The CMC entails the following conditional independence relations:
    A ⫫ B
    D ⫫ { A, B }|C
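
    The entailed relations can be read off mechanically. Below is a minimal Python sketch; the edge list is an assumption, one DAG consistent with the two relations listed for EXAMPLE 5 (the figure is not reproduced here):

```python
# Assumed DAG for EXAMPLE 5 (the figure is not shown): A -> C <- B, C -> D.
edges = [("A", "C"), ("B", "C"), ("C", "D")]
nodes = {"A", "B", "C", "D"}

def parents(v):
    return {a for a, b in edges if b == v}

def descendants(v):
    result, frontier = set(), {b for a, b in edges if a == v}
    while frontier:
        result |= frontier
        frontier = {b for a, b in edges if a in frontier} - result
    return result

# Local Markov condition: each W is independent of its non-descendants,
# given its parents (non-effects = non-descendants).
for w in sorted(nodes):
    rest = nodes - descendants(w) - parents(w) - {w}
    if rest:
        print(f"{w} ⫫ {sorted(rest)} | {sorted(parents(w))}")
```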

    AXIOM 2: Causal Faithfulness Condition (CFC)
    Given any graph, the CMC (or AXIOM 1) determines a set of independence relations
    However, a probability distribution P on a graph G that satisfies the CMC may include other independence relations besides those entailed by the CMC

    Recall the birth control pill EXAMPLE (Hesslow, 1976; Cartwright, 1989):


    BCP (birth control pills) and T (thrombosis) might be independent in the probability distribution satisfying the CMC for this graph, even though the graph does not entail their independence: the positive direct effect of BCP on T and the negative indirect effect (BCP prevents pregnancy, and pregnancy promotes thrombosis) can cancel each other exactly

    If all and only those conditional independence relations true in P are entailed by the CMC applied to G, then we say that P and G are faithful to one another
    (Alternatively) Given any disjoint sets A, B, and C of variables, if A is statistically independent of B given C, then A is d-separated from B conditional on C


    AXIOM 3: Causal Sufficiency
    A set V of variables is causally sufficient for a population iff in the population every common cause of any 2 or more variables in V is in V
    According to AXIOM 3, there are no hidden common causes


    Intervention & the Do-calculus


    Intervention


    Do-calculus



    EXAMPLE:
    Here is the DAG for the relationship between temperature, ice cream sales, and crime rates:

    (Pearl, Glymour, & Jewell, 2016)

    X denotes ice cream sales
    Y denotes crime rates
    Z denotes the temperature
    UX, UY, and UZ denote the error terms for X, Y, and Z (i.e. the effects of exogenous variables not included in the causal model)

    Suppose that we intervene on the value of ice cream sales (X)
    We might fix the value of X to a low value (e.g. by shutting down all ice cream shops)


    When we intervene on X:
    1. We fix the value of X to a certain value (i.e. X = x);
    2. We curtail the natural tendency of X to vary in response to the other variables Y and Z


    3. ∴ The intervention I breaks all other directed edges or arrows previously directed into X (viz. Z → X, UX → X)
    4. These directed edges are replaced by a single directed edge from the intervention variable I to X
    5. Y is now totally uncorrelated with X, since X is no longer associated with Z



    1. The removal of all edges directed into X has been described as 'surgery'
    2. The tacit assumption here is that intervention or surgery has no side effects
    3. This has been described as the arrow-breaking conception of interventions (Pearl, 2000, Spirtes et al, 2000)

    According to the do-calculus:
    Given 2 disjoint sets of variables X and Y, the causal effect of X on Y is a function from X to the space of probability distributions on Y
    The causal effect of X on Y is denoted as 'P(y | do(x))' in the do-calculus
    For each realization x of X, P(y | do(x)) gives the probability of Y = y induced by deleting from the model all equations corresponding to variables in X and substituting X = x in the remaining equations
    The graph corresponding to the reduced set of equations is the subgraph from which all directed edges entering X have been pruned by surgery
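
    The contrast between conditioning and intervening can be seen in a simulation of the ice cream example (a minimal sketch in Python with numpy; the linear structural equations and their coefficients are invented for illustration and are not the model of Pearl, Glymour, & Jewell): observationally X carries information about Y through Z, but after surgery on X the dependence disappears:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Observational model: temperature Z drives both ice cream sales X and crime Y.
u_z, u_x, u_y = (rng.normal(size=n) for _ in range(3))
z = u_z
x = 2.0 * z + u_x            # Z -> X
y = 1.5 * z + u_y            # Z -> Y (there is no direct X -> Y edge)

# Conditioning: among days with high ice cream sales, crime looks elevated,
# because high X is evidence of high temperature Z.
print(round(y[x > 2].mean(), 2), "vs overall", round(y.mean(), 2))

# Intervening: do(X = 3) performs surgery on the equation for X, breaking
# Z -> X and U_X -> X; the equations for Z and Y are left untouched.
x_do = np.full(n, 3.0)
y_do = 1.5 * z + u_y         # unchanged, since X does not appear in Y's equation

# P(y | do(x)) does not depend on the value X was set to.
print(round(y_do.mean(), 2), "matches overall", round(y.mean(), 2))
```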


    Where x1, …, xn denote variables in a BN:
    P(x1, …, xn) = ∏ P(xi|Parents(xi))

    In Cochran's EXAMPLE (Wainer, 1989):
    Soil fumigants (X) are being used to increase oat crop yields (Y) by controlling the eelworm population (Z)

    Cochran's EXAMPLE

    X denotes the soil fumigants
    Y denotes the oat crop yields
    Z denotes the eelworm population: Z0 is last year's population, Z1 the population before treatment, Z2 the population after treatment, and Z3 the population at the end of the season
    B denotes the population of birds and other predators

    From P(x1, …, xn) = ∏ P(xi|Parents(xi)), we get:
    P(z0, x, z1, b, z2, z3, y) = P(z0)P(x|z0)P(z1|z0)P(b|z0) × P(z2|x, z1)P(z3|z2, b)P(y|x, z2, z3)

    With the intervention do(X = x′):
    P(z0, z1, b, z2, z3, y|do(X = x′)) = P(z0)P(z1|z0)P(b|z0) × P(z2|x′, z1)P(z3|z2, b)P(y|x′, z2, z3)
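
    The step from the first factorization to the second is an instance of Pearl's truncated factorization (the 'g-formula'): the intervention deletes the factor for each intervened-on variable and evaluates the remaining factors at X = x′. In LaTeX:

```latex
% Observational factorization of a Bayesian network
P(x_1, \ldots, x_n) = \prod_i P\bigl(x_i \mid \mathrm{Parents}(x_i)\bigr)

% Truncated factorization under the intervention do(X = x')
P\bigl(v \mid \mathrm{do}(X = x')\bigr)
  = \prod_{i \,:\, V_i \notin X} P\bigl(v_i \mid \mathrm{Parents}(v_i)\bigr)\Big|_{X = x'}
% (for values v consistent with X = x'; the product omits every V_i in X)
```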