Statistical Rethinking Chapter 6 EoC
Easy Questions
6E1 List three mechanisms by which multiple regression can produce false inferences about causal effects.
- Multicollinearity
- Post-treatment bias
- Collider bias
6E2 For one of the mechanisms in the previous problem, provide an example of your choice, perhaps from your own research.
In predicting defaults for home loans, one of the variables used is the original loan amount and another is the current outstanding amount (original minus the amount already repaid). These two are highly correlated, and including both can lead to erroneous inferences about either one's effect.
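A minimal sketch of this with simulated data (the variable names and parameter values below are invented, not from real loan data): when both loan amounts enter the model, the standard errors inflate and neither coefficient is well identified.

# hypothetical sketch of multicollinearity with correlated loan amounts
set.seed(6)
orig <- rnorm(200, 10, 2)         # original loan amount
paid <- runif(200, 0, 0.3) * orig # amount already repaid
curr <- orig - paid               # current amount, highly correlated with orig
risk <- 0.5 * orig + rnorm(200)   # default risk depends on loan size
round(cor(orig, curr), 2)                     # very high correlation
summary(lm(risk ~ orig))$coefficients         # precise estimate
summary(lm(risk ~ orig + curr))$coefficients  # inflated standard errors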
6E3 List the four elemental confounds. Can you explain the conditional dependencies of each?
- Fork: X <- Z -> Y. A common cause Z influences both X and Y, making them correlated; conditioning on Z removes the association.
- Pipe: X -> Z -> Y. X influences Y through Z, so conditioning on Z blocks the path and can hide the real influence of X on Y. Z could be a post-treatment variable.
- Collider: X -> Z <- Y. Both X and Y influence Z, so conditioning on Z creates a statistical association between X and Y even when in reality they are independent of each other (see the sketch after this list).
- Descendant: same as the collider, but Z has a child D that it influences. Conditioning on D has a similar (if weaker) effect as conditioning on Z itself.
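To illustrate the collider row (the simulation parameters here are my own choices): X and Y are generated independently, yet conditioning on their common effect Z induces an association.

# minimal collider sketch
set.seed(1)
N <- 1000
X <- rnorm(N)
Y <- rnorm(N)
Z <- X + Y + rnorm(N)     # Z is a common effect of X and Y
coef(lm(Y ~ X))["X"]      # near 0: X and Y are marginally independent
coef(lm(Y ~ X + Z))["X"]  # strongly negative once we condition on Z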
6E4 How is a biased sample like conditioning on a collider? Think of the example at the open of the chapter.
\[ X \rightarrow Z \leftarrow Y \] Z is the collider here. If we condition on Z, then knowing X tells us about Y. In the newsworthiness (N) vs trustworthiness (T) example that opens the chapter, conditioning on selection (sampling only funded proposals) means that a low N implies T must have been high for the proposal to be selected, and vice versa. A biased sample therefore acts just like conditioning on a collider.
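A quick simulation in the spirit of the chapter's opening example (the exact parameters are my own choices):

# selection-distortion sketch
set.seed(1914)
N <- 200
nw <- rnorm(N)                    # newsworthiness
tw <- rnorm(N)                    # trustworthiness, independent of nw
s <- nw + tw                      # selection score
selected <- s >= quantile(s, 0.9) # fund the top 10%
cor(nw[selected], tw[selected])   # strongly negative within the biased sample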
Medium Questions
6M1 Modify the DAG on page 186 to include the variable V, an unobserved cause of C and Y: C <- V -> Y. Reanalyze the DAG. How many paths connect X to Y? Which must be closed? Which variables should you condition on now?
library(dagitty)
library(ggdag)
library(rethinking)
dag_6M1 <- dagitty( "dag {
U [unobserved]
V [unobserved]
X -> Y
X <- U <- A -> C -> Y
U -> B <- C
C <- V -> Y
}")
coordinates(dag_6M1) <- list(
x = c(X = 0, Y = 2, U = 0, B = 1 , C = 2, A = 1 , V = 3),
y = c(X = 0, Y = 0, U = 1, B = 0.7, C = 1, A = 1.3, V = 1)
)
# ggdag(dag_6M1) + theme_void()
drawdag(dag_6M1)
Paths from X to Y (besides the direct X -> Y):
- X <- U <- A -> C -> Y (fork at A plus two pipes) (same as before) (open, needs to be closed)
- X <- U -> B <- C -> Y (B is a collider) (same as before) (already closed)
- X <- U <- A -> C <- V -> Y (C is a collider on this path, so it is already closed)

Thus we only need to condition on A. We can no longer use C, because conditioning on C would open the collider path through the unobserved V.
adjustmentSets(dag_6M1, exposure = "X", outcome = "Y")
## { A }
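As a cross-check, dagitty can enumerate all paths between X and Y and report which are open; a small sketch using dagitty::paths():

# list every path from X to Y and whether it is open (before conditioning)
p <- paths(dag_6M1, from = "X", to = "Y")
cbind(path = p$paths, open = p$open)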
6M2 Sometimes, in order to avoid multicollinearity, people inspect pairwise correlations among predictors before including them in a model. This is a bad procedure, because what matters is the conditional association, not the association before the variables are included in the model. To highlight this, consider the DAG X → Z → Y. Simulate data from this DAG so that the correlation between X and Z is very large. Then include both in a model predicting Y. Do you observe any multicollinearity? Why or why not? What is different from the legs example in the chapter?
# X -> Z -> Y
set.seed(62)
x <- rnorm(100, 3, 1)
z <- 2 * x + rnorm(100)
y <- 3 * z + rnorm(100, 2, 3)

d <- data.frame(x, y, z)

m62 <- quap(
  alist(
    y ~ dnorm(mu, sigma),
    mu <- a + bX * x + bZ * z,
    a ~ dnorm(0, 7),
    bX ~ dnorm(0, 7),
    bZ ~ dnorm(0, 7),
    sigma ~ dexp(3)
  ),
  data = d
)
precis(m62)
## mean sd 5.5% 94.5%
## a 3.4369992 0.8625566 2.058467 4.8155313
## bX -0.6552322 0.5962388 -1.608137 0.2976725
## bZ 3.0733699 0.2589212 2.659564 3.4871759
## sigma 2.7189942 0.1814884 2.428941 3.0090477
bX comes out as essentially zero. This is what we expect in this situation, a pipe: after conditioning on Z, we learn nothing new about Y from X, and bZ remains well identified. In the legs example, by contrast, both legs directly influenced the final outcome (height), so one effect was split across two highly correlated predictors.
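As a sanity check that the simulation met the problem's requirement of a very large correlation between X and Z:

# x and z are strongly correlated, yet no multicollinearity problem arises,
# because z lies on the causal path from x to y
cor(d$x, d$z)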
6M3 Learning to analyze DAGs requires practice. For each of the four DAGs below, state which variables, if any, you must adjust for (condition on) to estimate the total causal influence of X on Y.
- First DAG
<- dagitty("dag{
dag1 Z <- A -> Y
X <- Z -> Y
X -> Y
}")
coordinates(dag1) <- list(
x = c(Z = 1, A = 2, X = 0, Y = 2),
y = c(Z = 1, A = 1, X = 0, Y = 0)
)
ggdag(dag1, layout = 'circle') + theme_void()
Paths:
- X <- Z -> Y (fork, open; condition on Z)
- X <- Z <- A -> Y (fork plus pipe, open; conditioning on Z closes it)
adjustmentSets(dag1, exposure = "X", outcome = "Y")
## { Z }
- Second DAG
<- dagitty("dag{
dag2 Z <- A -> Y
X -> Z -> Y
X -> Y
}")
coordinates(dag2) <- list(
x = c(Z = 1, A = 2, X = 0, Y = 2),
y = c(Z = 1, A = 1, X = 0, Y = 0)
)
ggdag(dag2, layout = 'circle') + theme_void()
Paths:
- X -> Z -> Y (a pipe; part of the total causal effect, no backdoor into X, leave open)
- X -> Z <- A -> Y (collider at Z, fork at A; already closed)

Conditioning on Z would both block part of the causal effect and open the collider, creating an association between X and A, and thereby X and Y. So don't condition on Z.
adjustmentSets(dag2, exposure = "X", outcome = "Y")
## {}
- Third DAG
<- dagitty("dag{
dag3 Z <- A -> X
X -> Z <- Y
X -> Y
}")
coordinates(dag3) <- list(
x = c(Z = 1, A = 0, X = 0, Y = 2),
y = c(Z = 1, A = 1, X = 0, Y = 0)
)
ggdag(dag3, layout = 'circle') + theme_void()
Paths:
- X -> Z <- Y (collider at Z, already closed; so don't include Z)
- X <- A -> Z <- Y (also blocked by the collider at Z)
adjustmentSets(dag3, "X", "Y")
## {}
- Fourth DAG
<- dagitty("dag{
dag4 Z <- A -> X
X -> Z -> Y
X -> Y
}")
coordinates(dag4) <- list(
x = c(Z = 1, A = 0, X = 0, Y = 2),
y = c(Z = 1, A = 1, X = 0, Y = 0)
)
ggdag(dag4, layout = 'circle') + theme_void()
Paths:
- X -> Z -> Y (a pipe; part of the causal effect, no backdoor)
- X <- A -> Z -> Y (A forks into X and Z, and Z pipes into Y: a backdoor into X that needs to be closed; condition on A)
adjustmentSets(dag4, "X", "Y")
## { A }
Hard Questions
6H1 Use the Waffle House data, data(WaffleDivorce), to find the total causal influence of number of Waffle Houses on divorce rate. Justify your model or models with a causal graph.
data("WaffleDivorce")
<- WaffleDivorce
d
head(d)
## Location Loc Population MedianAgeMarriage Marriage Marriage.SE Divorce
## 1 Alabama AL 4.78 25.3 20.2 1.27 12.7
## 2 Alaska AK 0.71 25.2 26.0 2.93 12.5
## 3 Arizona AZ 6.33 25.8 20.3 0.98 10.8
## 4 Arkansas AR 2.92 24.3 26.4 1.70 13.5
## 5 California CA 37.25 26.8 19.1 0.39 8.0
## 6 Colorado CO 5.03 25.7 23.5 1.24 11.6
## Divorce.SE WaffleHouses South Slaves1860 Population1860 PropSlaves1860
## 1 0.79 128 1 435080 964201 0.45
## 2 2.05 0 0 0 0 0.00
## 3 0.74 18 0 0 0 0.00
## 4 1.22 41 1 111115 435450 0.26
## 5 0.24 0 0 0 379994 0.00
## 6 0.94 11 0 0 34277 0.00
The exposure is WaffleHouses (W) and the outcome is the divorce rate (D). The number of Waffle Houses is plausibly influenced by state population and, in particular, by the indicator for Southern states (S), since the South is where Waffle House started; the DAG below keeps only S. Divorce is also affected by median age at marriage (A) and marriage rate (M), as discussed in earlier chapters, and both of these are plausibly influenced by S as well.
Let’s say the causal chain is like the following:
<- dagitty("dag{
dagD W -> D
S -> W
M -> D
A -> D
A -> M
S -> M
S -> A
}")
ggdag(dagD, layout = 'circle') + theme_void()
Paths:
- W <- S -> M -> D (backdoor, open; condition on S)
- W <- S -> A -> D (backdoor, open; condition on S)
- W <- S -> A -> M -> D (backdoor, open; condition on S)
- W <- S -> M <- A -> D (collider at M, with forks at S and A; already closed)

We can condition on S alone, or on both A and M.
adjustmentSets(dagD, "W", "D")
## { A, M }
## { S }
Thus we build two models: one with W, D, and S, and another with W, D, A, and M.
library(dplyr)
# first standardise all variables of interest
d2 <- d %>%
  rename(W = WaffleHouses,
         D = Divorce,
         A = MedianAgeMarriage,
         M = Marriage,
         S = South) %>%
  mutate(across(c(W, A, M, D), standardize),
         S = S + 1) # index variable for a[S]: 1 = non-South, 2 = South
# model with W, D and S
mh1 <- quap(
  alist(
    D ~ dnorm(mu, sigma),
    mu <- a[S] + bW * W,
    a[S] ~ dnorm(0, 2),
    bW ~ dnorm(0, 2),
    sigma ~ dexp(1)
  ),
  data = d2
)
plot(precis(mh1, 2))
The posterior for bW is centered on zero: the number of Waffle Houses has no discernible influence on divorce once S is conditioned on.
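For contrast, an unadjusted model should recover the spurious positive association between Waffle Houses and divorce that motivates the chapter; a sketch (the model name mh0 is my own addition):

# unadjusted sketch: the backdoor through S is left open,
# so bW should pick up a spurious positive association
mh0 <- quap(
  alist(
    D ~ dnorm(mu, sigma),
    mu <- a + bW * W,
    a ~ dnorm(0, 2),
    bW ~ dnorm(0, 2),
    sigma ~ dexp(1)
  ),
  data = d2
)
plot(precis(mh0))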
# model with W, D, A and M
mh2 <- quap(
  alist(
    D ~ dnorm(mu, sigma),
    mu <- a[S] + bW * W + bA * A + bM * M,
    a[S] ~ dnorm(0, 2),
    c(bA, bM, bW) ~ dnorm(0, 2),
    sigma ~ dexp(1)
  ),
  data = d2
)
plot(precis(mh2, 2))
bW is still centered on zero in this model as well.
6H2 Build a series of models to test the implied conditional independencies of the causal graph you used in the previous problem. If any of the tests fail, how do you think the graph needs to be amended? Does the graph need more or fewer arrows? Feel free to nominate variables that aren’t in the data.
impliedConditionalIndependencies(dagD)
## A _||_ W | S
## D _||_ S | A, M, W
## M _||_ W | S
- \(A \perp \!\!\! \perp W | S\)
h1 <- quap(
  alist(
    W ~ dnorm(mu, sigma),
    mu <- a + bS * S + bA * A,
    a ~ dnorm(0, 2),
    c(bS, bA) ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d2
)
precis(h1)
## mean sd 5.5% 94.5%
## a -1.83063343 0.30437276 -2.3170799 -1.3441870
## bS 1.43382208 0.22484029 1.0744839 1.7931603
## bA 0.04261305 0.10461217 -0.1245774 0.2098035
## sigma 0.71468869 0.07100323 0.6012118 0.8281656
bA is centered on zero after conditioning on S, so the first condition is met.
Second Condition
- \(D \perp \!\!\! \perp S | A, M, W\)
h2 <- quap(
  alist(
    S ~ dnorm(mu, sigma),
    mu <- a + bD * D + bA * A + bM * M + bW * W,
    a ~ dnorm(0, 2),
    c(bD, bA, bM, bW) ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d2
)
precis(h2)
## mean sd 5.5% 94.5%
## a 1.27937728 0.04411680 1.20887012 1.34988445
## bD 0.05189830 0.05719195 -0.03950548 0.14330208
## bA -0.08331033 0.07449230 -0.20236341 0.03574274
## bM -0.05010881 0.06458959 -0.15333544 0.05311783
## bW 0.28893445 0.04617458 0.21513855 0.36273034
## sigma 0.31202758 0.03105690 0.26239265 0.36166251
bD is centered on zero after conditioning on A, M, and W, so the second condition is met.
- \(M \perp \!\!\! \perp W | S\)
h3 <- quap(
  alist(
    W ~ dnorm(mu, sigma),
    mu <- a + bM * M + bS * S,
    a ~ dnorm(0, 2),
    c(bS, bM) ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d2
)
precis(h3)
## mean sd 5.5% 94.5%
## a -1.80777287 0.29779158 -2.2837013 -1.3318444
## bS 1.41594111 0.21932418 1.0654187 1.7664635
## bM -0.02514286 0.10206379 -0.1882605 0.1379748
## sigma 0.71593226 0.07110397 0.6022944 0.8295701
bM is centered on zero, so we meet the third condition as well. All three implied conditional independencies are consistent with the data, so the graph does not need to be amended.
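For completeness, dagitty can run all of these tests at once; a sketch, assuming linear-Gaussian partial-correlation tests are adequate for these standardised variables:

# run every implied conditional-independence test in one call
localTests(dagD, d2, type = "cis")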
The remaining problems use the foxes dataset:
- 116 foxes from 30 different urban groups in England (2-8 foxes per group)
- area gives the size of each fox group's territory
- some territories have more avgfood than others
- we need to model the weight of each fox

Assume the following DAG:
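The foxdag object used by the adjustmentSets() calls below is never defined in the text; the following is a minimal reconstruction, consistent with the book's fox DAG and the paths analyzed in 6H4 and 6H5:

# foxdag reconstructed from the paths listed in 6H4 and 6H5:
# area -> avgfood, avgfood -> groupsize -> weight, avgfood -> weight
foxdag <- dagitty("dag{
area -> avgfood
avgfood -> groupsize
avgfood -> weight
groupsize -> weight
}")
drawdag(foxdag)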
6H3 Use a model to infer the total causal influence of area on weight. Would increasing the area available to each fox make it heavier (healthier)? You might want to standardize the variables. Regardless, use prior predictive simulation to show that your model’s prior predictions stay within the possible outcome range.
data(foxes)
d <- foxes %>%
  mutate(across(c(avgfood:weight), standardize)) # standardise avgfood, groupsize, area and weight (group id untouched)

mh3 <- quap(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bA * area,
    a ~ dnorm(0, 1),
    bA ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d
)
plot(precis(mh3))
The size of a territory appears to have no influence on the weight of foxes.
Prior predictive simulation
area <- seq(-3, 3, length.out = 30) # simulated area values (standardised scale)
priors <- extract.prior(mh3) # sample parameter values from the priors
mu <- link(mh3, data = list(area = area), post = priors) # prior-implied mean weights

# get them back on the original scale
A <- area * attributes(d$area)[[2]] + attributes(d$area)[[1]]
W <- apply(mu, 1:2, function(x) x * attributes(d$weight)[[2]] + attributes(d$weight)[[1]])

# plot the result
plot(NULL, xlim = range(A), ylim = range(W), xlab = "area", ylab = "weight")
abline(h = 0, col = 'red', lwd = 2)  # weight cannot be negative
abline(h = 10, col = 'red', lwd = 2) # foxes usually weigh at most about 9.5 kg
for (i in 1:30) {
  # draw a regression line for each sampled prior
  lines(A, W[i, ], col = col.alpha('black', 0.3))
}
The priors seem good enough: the prior regression lines stay largely within the range of possible weights.
6H4 Now infer the causal impact of adding food to a territory. Would this make foxes heavier? Which covariates do you need to adjust for to estimate the total causal influence of food?
Paths:
- avgfood -> weight (direct path)
- avgfood -> groupsize -> weight (a pipe; conditioning on groupsize would block this part of the causal effect)

Both paths carry the causal influence of avgfood, so for the total effect we leave them open.
adjustmentSets(foxdag, "avgfood", 'weight')
## {}
So no need to add anything.
mh4 <- quap(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bA * avgfood,
    a ~ dnorm(0, 1),
    bA ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d
)
plot(precis(mh4))
Just as with area, adding food to a territory has no total causal effect on the weight of foxes.
6H5 Now infer the causal impact of group size. Which covariates do you need to adjust for? Looking at the posterior distribution of the resulting model, what do you think explains these data? That is, can you explain the estimates for all three problems? How do they go together?
Paths:
- groupsize -> weight (direct path)
- groupsize <- avgfood -> weight (a fork at avgfood: a backdoor into groupsize; to block it we need to include avgfood)
adjustmentSets(foxdag, "groupsize", 'weight')
## { avgfood }
mh5 <- quap(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bA * avgfood + bG * groupsize,
    a ~ dnorm(0, 1),
    c(bA, bG) ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d
)
plot(precis(mh5))
According to this model, increasing groupsize reduces the health (weight) of foxes, while increasing avgfood improves it. That more food improves health is self-explanatory; larger quantities of food probably also attract more foxes, which increases the group size, and a larger group means the same quantity of food must be divided among more foxes.

groupsize and avgfood work in opposite directions, so it is only when we include both that we see their real effects. In 6H4, with only avgfood as a predictor, the negative effect of the larger group size (induced by the extra food) cancelled the positive direct effect, producing an overall effect near zero on the health of foxes. The two paths mask each other's effects.
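To see the masking directly, compare mh5 with a model that uses groupsize alone (a sketch; the model name m_gs is my own addition): the open backdoor through avgfood adds positive confounding that should pull the groupsize coefficient toward zero.

# groupsize alone, leaving the avgfood backdoor open
m_gs <- quap(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bG * groupsize,
    a ~ dnorm(0, 1),
    bG ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d
)
plot(precis(m_gs))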