Statistical Rethinking Chapter 6 EoC
Easy Questions
6E1 List three mechanisms by which multiple regression can produce false inferences about causal effects.
- Multicollinearity
- Post-treatment bias
- Collider bias
6E2 For one of the mechanisms in the previous problem, provide an example of your choice, perhaps from your own research.
In predicting defaults for home loans, one of the variables used is the original loan amount and another is the current outstanding amount (original minus the amount already repaid). These two are highly correlated, and including both can lead to erroneous inferences about either one's effect.
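A minimal sketch of this with simulated data (the variable names and parameter values below are invented, not from real loan data): when both loan amounts enter the model, the standard errors inflate and neither coefficient is well identified.

# hypothetical sketch of multicollinearity with correlated loan amounts
set.seed(6)
orig <- rnorm(200, 10, 2)         # original loan amount
paid <- runif(200, 0, 0.3) * orig # amount already repaid
curr <- orig - paid               # current amount, highly correlated with orig
risk <- 0.5 * orig + rnorm(200)   # default risk depends on loan size
round(cor(orig, curr), 2)                     # very high correlation
summary(lm(risk ~ orig))$coefficients         # precise estimate
summary(lm(risk ~ orig + curr))$coefficients  # inflated standard errors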
6E3 List the four elemental confounds. Can you explain the conditional dependencies of each?
- Fork: X <- Z -> Y. A common cause Z influences both X and Y, making them correlated; conditioning on Z removes the association.
- Pipe: X -> Z -> Y. X influences Y through Z, so conditioning on Z blocks the path and can hide the real influence of X on Y. Z could be a post-treatment variable.
- Collider: X -> Z <- Y. Both X and Y influence Z, so conditioning on Z creates a statistical association between X and Y even when in reality they are independent of each other (see the sketch after this list).
- Descendant: same as the collider, but Z has a child D that it influences. Conditioning on D has a similar (if weaker) effect as conditioning on Z itself.
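To illustrate the collider row (the simulation parameters here are my own choices): X and Y are generated independently, yet conditioning on their common effect Z induces an association.

# minimal collider sketch
set.seed(1)
N <- 1000
X <- rnorm(N)
Y <- rnorm(N)
Z <- X + Y + rnorm(N)     # Z is a common effect of X and Y
coef(lm(Y ~ X))["X"]      # near 0: X and Y are marginally independent
coef(lm(Y ~ X + Z))["X"]  # strongly negative once we condition on Z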
6E4 How is a biased sample like conditioning on a collider? Think of the example at the open of the chapter.
\[ X \rightarrow Z \leftarrow Y \] Z is the collider here. If we condition on Z, then knowing X tells us about Y. In the newsworthiness (N) vs trustworthiness (T) example that opens the chapter, conditioning on selection (sampling only funded proposals) means that a low N implies T must have been high for the proposal to be selected, and vice versa. A biased sample therefore acts just like conditioning on a collider.
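A quick simulation in the spirit of the chapter's opening example (the exact parameters are my own choices):

# selection-distortion sketch
set.seed(1914)
N <- 200
nw <- rnorm(N)                    # newsworthiness
tw <- rnorm(N)                    # trustworthiness, independent of nw
s <- nw + tw                      # selection score
selected <- s >= quantile(s, 0.9) # fund the top 10%
cor(nw[selected], tw[selected])   # strongly negative within the biased sample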
Medium Questions
6M1 Modify the DAG on page 186 to include the variable V, an unobserved cause of C and Y: C <- V -> Y. Reanalyze the DAG. How many paths connect X to Y? Which must be closed? Which variables should you condition on now?
library(dagitty)
library(ggdag)
library(rethinking)
dag_6M1 <- dagitty( "dag {
U [unobserved]
V [unobserved]
X -> Y
X <- U <- A -> C -> Y
U -> B <- C
C <- V -> Y
}")
coordinates(dag_6M1) <- list(
x = c(X = 0, Y = 2, U = 0, B = 1 , C = 2, A = 1 , V = 3),
y = c(X = 0, Y = 0, U = 1, B = 0.7, C = 1, A = 1.3, V = 1)
)
# ggdag(dag_6M1) + theme_void()
drawdag(dag_6M1)
Paths from X to Y (besides the direct X -> Y):
- X <- U <- A -> C -> Y (fork at A plus two pipes) (same as before) (open, needs to be closed)
- X <- U -> B <- C -> Y (B is a collider) (same as before) (already closed)
- X <- U <- A -> C <- V -> Y (C is a collider on this path, so it is already closed)

Thus we only need to condition on A. We can no longer use C, because conditioning on C would open the collider path through the unobserved V.
adjustmentSets(dag_6M1, exposure = "X", outcome = "Y")
## { A }
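As a cross-check, dagitty can enumerate all paths between X and Y and report which are open; a small sketch using dagitty::paths():

# list every path from X to Y and whether it is open (before conditioning)
p <- paths(dag_6M1, from = "X", to = "Y")
cbind(path = p$paths, open = p$open)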
6M2 Sometimes, in order to avoid multicollinearity, people inspect pairwise correlations among predictors before including them in a model. This is a bad procedure, because what matters is the conditional association, not the association before the variables are included in the model. To highlight this, consider the DAG X → Z → Y. Simulate data from this DAG so that the correlation between X and Z is very large. Then include both in a model predicting Y. Do you observe any multicollinearity? Why or why not? What is different from the legs example in the chapter?
# X -> Z -> Y
set.seed(62)
x <- rnorm(100, 3, 1)
z <- 2 * x + rnorm(100)
y <- 3 * z + rnorm(100, 2, 3)

d <- data.frame(x, y, z)

m62 <- quap(
  alist(
    y ~ dnorm(mu, sigma),
    mu <- a + bX * x + bZ * z,
    a ~ dnorm(0, 7),
    bX ~ dnorm(0, 7),
    bZ ~ dnorm(0, 7),
    sigma ~ dexp(3)
  ),
  data = d
)
precis(m62)
## mean sd 5.5% 94.5%
## a 3.4369992 0.8625566 2.058467 4.8155313
## bX -0.6552322 0.5962388 -1.608137 0.2976725
## bZ 3.0733699 0.2589212 2.659564 3.4871759
## sigma 2.7189942 0.1814884 2.428941 3.0090477
bX comes out as essentially zero. This is what we expect in this situation, a pipe: after conditioning on Z, we learn nothing new about Y from X, and bZ remains well identified. In the legs example, by contrast, both legs directly influenced the final outcome (height), so one effect was split across two highly correlated predictors.
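As a sanity check that the simulation met the problem's requirement of a very large correlation between X and Z:

# x and z are strongly correlated, yet no multicollinearity problem arises,
# because z lies on the causal path from x to y
cor(d$x, d$z)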
6M3 Learning to analyze DAGs requires practice. For each of the four DAGs below, state which variables, if any, you must adjust for (condition on) to estimate the total causal influence of X on Y.
- First DAG
<- dagitty("dag{
dag1 Z <- A -> Y
X <- Z -> Y
X -> Y
}")
coordinates(dag1) <- list(
x = c(Z = 1, A = 2, X = 0, Y = 2),
y = c(Z = 1, A = 1, X = 0, Y = 0)
)
ggdag(dag1, layout = 'circle') + theme_void()
Paths:
- X <- Z -> Y (fork, open; condition on Z)
- X <- Z <- A -> Y (fork plus pipe, open; conditioning on Z closes it)
adjustmentSets(dag1, exposure = "X", outcome = "Y")
## { Z }
- Second DAG
<- dagitty("dag{
dag2 Z <- A -> Y
X -> Z -> Y
X -> Y
}")
coordinates(dag2) <- list(
x = c(Z = 1, A = 2, X = 0, Y = 2),
y = c(Z = 1, A = 1, X = 0, Y = 0)
)
ggdag(dag2, layout = 'circle') + theme_void()
Paths:
- X -> Z -> Y (a pipe; part of the total causal effect, no backdoor into X, leave open)
- X -> Z <- A -> Y (collider at Z, fork at A; already closed)

Conditioning on Z would both block part of the causal effect and open the collider, creating an association between X and A, and thereby X and Y. So don't condition on Z.
adjustmentSets(dag2, exposure = "X", outcome = "Y")
## {}
- Third DAG
<- dagitty("dag{
dag3 Z <- A -> X
X -> Z <- Y
X -> Y
}")
coordinates(dag3) <- list(
x = c(Z = 1, A = 0, X = 0, Y = 2),
y = c(Z = 1, A = 1, X = 0, Y = 0)
)
ggdag(dag3, layout = 'circle') + theme_void()
Paths:
- X -> Z <- Y (collider at Z, already closed; so don't include Z)
- X <- A -> Z <- Y (also blocked by the collider at Z)
adjustmentSets(dag3, "X", "Y")
## {}
- Fourth DAG
<- dagitty("dag{
dag4 Z <- A -> X
X -> Z -> Y
X -> Y
}")
coordinates(dag4) <- list(
x = c(Z = 1, A = 0, X = 0, Y = 2),
y = c(Z = 1, A = 1, X = 0, Y = 0)
)
ggdag(dag4, layout = 'circle') + theme_void()
Paths:
- X -> Z -> Y (a pipe; part of the causal effect, no backdoor)
- X <- A -> Z -> Y (A forks into X and Z, and Z pipes into Y: a backdoor into X that needs to be closed; condition on A)
adjustmentSets(dag4, "X", "Y")
## { A }
Hard Questions
6H1 Use the Waffle House data, data(WaffleDivorce), to find the total causal influence of number of Waffle Houses on divorce rate. Justify your model or models with a causal graph.
data("WaffleDivorce")
<- WaffleDivorce
d
head(d)
## Location Loc Population MedianAgeMarriage Marriage Marriage.SE Divorce
## 1 Alabama AL 4.78 25.3 20.2 1.27 12.7
## 2 Alaska AK 0.71 25.2 26.0 2.93 12.5
## 3 Arizona AZ 6.33 25.8 20.3 0.98 10.8
## 4 Arkansas AR 2.92 24.3 26.4 1.70 13.5
## 5 California CA 37.25 26.8 19.1 0.39 8.0
## 6 Colorado CO 5.03 25.7 23.5 1.24 11.6
## Divorce.SE WaffleHouses South Slaves1860 Population1860 PropSlaves1860
## 1 0.79 128 1 435080 964201 0.45
## 2 2.05 0 0 0 0 0.00
## 3 0.74 18 0 0 0 0.00
## 4 1.22 41 1 111115 435450 0.26
## 5 0.24 0 0 0 379994 0.00
## 6 0.94 11 0 0 34277 0.00
The exposure is WaffleHouses (W) and the outcome is the divorce rate (D). The number of Waffle Houses is plausibly influenced by state population and, in particular, by the indicator for Southern states (S), since the South is where Waffle House started; the DAG below keeps only S. Divorce is also affected by median age at marriage (A) and marriage rate (M), as discussed in earlier chapters, and both of these are plausibly influenced by S as well.
Let’s say the causal chain is like the following:
<- dagitty("dag{
dagD W -> D
S -> W
M -> D
A -> D
A -> M
S -> M
S -> A
}")
ggdag(dagD, layout = 'circle') + theme_void()
Paths:
- W <- S -> M -> D (backdoor, open; condition on S)
- W <- S -> A -> D (backdoor, open; condition on S)
- W <- S -> A -> M -> D (backdoor, open; condition on S)
- W <- S -> M <- A -> D (collider at M, with forks at S and A; already closed)

We can condition on S alone, or on both A and M.
adjustmentSets(dagD, "W", "D")
## { A, M }
## { S }
Thus we build two models: one with W, D, and S, and another with W, D, A, and M.
library(dplyr)
# first standardise all variables of interest
d2 <- d %>%
  rename(W = WaffleHouses,
         D = Divorce,
         A = MedianAgeMarriage,
         M = Marriage,
         S = South) %>%
  mutate(across(c(W, A, M, D), standardize),
         S = S + 1) # index variable for a[S]: 1 = non-South, 2 = South
# model with W, D and S
mh1 <- quap(
  alist(
    D ~ dnorm(mu, sigma),
    mu <- a[S] + bW * W,
    a[S] ~ dnorm(0, 2),
    bW ~ dnorm(0, 2),
    sigma ~ dexp(1)
  ),
  data = d2
)
plot(precis(mh1, 2))
The posterior for bW is centered on zero: the number of Waffle Houses has no discernible influence on divorce once S is conditioned on.
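For contrast, an unadjusted model should recover the spurious positive association between Waffle Houses and divorce that motivates the chapter; a sketch (the model name mh0 is my own addition):

# unadjusted sketch: the backdoor through S is left open,
# so bW should pick up a spurious positive association
mh0 <- quap(
  alist(
    D ~ dnorm(mu, sigma),
    mu <- a + bW * W,
    a ~ dnorm(0, 2),
    bW ~ dnorm(0, 2),
    sigma ~ dexp(1)
  ),
  data = d2
)
plot(precis(mh0))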
# model with W, D, A and M
mh2 <- quap(
  alist(
    D ~ dnorm(mu, sigma),
    mu <- a[S] + bW * W + bA * A + bM * M,
    a[S] ~ dnorm(0, 2),
    c(bA, bM, bW) ~ dnorm(0, 2),
    sigma ~ dexp(1)
  ),
  data = d2
)
plot(precis(mh2, 2))
bW is still centered on zero in this model as well.
6H2 Build a series of models to test the implied conditional independencies of the causal graph you used in the previous problem. If any of the tests fail, how do you think the graph needs to be amended? Does the graph need more or fewer arrows? Feel free to nominate variables that aren’t in the data.
impliedConditionalIndependencies(dagD)
## A _||_ W | S
## D _||_ S | A, M, W
## M _||_ W | S
- \(A \perp \!\!\! \perp W | S\)
h1 <- quap(
  alist(
    W ~ dnorm(mu, sigma),
    mu <- a + bS * S + bA * A,
    a ~ dnorm(0, 2),
    c(bS, bA) ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d2
)
precis(h1)
## mean sd 5.5% 94.5%
## a -1.83063343 0.30437276 -2.3170799 -1.3441870
## bS 1.43382208 0.22484029 1.0744839 1.7931603
## bA 0.04261305 0.10461217 -0.1245774 0.2098035
## sigma 0.71468869 0.07100323 0.6012118 0.8281656
bA is centered on zero after conditioning on S, so the first condition is met.
Second Condition
- \(D \perp \!\!\! \perp S | A, M, W\)
h2 <- quap(
  alist(
    S ~ dnorm(mu, sigma),
    mu <- a + bD * D + bA * A + bM * M + bW * W,
    a ~ dnorm(0, 2),
    c(bD, bA, bM, bW) ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d2
)
precis(h2)
## mean sd 5.5% 94.5%
## a 1.27937728 0.04411680 1.20887012 1.34988445
## bD 0.05189830 0.05719195 -0.03950548 0.14330208
## bA -0.08331033 0.07449230 -0.20236341 0.03574274
## bM -0.05010881 0.06458959 -0.15333544 0.05311783
## bW 0.28893445 0.04617458 0.21513855 0.36273034
## sigma 0.31202758 0.03105690 0.26239265 0.36166251
bD is centered on zero after conditioning on A, M, and W, so the second condition is met.
- \(M \perp \!\!\! \perp W | S\)
h3 <- quap(
  alist(
    W ~ dnorm(mu, sigma),
    mu <- a + bM * M + bS * S,
    a ~ dnorm(0, 2),
    c(bS, bM) ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d2
)
precis(h3)
## mean sd 5.5% 94.5%
## a -1.80777287 0.29779158 -2.2837013 -1.3318444
## bS 1.41594111 0.21932418 1.0654187 1.7664635
## bM -0.02514286 0.10206379 -0.1882605 0.1379748
## sigma 0.71593226 0.07110397 0.6022944 0.8295701
bM is centered on zero, so we meet the third condition as well. All three implied conditional independencies are consistent with the data, so the graph does not need to be amended.
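For completeness, dagitty can run all of these tests at once; a sketch, assuming linear-Gaussian partial-correlation tests are adequate for these standardised variables:

# run every implied conditional-independence test in one call
localTests(dagD, d2, type = "cis")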
The remaining problems use the foxes dataset:
- 116 foxes from 30 different urban groups in England (2-8 foxes per group)
- area gives the size of each fox group's territory
- some territories have more avgfood than others
- we need to model the weight of each fox

Assume the following DAG:
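The foxdag object used by the adjustmentSets() calls below is never defined in the text; the following is a minimal reconstruction, consistent with the book's fox DAG and the paths analyzed in 6H4 and 6H5:

# foxdag reconstructed from the paths listed in 6H4 and 6H5:
# area -> avgfood, avgfood -> groupsize -> weight, avgfood -> weight
foxdag <- dagitty("dag{
area -> avgfood
avgfood -> groupsize
avgfood -> weight
groupsize -> weight
}")
drawdag(foxdag)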
6H3 Use a model to infer the total causal influence of area on weight. Would increasing the area available to each fox make it heavier (healthier)? You might want to standardize the variables. Regardless, use prior predictive simulation to show that your model’s prior predictions stay within the possible outcome range.
data(foxes)
d <- foxes %>%
  mutate(across(c(avgfood:weight), standardize)) # standardise avgfood, groupsize, area and weight (group id untouched)

mh3 <- quap(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bA * area,
    a ~ dnorm(0, 1),
    bA ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d
)
plot(precis(mh3))
The size of a territory appears to have no influence on the weight of foxes.
Prior predictive simulation
area <- seq(-3, 3, length.out = 30) # simulated area values (standardised scale)
priors <- extract.prior(mh3) # sample parameter values from the priors
mu <- link(mh3, data = list(area = area), post = priors) # prior-implied mean weights

# get them back on the original scale
A <- area * attributes(d$area)[[2]] + attributes(d$area)[[1]]
W <- apply(mu, 1:2, function(x) x * attributes(d$weight)[[2]] + attributes(d$weight)[[1]])

# plot the result
plot(NULL, xlim = range(A), ylim = range(W), xlab = "area", ylab = "weight")
abline(h = 0, col = 'red', lwd = 2)  # weight cannot be negative
abline(h = 10, col = 'red', lwd = 2) # foxes usually weigh at most about 9.5 kg
for (i in 1:30) {
  # draw a regression line for each sampled prior
  lines(A, W[i, ], col = col.alpha('black', 0.3))
}
The priors seem good enough: the prior regression lines stay largely within the range of possible weights.
6H4 Now infer the causal impact of adding food to a territory. Would this make foxes heavier? Which covariates do you need to adjust for to estimate the total causal influence of food?
Paths:
- avgfood -> weight (direct path)
- avgfood -> groupsize -> weight (a pipe; conditioning on groupsize would block this part of the causal effect)

Both paths carry the causal influence of avgfood, so for the total effect we leave them open.
adjustmentSets(foxdag, "avgfood", 'weight')
## {}
So no need to add anything.
mh4 <- quap(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bA * avgfood,
    a ~ dnorm(0, 1),
    bA ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d
)
plot(precis(mh4))
Just as with area, adding food to a territory has no total causal effect on the weight of foxes.
6H5 Now infer the causal impact of group size. Which covariates do you need to adjust for? Looking at the posterior distribution of the resulting model, what do you think explains these data? That is, can you explain the estimates for all three problems? How do they go together?
Paths:
- groupsize -> weight (direct path)
- groupsize <- avgfood -> weight (a fork at avgfood: a backdoor into groupsize; to block it we need to include avgfood)
adjustmentSets(foxdag, "groupsize", 'weight')
## { avgfood }
mh5 <- quap(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bA * avgfood + bG * groupsize,
    a ~ dnorm(0, 1),
    c(bA, bG) ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d
)
plot(precis(mh5))
According to this model, increasing groupsize reduces the health (weight) of foxes, while increasing avgfood improves it. That more food improves health is self-explanatory; larger quantities of food probably also attract more foxes, which increases the group size, and a larger group means the same quantity of food must be divided among more foxes.

groupsize and avgfood work in opposite directions, so it is only when we include both that we see their real effects. In 6H4, with only avgfood as a predictor, the negative effect of the larger group size (induced by the extra food) cancelled the positive direct effect, producing an overall effect near zero on the health of foxes. The two paths mask each other's effects.
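To see the masking directly, compare mh5 with a model that uses groupsize alone (a sketch; the model name m_gs is my own addition): the open backdoor through avgfood adds positive confounding that should pull the groupsize coefficient toward zero.

# groupsize alone, leaving the avgfood backdoor open
m_gs <- quap(
  alist(
    weight ~ dnorm(mu, sigma),
    mu <- a + bG * groupsize,
    a ~ dnorm(0, 1),
    bG ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = d
)
plot(precis(m_gs))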