Rethinking: Chapter 4 EoC Questions
Easy Questions
4E1 In the model definition below, which line is the likelihood?
\[ y_i \sim Normal(\mu, \sigma) \\ \mu \sim Normal(0, 10) \\ \sigma \sim Exponential(1) \]
The first line is the likelihood.
4E2 In the model definition just above, how many parameters are in the posterior distribution?
2 parameters, \(\mu\) and \(\sigma\)
4E3 Using the model definition above, write down the appropriate form of Bayes' theorem that includes the proper likelihood and priors.
\[ Pr(\mu, \sigma \mid y) = \frac{\prod_i Normal(y_i \mid \mu, \sigma)\ Normal(\mu \mid 0, 10)\ Exponential(\sigma \mid 1)}{\int \int \prod_i Normal(y_i \mid \mu, \sigma)\ Normal(\mu \mid 0, 10)\ Exponential(\sigma \mid 1)\ d\mu\ d\sigma} \]
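To make the formula concrete, here is a minimal grid-approximation sketch (not part of the original answer; the data vector y is made up), in which a normalized sum over the grid plays the role of the double integral in the denominator:
# grid approximation of the posterior above (illustrative, made-up data)
y <- c(-1.2, 0.5, 1.1)
grid <- expand.grid(mu    = seq(-5, 5, length.out = 200),
                    sigma = seq(0.1, 5, length.out = 200))

# log-likelihood of all observations plus log-priors at each grid point
log_lik <- sapply(1:nrow(grid),
                  function(i) sum(dnorm(y, grid$mu[i], grid$sigma[i], log = TRUE)))
log_post <- log_lik + dnorm(grid$mu, 0, 10, log = TRUE) + dexp(grid$sigma, 1, log = TRUE)

# normalizing over the grid stands in for the integral in the denominator
post <- exp(log_post - max(log_post))
post <- post / sum(post)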
4E4 In the model definition below, which line is the linear model?
\[ \begin{align} y_i &\sim Normal(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta x_i \\ \alpha &\sim Normal(0, 10) \\ \beta &\sim Normal(0, 1) \\ \sigma &\sim Exponential(2) \end{align} \]
The second line, \(\mu_i = \alpha + \beta x_i\), is the linear model.
4E5 In the model definition above, how many parameters are in the posterior distribution?
3 parameters: \(\alpha\), \(\beta\), and \(\sigma\)
Medium Questions
4M1 For the model definition below, simulate observed y values from the prior (not the posterior)
\[ \begin{align} y_i &\sim Normal(\mu, \sigma) \\ \mu &\sim Normal(0, 10) \\ \sigma &\sim Exponential(1) \end{align} \]
library(rethinking)
# prior for mu
curve(dnorm(x, 0, 10), from = -30, to = 30)
# simulating y values from the prior
sample_mu <- rnorm(1e4, mean = 0, sd = 10)
sample_sigma <- rexp(1e4, 1)
sample_y <- rnorm(1e4, sample_mu, sample_sigma)
dens(sample_y)
4M2 Translate the model just above into a quap formula
formula <- alist(
  y ~ dnorm(mu, sigma),
  mu ~ dnorm(0, 10),
  sigma ~ dexp(1)
)
4M3 Translate the quap model formula below into a mathematical model definition.
alist(
  y ~ dnorm(mu, sigma),
  mu <- a + b*x,
  a ~ dnorm(0, 10),
  b ~ dunif(0, 1),
  sigma ~ dexp(1)
)
## [[1]]
## y ~ dnorm(mu, sigma)
##
## [[2]]
## mu <- a + b * x
##
## [[3]]
## a ~ dnorm(0, 10)
##
## [[4]]
## b ~ dunif(0, 1)
##
## [[5]]
## sigma ~ dexp(1)
\[ \begin{align} y &\sim Normal(\mu, \sigma) \\ \mu &= a + bx \\ a &\sim Normal(0, 10) \\ b &\sim Uniform(0, 1) \\ \sigma &\sim Exponential(1) \end{align} \]
4M4 A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors.
Students can be both boys and girls, ranging in age from, say, 6 to 18. From this source, the average height can then range from 120 cm to 190 cm, and from source2, the yearly growth can range from 0 to 7 cm. This growth rate cannot be negative.
The following model assumes height is normally distributed and links it linearly to the year the height was measured in. Thus \(\alpha\), the intercept, captures the average height of all students, and \(\beta\) the yearly growth rate. \(\sigma\) is given a uniform prior with a very wide interval to cover all the age groups.
\[ \begin{align} height &\sim Normal(\mu, \sigma) \\ \mu &= \alpha + \beta \ t_i \\ \alpha &\sim Normal(155, 20) \\ \beta &\sim \operatorname{Log-Normal} (0,1) \\ \sigma &\sim Uniform(0, 50) \end{align} \]
Checking the prior distributions and simulating the heights using them:
samples_alpha = rnorm(1e4, 155, 20)
samples_beta = rlnorm(1e4, 0, 1)
samples_sigma = runif(1e4, 0, 50)

# for year 1, 2, 3
samples_mu1 = samples_alpha + samples_beta
samples_mu2 = samples_alpha + samples_beta * 2
samples_mu3 = samples_alpha + samples_beta * 3

# simulated heights for years 1, 2, 3
simulated_heights1 = rnorm(1e4, samples_mu1, samples_sigma)
simulated_heights2 = rnorm(1e4, samples_mu2, samples_sigma)
simulated_heights3 = rnorm(1e4, samples_mu3, samples_sigma)
# check simulated heights
dens(simulated_heights1)
PI(simulated_heights1, 0.97)
## 2% 98%
## 74.46044 241.50898
PI(simulated_heights2, 0.97)
## 2% 98%
## 78.18308 240.34141
PI(simulated_heights3, 0.97)
## 2% 98%
## 76.10535 246.78405
4M5 Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How?
Since students got “taller”, we assume positive growth rates only, which the model above already handles by using a log-normal distribution for the growth rate.
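A one-line check (a sketch, not part of the original answer) that the log-normal prior indeed rules out negative growth:
# every draw from a log-normal is strictly positive
b <- rlnorm(1e4, 0, 1)
min(b) > 0  # TRUE: the prior puts no mass on negative growth rates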
4M6 Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?
A variance of 64 implies a standard deviation of 8, so we can update our prior for \(\sigma\):
\[ \sigma \sim Uniform(0, 8) \]
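As a quick sketch (reusing the samples_mu1 draws from 4M4 above), the revised prior tightens the simulated heights:
# re-simulate year-1 heights with the revised sigma prior
samples_sigma_new = runif(1e4, 0, 8)
simulated_heights1_new = rnorm(1e4, samples_mu1, samples_sigma_new)
PI(simulated_heights1_new, 0.97)  # narrower than with Uniform(0, 50) above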
4M7 Refit model m4.3 from the chapter, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. What is different? Then compare the posterior predictions of both models.
data(Howell1)
d <- Howell1
d2 <- d[ d$age >= 18 , ]

xbar <- mean(d2$weight)

m4.3 <- quap(
  alist(
    height ~ dnorm( mu , sigma ) ,
    mu <- a + b * ( weight - xbar ) ,
    a ~ dnorm( 178 , 20 ) ,
    b ~ dlnorm( 0 , 1 ) ,
    sigma ~ dunif( 0 , 50 )
  ) , data=d2
)

m4.3n <- quap(
  alist(
    height ~ dnorm( mu , sigma ) ,
    mu <- a + b * ( weight ) ,
    a ~ dnorm( 178 , 20 ) ,
    b ~ dlnorm( 0 , 1 ) ,
    sigma ~ dunif( 0 , 50 )
  ) , data=d2
)
precis(m4.3)
## mean sd 5.5% 94.5%
## a 154.6006938 0.27032556 154.1686613 155.0327262
## b 0.9032263 0.04192643 0.8362198 0.9702328
## sigma 5.0722178 0.19118654 4.7666648 5.3777708
precis(m4.3n)
## mean sd 5.5% 94.5%
## a 114.5395932 1.89776812 111.5065932 117.5725932
## b 0.8906174 0.04175844 0.8238794 0.9573555
## sigma 5.0727478 0.19125192 4.7670903 5.3784053
The \(\alpha\) parameter has changed since weight is no longer centered. Earlier it was the expected height when \(x - \bar x\) was zero (i.e. at the average weight); now it is the expected height when the actual weight is zero.
round(vcov(m4.3), 3)
## a b sigma
## a 0.073 0.000 0.000
## b 0.000 0.002 0.000
## sigma 0.000 0.000 0.037
round(vcov(m4.3n), 3)
## a b sigma
## a 3.602 -0.078 0.009
## b -0.078 0.002 0.000
## sigma 0.009 0.000 0.037
round(cov2cor(vcov(m4.3)) , 3)
## a b sigma
## a 1.000 0.000 0.001
## b 0.000 1.000 -0.003
## sigma 0.001 -0.003 1.000
round(cov2cor(vcov(m4.3n)), 3)
## a b sigma
## a 1.000 -0.990 0.026
## b -0.990 1.000 -0.026
## sigma 0.026 -0.026 1.000
The covariance is much higher for the new model. Looking at the correlations, \(\alpha\) and \(\beta\) are now almost perfectly negatively correlated. This happens because weight is no longer centered: the intercept refers to a point (weight = 0) far outside the data, so a slightly steeper slope must be compensated by a lower intercept, and vice versa.
pairs(m4.3)
pairs(m4.3n)
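The same correlation can be read directly off posterior samples:
# correlation between a and b in the uncentered model, from posterior samples
post_n <- extract.samples(m4.3n)
round(cor(post_n$a, post_n$b), 2)  # close to -0.99, matching cov2cor above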
Comparing posterior predictions of the two models.
weight.seq <- seq( from=25 , to=70 , by=1 )
dw <- data.frame(weight = weight.seq)

plot_wt_posterior <- function(model){
  mu <- link(model, dw)
  mu.mean <- apply(mu, 2, mean)
  mu.PI <- apply(mu, 2, PI)

  sim.h <- sim(model, dw)
  height.PI <- apply(sim.h, 2, PI)

  plot( height ~ weight , d2 , col=col.alpha(rangi2,0.5), pch = 20)
  lines( weight.seq , mu.mean )
  shade( mu.PI , weight.seq, col = col.alpha('green') )
  shade( height.PI , weight.seq , col = col.alpha('red'))
}
par(mfrow = c(1, 2))
plot_wt_posterior(m4.3)
plot_wt_posterior(m4.3n)
par(mfrow = c(1,1))
The posterior predictions look essentially identical. The figure on the left is the posterior prediction from the original model; the one on the right is from the new model.
4M8 In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights – change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?
data("cherry_blossoms")
d <- cherry_blossoms
d2 <- d[ complete.cases(d$doy) , ] # complete cases on doy

library(splines)

cherry_splines_model <- function(knots = 15){
  num_knots <- knots
  knot_list <- quantile( d2$year , probs=seq(0,1,length.out=num_knots) )

  B <- bs(d2$year,
          knots=knot_list[-c(1,num_knots)] ,
          degree=3 , intercept=TRUE )

  # create model
  m4.7 <- quap(
    alist( D ~ dnorm( mu , sigma ) ,
           mu <- a + B %*% w ,
           a ~ dnorm(100,10),
           w ~ dnorm(0, 10),
           sigma ~ dexp(1)
    ), data = list( D = d2$doy , B = B ) ,
    start = list( w = rep( 0 , ncol(B) ) ) )

  return(list(model = m4.7, knot_list = knot_list, B = B))
}
plot_basis_splines <- function(model, knot_list, B = B){
  # plotting posterior distribution
  post <- extract.samples( model )
  w <- apply( post$w , 2 , mean )

  # Plot the basis functions * weights calculated using quap
  plot( NULL , xlim=range(d2$year) , ylim=c(-6,6) ,
        xlab="year" , ylab="basis * weight" )
  for ( i in 1:ncol(B) )
    lines( d2$year , w[i]*B[,i] , col = col.alpha(i, 0.7), lwd = 4)
  points(knot_list, y = rep(6, length(knot_list)), pch = 3)
}
plot_posterior_predictions <- function(model, B = B, text = ncol(B)){
  # Posterior interval for mu (97%)
  mu <- link( model )
  mu_mean <- apply(mu, 2, mean)
  mu_PI <- apply(mu,2,PI,0.97)

  doy <- sim(model, list( D = d2$doy , B = B ))
  doy_h <- apply(doy, 2, PI, 0.97)

  plot( d2$year , d2$doy , col=col.alpha(rangi2,0.3) , pch=16 )
  lines(d2$year, mu_mean, col = 'blue', lwd = 3)
  shade( mu_PI , d2$year , col=col.alpha("black",0.2) )
  shade(doy_h, d2$year, col = col.alpha('purple', 0.1))
  mtext(text)
}
model1 <- cherry_splines_model(knots = 10)
model2 <- cherry_splines_model(knots = 20)
model3 <- cherry_splines_model(knots = 30) # gives warning that model did not converge
par(mfrow = c(3, 1))
plot_basis_splines(model1$model, model1$knot_list, B = model1$B)
plot_basis_splines(model2$model, model2$knot_list, B = model2$B)
plot_basis_splines(model3$model, model3$knot_list, B = model3$B)
plot_posterior_predictions(model1$model, model1$B)
plot_posterior_predictions(model2$model, model2$B)
plot_posterior_predictions(model3$model, model3$B)
The mean posterior prediction line becomes much more squiggly, especially in the middle years. With 10 knots, the line is more linear in the early years and smoother overall. Increasing the number of knots decreases smoothness, since the spline can then adjust to more local patterns.
Next, let us see the effect of changing the width of the weights prior while keeping the knots constant at 15.
num_knots <- 15
knot_list <- quantile( d2$year , probs=seq(0,1,length.out=num_knots) )

B <- bs(d2$year,
        knots=knot_list[-c(1,num_knots)] ,
        degree=3 , intercept=TRUE )

dt = list(D = d2$doy, B = B)
start_list <- list(w = rep(0, ncol(B)))

# change the width of the weights prior to different values: 5, 20 and 100
m1 <- quap(
  alist( D ~ dnorm( mu , sigma ) ,
         mu <- a + B %*% w ,
         a ~ dnorm(100,10),
         w ~ dnorm(0, 5),
         sigma ~ dexp(1)
  ), data = dt , start = start_list )

m2 <- quap(
  alist( D ~ dnorm( mu , sigma ) ,
         mu <- a + B %*% w ,
         a ~ dnorm(100,10),
         w ~ dnorm(0, 20),
         sigma ~ dexp(1)
  ), data = dt , start = start_list )

m3 <- quap(
  alist( D ~ dnorm( mu , sigma ) ,
         mu <- a + B %*% w ,
         a ~ dnorm(100,10),
         w ~ dnorm(0, 100),
         sigma ~ dexp(1)
  ), data = dt , start = start_list )
## Caution, model may not have converged.
## Code 1: Maximum iterations reached.
plot_posterior_predictions(m1, B, "Knots = 15, Width = 5")
plot_posterior_predictions(m2, B, "Knots = 15, Width = 20")
plot_posterior_predictions(m3, B, "Knots = 15, Width = 100")
par(mfrow = c(1,1))
Increasing the width makes the line wigglier, especially around the ends, though with 15 knots the effect is not as pronounced.
Hard Questions
4H1 The weights listed below were recorded in the !Kung census, but heights were not recorded for these individuals. Provide predicted heights and 89% intervals for each of these individuals. That is, fill in the table below, using model-based predictions.
| Individual | weight | expected height | 89% interval |
|---|---|---|---|
| 1 | 46.95 | | |
| 2 | 43.72 | | |
| 3 | 64.78 | | |
| 4 | 32.59 | | |
| 5 | 54.63 | | |
Assuming these individuals are all above 18 years of age, we use the model m4.3 created in the chapter.
data("Howell1")
dh <- Howell1
dh2 <- dh[dh$age > 18, ]

xbar <- mean(dh2$weight)

m4.3 <- quap(
  alist(
    height ~ dnorm( mu , sigma ) ,
    mu <- a + b*( weight - xbar ) ,
    a ~ dnorm( 178 , 20 ) ,
    b ~ dlnorm( 0 , 1 ) ,
    sigma ~ dunif( 0 , 50 )
  ) , data=dh2
)

weights <- c(46.95, 43.72, 64.78, 32.59, 54.63)
mu_pred <- link(m4.3, list(weight = weights))

mu_pred_mean <- apply(mu_pred, 2, mean)
mu_pred_PI <- apply(mu_pred, 2, PI, 0.89)
data.frame(
weight = weights,
expected_height = mu_pred_mean,
`lower bound 89%` = mu_pred_PI[1,],
`upper bound 89%` = mu_pred_PI[2,]
)
## weight expected_height lower.bound.89. upper.bound.89.
## 1 46.95 156.3791 155.9155 156.8164
## 2 43.72 153.4550 152.9917 153.9021
## 3 64.78 172.5204 171.1058 173.8649
## 4 32.59 143.3791 142.4240 144.3008
## 5 54.63 163.3317 162.5415 164.0802
4H2 Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.
dh3 <- dh[dh$age < 18, ]
nrow(dh3)
## [1] 192
(a) Fit a linear regression to these data, using quap. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?
Assuming the following model:
\[ \begin{align} h_i &\sim Normal(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta \ weight_i \\ \alpha &\sim Normal(135, 40) \\ \beta &\sim \operatorname{Log-Normal}(0, 1) \\ \sigma &\sim Uniform(0, 50) \end{align} \]
Since the children range from age 0 to 18, the prior average height of all kids is taken to be 135 cm (using the source given in a previous question). The same model is written with quap below.
mod <- quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b * weight,
    a ~ dnorm(135, 40),
    b ~ dlnorm(0, 1),
    sigma ~ dunif(0, 50)
  ), data = dh3
)
precis(mod)
## mean sd 5.5% 94.5%
## a 58.387317 1.39685732 56.154869 60.619764
## b 2.712542 0.06828725 2.603405 2.821678
## sigma 8.437428 0.43060295 7.749242 9.125615
Weight has not been centered for this regression. The interpretation is as follows:
- \(\alpha\) - with a mean of 58.39, this is the expected height of a child with a theoretical weight of zero (not physically meaningful).
- \(\beta\) - the increase in height per unit increase in weight. For a 10-unit increase in weight, the model therefore predicts a child gets about 27.1 cm taller (a quick check follows the list).
- \(\sigma\) - the standard deviation of heights is 8.44 cm.
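Checking the 10-unit figure from posterior samples:
# expected height gain for a 10-unit increase in weight, with an 89% interval
post <- extract.samples(mod)
mean(post$b * 10)      # about 27.1 cm
PI(post$b * 10, 0.89)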
(b) Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Superimpose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.
range(dh3$weight)
## [1] 4.252425 44.735511
weight.seq <- seq(4, 45, by = 1)

sample_mu <- link(mod, list(weight = weight.seq))
mu.mean <- apply(sample_mu, 2, mean)
mu.PI <- apply(sample_mu, 2, PI, 0.89)

sim_heights <- sim(mod, list(weight = weight.seq))
simh <- apply(sim_heights, 2, PI, 0.89)
plot(height ~ weight, data = dh3, col = col.alpha('black', 0.8), pch = 20)
lines(weight.seq, mu.mean, lwd = 2, col = 'blue')
shade(mu.PI, weight.seq, col = col.alpha('black', 0.4))
shade(simh, weight.seq, col = col.alpha('blue', 0.1))
(c) What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.
The model assumes a linear relationship between weight and height, which the graph above shows is clearly not the case. Height increases at a much higher rate for younger children than for older kids, visible in the graph as the curve plateaus towards the higher weights. A better fit would involve some non-linear relationship, using polynomial terms or splines with 3-4 knots; a sketch of a polynomial version follows.
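As an illustration (the priors here are my assumptions, not fitted values), a quadratic version might look like this:
# a sketch of a second-degree polynomial model on standardized weight
dh3$weight_s <- (dh3$weight - mean(dh3$weight)) / sd(dh3$weight)
dh3$weight_s2 <- dh3$weight_s^2

mod_poly <- quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b1 * weight_s + b2 * weight_s2,
    a ~ dnorm(120, 30),   # illustrative: mean height at the mean weight
    b1 ~ dlnorm(0, 1),
    b2 ~ dnorm(0, 10),    # illustrative: allows curvature in either direction
    sigma ~ dunif(0, 50)
  ), data = dh3
)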
4H3 Suppose a colleague of yours, who works on allometry, glances at the practice problems just above. Your colleague exclaims, “That’s silly. Everyone knows that it’s only the logarithm of body weight that scales with height!” Let’s take your colleague’s advice and see what happens.
(a) Model the relationship between height (cm) and the natural logarithm of weight (log-kg). Use the entire Howell1 data frame, all 544 rows, adults and non-adults. Can you interpret the resulting estimates?
mod2 <- quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b * log(weight),
    a ~ dnorm(135, 40),
    b ~ dlnorm(0, 1),
    sigma ~ dunif(0, 50)
  ), data = dh
)
precis(mod2)
## mean sd 5.5% 94.5%
## a -23.594556 1.3347738 -25.727782 -21.461330
## b 47.021187 0.3824533 46.409953 47.632421
## sigma 5.134845 0.1556812 4.886036 5.383654
Interpretation:
- \(\alpha\) - the expected height when log weight is zero (i.e. weight = 1 kg) is -23.59 cm, which is not meaningful on its own.
- \(\beta\) - height increases by 47.02 cm when log weight increases by one, i.e. when weight increases by a factor of \(e \approx 2.72\) (we are using the natural log here); a doubling check follows the list.
- \(\sigma\) - sd of heights is 5.13 cm.
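For a more tangible reading of \(\beta\), the predicted height change when weight doubles is \(\beta \log 2\):
# height change per doubling of weight
post2 <- extract.samples(mod2)
mean(post2$b) * log(2)  # about 47 * 0.69 = 32.6 cm per doubling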
(b) Begin with this plot: plot( height ~ weight , data=Howell1 ). Then use samples from the quadratic approximate posterior of the model in (a) to superimpose on the plot: (1) the predicted mean height as a function of weight, (2) the 97% interval for the mean, and (3) the 97% interval for predicted heights.
range(dh$weight)
## [1] 4.252425 62.992589
weight.seq <- seq(4, 63, by = 1)

sample_mu <- link(mod2, list(weight = weight.seq))
mu.mean <- apply(sample_mu, 2, mean)
mu.PI <- apply(sample_mu, 2, PI, 0.97)

sim_heights <- sim(mod2, list(weight = weight.seq))
simh <- apply(sim_heights, 2, PI, 0.97)

plot(height ~ weight, data = dh, col = col.alpha('black', 0.8), pch = 20)
lines(weight.seq, mu.mean, lwd = 2, col = 'blue')
shade(mu.PI, weight.seq, col = col.alpha('black', 0.4))
shade(simh, weight.seq, col = col.alpha('blue', 0.1))
A much better fit!
4H4 Plot the prior predictive distribution for the parabolic polynomial regression model in the chapter. You can modify the code that plots the linear regression prior predictive distribution. Can you modify the prior distributions of α, β1, and β2 so that the prior predictions stay within the biologically reasonable outcome space? That is to say: Do not try to fit the data by hand. But do try to keep the curves consistent with what you know about height and weight, before seeing these exact data.
Let us first write down the model equation:
\[ \begin{align} height &\sim Normal(\mu, \sigma) \\ \mu &= \alpha + \beta_1 weight_s + \beta_2 weight_s^2 \\ \alpha &\sim Normal(178, 20) \\ \beta_1 &\sim \operatorname{Log-Normal}(0, 1) \\ \beta_2 &\sim Normal(0, 1) \\ \sigma &\sim Uniform(0, 50) \end{align} \]
dh$weight_s <- ( dh$weight - mean(dh$weight) ) / sd(dh$weight)
dh$weight_s2 <- dh$weight_s^2

plot_prior_poly <- function(a, b1, b2, N = 100,
                            label = "b1 ~ dlnorm(0,1); b2 ~ dnorm(0, 1)"){
  plot( NULL , xlim=range(dh$weight_s) , ylim=c(-100,400) , xlab="weight" , ylab="height")
  abline( h=0 , lty=2 )
  abline( h=272 , lty=1 , lwd=0.5 )
  mtext( label )   # label is passed in so each plot shows its own priors
  for ( i in 1:N )
    curve( a[i] + b1[i]*x + b2[i]*(x^2) ,
           from = min(dh$weight_s) , to = max(dh$weight_s) ,
           add = TRUE , col=col.alpha("black", 0.2) )
}
What do we know about human heights?
- kids grow fast; as we grow older, we stop growing taller
- there are biological limits
- heights lie roughly in the range of 0 to 272 cm; the average, once kids are included, would be somewhere around 120 cm
# original priors
N <- 100
plot_prior_poly(
  a = rnorm(N, 178, 20),
  b1 = rlnorm(N, 0, 1),
  b2 = rnorm(N, 0, 1),
  N
)
These are mostly straight lines, with no curvature to account for the changing growth rate.
plot_prior_poly(
  a = rnorm(N, 150, 20),
  b1 = rlnorm(N, 3, 0.3),
  b2 = rnorm(N, -10, 2),
  N,
  label = "b1 ~ dlnorm(3,0.3); b2 ~ dnorm(-10, 2)"
)
This seems to be a better prior that takes into account the curvature present in the data.
4H5 Return to data(cherry_blossoms) and model the association between blossom date (doy) and March temperature (temp). Note that there are many missing values in both variables. You may consider a linear model, a polynomial, or a spline on temperature. How well does temperature trend predict the blossom trend?
data("cherry_blossoms")
ch <- cherry_blossoms[complete.cases(cherry_blossoms),]
plot(doy ~ temp, data = ch, pch = 20)
We can try the following models: linear, a polynomial of degree 3, and a spline on temperature. The linear model is:
\[ \begin{align} D_i &\sim Normal(\mu_i , \sigma) \\ \mu_i &= \alpha + \beta \ t_i \\ \alpha &\sim Normal(100, 10) \\ \beta &\sim Normal(0, 10) \\ \sigma &\sim Uniform(0, 50) \end{align} \]
mch_lin <- quap(
  alist(
    doy ~ dnorm(mu, sigma),
    mu <- a + b * temp,
    a ~ dnorm(100, 10),
    b ~ dnorm(0, 10),
    sigma ~ dunif(0, 50)
  ), data = ch
)
precis(mch_lin)
## mean sd 5.5% 94.5%
## a 122.370906 1.8601930 119.397958 125.343854
## b -2.862113 0.3031032 -3.346530 -2.377695
## sigma 5.910713 0.1490177 5.672554 6.148872
Let’s plot the posterior prediction for this:
t_seq <- seq(min(ch$temp), max(ch$temp), length.out = 100)

mu_lin <- link(mch_lin, data = list(temp = t_seq))
mu_lin_mean <- apply(mu_lin, 2, mean)
mu_lin_PI <- apply(mu_lin, 2, PI, 0.97)

dsim <- apply(sim(mch_lin, list(temp = t_seq)), 2, PI, 0.97)
plot(doy ~ temp, data = ch, pch = 20, col = col.alpha('black', 0.8))
lines(t_seq, mu_lin_mean, lwd = 2, col = 'red')
shade(mu_lin_PI, t_seq)
shade(dsim, t_seq, col = col.alpha('green', 0.2) )
Seems to be a good enough fit.
Let’s try a polynomial next.
ch2 <- ch
ch2$temp_s <- (ch2$temp - mean(ch2$temp)) / sd(ch2$temp)
ch2$temp_s2 <- ch2$temp_s^2
ch2$temp_s3 <- ch2$temp_s^3

mch_poly <- quap(
  alist(
    doy ~ dnorm(mu, sigma),
    mu <- a + b1 * temp_s + b2 * temp_s2 + b3 * temp_s3,
    a ~ dnorm(100, 10),
    b1 ~ dnorm(0, 10),
    b2 ~ dnorm(0, 10),
    b3 ~ dnorm(0, 10),
    sigma ~ dunif(0, 50)
  ), data = ch2
)
precis(mch_poly)
## mean sd 5.5% 94.5%
## a 104.95356995 0.2709141 104.5205969 105.3865430
## b1 -2.20731179 0.3665419 -2.7931165 -1.6215071
## b2 -0.06919811 0.2102626 -0.4052383 0.2668421
## b3 0.06005884 0.1111464 -0.1175745 0.2376922
## sigma 5.90883848 0.1489359 5.6708102 6.1468668
Let’s see the posterior distribution for this:
t_seq <- seq(-2, 3, length.out = 30)

pred_dat <- list( temp_s = t_seq, temp_s2 = t_seq^2, temp_s3 = t_seq^3)

mu_poly <- link(mch_poly, data = pred_dat)
mu_poly_mean <- apply(mu_poly, 2, mean)
mu_poly_PI <- apply(mu_poly, 2, PI, 0.97)

dsim <- apply(sim(mch_poly, pred_dat), 2, PI, 0.97)
plot(doy ~ temp_s, data = ch2, pch = 20, col = col.alpha('black', 0.8))
lines(t_seq, mu_poly_mean, lwd = 2, col = 'red')
shade(mu_poly_PI, t_seq)
shade(dsim, t_seq, col = col.alpha('green', 0.2) )
This is mostly a linear fit with just some curvature at the ends. We can also look at the precis output above to confirm that the posterior means for b2 and b3 are almost zero.
4H6 Simulate the prior predictive distribution for the cherry blossom spline in the chapter. Adjust the prior on the weights and observe what happens. What do you think the prior on the weights is doing?
data("cherry_blossoms")
d <- cherry_blossoms
d2 <- d[ complete.cases(d$doy) , ] # complete cases on doy

# create basis splines with 15 knots
num_knots <- 15
knot_list <- quantile( d2$year , probs=seq(0,1,length.out=num_knots) )

B <- bs(d2$year,
        knots=knot_list[-c(1,num_knots)] ,
        degree=3 , intercept=TRUE )

alist( D ~ dnorm( mu , sigma ) ,
       mu <- a + B %*% w ,
       a ~ dnorm(100,10),
       w ~ dnorm(0,10),
       sigma ~ dexp(1)
)
## [[1]]
## D ~ dnorm(mu, sigma)
##
## [[2]]
## mu <- a + B %*% w
##
## [[3]]
## a ~ dnorm(100, 10)
##
## [[4]]
## w ~ dnorm(0, 10)
##
## [[5]]
## sigma ~ dexp(1)
plot_prior_spline <- function(a, w, N){
  plot( NULL , xlim = range(d2$year) , ylim=c(0, 200) , xlab="year" , ylab="doy")
  abline( h = 0 , lty=2 ) # earliest day in year
  abline( h = 200 , lty=1 , lwd=0.5 ) # max day in year
  for ( i in 1:N ) {
    y <- a[i] + B %*% w[i,]
    lines(d2$year, y, col = col.alpha('black', 0.2))
  }
}
N <- 100

# Original Priors
plot_prior_spline(
  a = rnorm(N, 100, 10),
  w = matrix(rnorm(N*ncol(B), 0, 10), nrow = N, ncol = ncol(B)),
  N
)

# Other Priors
plot_prior_spline(
  a = rnorm(N, 100, 20),
  w = matrix(rnorm(N*ncol(B), 0, 10), nrow = N, ncol = ncol(B)),
  N
)
plot_prior_spline(
  a = rnorm(N, 100, 10),
  w = matrix(rnorm(N*ncol(B), 0, 5), nrow = N, ncol = ncol(B)),
  N
)
plot_prior_spline(
  a = rnorm(N, 100, 10),
  w = matrix(rnorm(N*ncol(B), 0, 20), nrow = N, ncol = ncol(B)),
  N
)
plot_prior_spline(
  a = rnorm(N, 100, 20),
  w = matrix(rnorm(N*ncol(B), 0, 20), nrow = N, ncol = ncol(B)),
  N
)
plot_prior_spline(
  a = rnorm(N, 100, 20),
  w = matrix(rnorm(N*ncol(B), 3, 20), nrow = N, ncol = ncol(B)),
  N
)
The prior on the weights seems to affect how much the model can localise (how squiggly the lines can get).
4H8 (there is no 4H7) The cherry blossom spline in the chapter used an intercept α, but technically it doesn’t require one. The first basis functions could substitute for the intercept. Try refitting the cherry blossom spline without the intercept. What else about the model do you need to change to make this work?
data("cherry_blossoms")
d <- cherry_blossoms
d2 <- d[ complete.cases(d$doy) , ] # complete cases on doy

# create basis splines with 15 knots
num_knots <- 15
knot_list <- quantile( d2$year , probs=seq(0,1,length.out=num_knots) )
B <- bs(d2$year,
        knots=knot_list[-c(1,num_knots)] ,
        degree=3 , intercept=TRUE )

m4.8 <- quap(
  alist( D ~ dnorm( mu , sigma ) ,
         mu <- B %*% w ,
         # a ~ dnorm(100,10),
         w ~ dnorm(0,10),
         sigma ~ dexp(1)
  ), data = list( D = d2$doy , B = B ) ,
  start = list( w = rep( 0 , ncol(B) ) ) )
# can check the posterior means for all params
precis(m4.8, depth = 2)
## mean sd 5.5% 94.5%
## w[1] 92.838731 3.2201626 87.692290 97.985173
## w[2] 102.760409 3.0854478 97.829267 107.691550
## w[3] 100.428973 2.7543625 96.026969 104.830976
## w[4] 108.337582 1.6439729 105.710196 110.964968
## w[5] 101.558858 1.6760828 98.880154 104.237562
## w[6] 106.968261 1.7385933 104.189653 109.746869
## w[7] 97.554281 1.5259420 95.115531 99.993031
## w[8] 110.637344 1.5320294 108.188866 113.085823
## w[9] 101.677640 1.6790261 98.994232 104.361048
## w[10] 105.758620 1.7276596 102.997486 108.519754
## w[11] 107.369283 1.6970097 104.657134 110.081433
## w[12] 102.668148 1.6532521 100.025932 105.310364
## w[13] 108.149485 1.6931479 105.443508 110.855462
## w[14] 103.769605 1.8668756 100.785978 106.753233
## w[15] 101.303865 2.3410846 97.562360 105.045371
## w[16] 95.977783 2.4387956 92.080117 99.875450
## w[17] 92.126400 2.2979856 88.453776 95.799025
## sigma 5.946086 0.1488284 5.708229 6.183942
Earlier, all the ‘w’ terms were around 0; now they are in the range of 100. The intercept was capturing the average first day of cherry blossom bloom, and that part is now absorbed by the weights. This also means the prior on w should be re-centered or widened (e.g. around 100 rather than 0), since the weights must now encode the average doy itself rather than deviations from it.
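As a sketch of that adjustment (my assumption, not the original code), the no-intercept spline could be refit with the weights prior re-centered:
# same no-intercept spline, with the prior on w centered near the mean doy,
# since the weights now encode the outcome scale directly (illustrative)
m4.8b <- quap(
  alist( D ~ dnorm( mu , sigma ) ,
         mu <- B %*% w ,
         w ~ dnorm(100, 10),
         sigma ~ dexp(1)
  ), data = list( D = d2$doy , B = B ) ,
  start = list( w = rep( 100 , ncol(B) ) ) )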
Let us next look at the posterior distribution.
# First, just drawing the splines times the weight
post <- extract.samples( m4.8 )
w <- apply( post$w , 2 , mean )
# Plot the basis functions * weights calculated using quap
plot( NULL , xlim=range(d2$year) , ylim=c(0,100) ,
xlab="year" , ylab="basis * weight" )
for ( i in 1:ncol(B) )
lines( d2$year , w[i]*B[,i] , col = col.alpha(i, 0.7), lwd = 4)
points(knot_list, y = rep(100, num_knots), pch = 3)
# Now let us look at the posterior distribution
# Posterior interval for mu (97%)
mu <- link( m4.8 )
mu_mean <- apply(mu, 2, mean)
mu_PI <- apply(mu,2,PI,0.97)

doy <- sim(m4.8, list( D = d2$doy , B = B ))
doy_h <- apply(doy, 2, PI, 0.97)
plot( d2$year , d2$doy , col=col.alpha(rangi2,0.3) , pch=16 )
lines(d2$year, mu_mean, col = 'darkred')
shade(doy_h, d2$year, col = col.alpha('red', 0.1))
shade( mu_PI , d2$year , col=col.alpha("black",0.5) )
The model seems to have worked similarly without the intercept.