Tutorial: Adding confidence bands to effect displays
In the previous blog post, I demonstrated how you can draw effect displays to render regression models more intelligible to yourself and to your audience. These effect displays did not contain information about the uncertainty inherent to estimating regression models, however. To that end, this blog post demonstrates how you can add confidence bands to effect displays for multiple regression, logistic regression, and logistic mixed-effects models, and explains how these confidence bands are constructed.
The Data, Model, and Effect display subsections are identical to those in the previous post, so I left out the accompanying text. In the following subsections, I first show how you can easily add confidence bands to such effect displays and then explain how this method works under the hood.
Add confidence bands using
predict()’s built-in method
For regression models fitted using
confidence bands can easily be constructed using the
we already used to calculate the outcome values
for the predictor variables in the previous subsection.
In addition to specifying the model
and a data frame containing the combinations of predictor
variables for which you want to generate predictions,
you need to explicitly set the parameter
If you don’t want to construct 95% confidence bands (the default),
you can specify the desired confidence level using the
Below I set the desired confidence level to 90%, just to demonstrate
the parameter’s use.
When used in this fashion,
predict() generates a matrix with 3 columns.
The first column contains the predictions themselves,
i.e., what you’d get if you didn’t specify
whereas the second and third column contain the
lwr) and upper (
of the confidence intervals about the predictions.
The code below adds these two latter columns
We can now draw ribbons showing
the 90% confidence band around
the lines showing the predictions.
ggplot, you can use either
geom_smooth() to this end, but I find
predict() construct confidence bands?
When you want to construct confidence bands for
more complex models, such as the logistic mixed-effects model
we’ll discuss below, the
predict() function is of little help
and you’ll need to construct them ‘by hand’.
For this, it’s useful to know how
under the hood so that we’ll be able to apply
the same principles to more complex models.
The main idea behind this approach is that you can compute the standard errors of the predicted values on the basis of (a) the combinations of predictor variables for which the predictions were computed and (b) the variance–covariance matrix of the estimated parameters of the model you fitted.
As for (a), we need to construct the
model matrix for the data frame with the
combinations of predictor variables (
The model matrix is just an array
of numbers which, when multiplied with the
estimated model coefficients, yields the model’s
This matrix is very similar to the data frame
with the predictor combinations itself, except
that it contains an additional column containing
1’s (representing the intercept in the model)
and one for which the
values have been multiplied (representing the interaction
between these two predictors in the model).
c.Raven values in
nd1_eng are set to 0,
this column, too, only contains 0’s.
If you multiply this matrix by the estimated model coefficients, you get the predicted outcomes for these predictor combinations:
It so happens
that the variances of the products
of a matrix (such as
and a random vector (such as
can be computed by multiplying the matrix by the variance–covariance
matrix of the random vector and then by the transpose of the matrix.
This is quite a mouthful, but what it means is that if we get
the variance–covariance matrix of the model coefficients,
which summarises the uncertainty about the model coefficients
and their relationships,
we can compute the variances of the predicted outcomes.
Luckily, you can easily extract the variance–covariance matrix
and calculate this three-way product:
In the present case, this yields a 50-by-50 matrix that has the variances of the 50 predicted outcomes on the main diagonal. The square roots of these variances are the standard errors of the predicted outcomes. Let’s extract these:
These standard errors, in turn, can be used to compute confidence intervals about the predicted outcomes by multiplying them by the appropriate t-value. For a symmetric 90% confidence interval, we need the t-value corresponding to the 95th percentile of the t-distribution with the model’s residual degrees of freedom (90% of the distribution is contained between the 5th and the 95th percentile; the t-value corresponding to the 5th percentile has the same absolute value as the one corresponding to the 95th but is negative, so we take the t-value corresponding to the 95th percentile):
Multiply the standard errors by this t-value (1.65) and add or subtract the product from the predicted outcomes to get the confidence intervals about the predictions:
As you can see, these manually computed boundaries
correspond to the boundaries computed using
By stringing together these confidence intervals, you get a confidence band. Or if you want to be more precise, a pointwise confidence band. What this is means is that the coverage probability of the confidence band is (in this case) 90% for each point on the line—which makes sense, because that’s how the confidence band was constructed: by stringing together 90% confidence intervals.
(There also exists another type of confidence band: simultaneous confidence bands. I’m not going to discuss these here.)
The confidence bands for the Raven-by-sex interaction is left as an exercise to the reader :)
Some miscellaneous points about confidence bands
- Confidence bands are narrowest for the mean predictor value.
- If you fix non-focal predictors at a typical value for the purposes of drawing an effect display, fixing them at their sample mean yields narrower confidence bands than fixing them at any other value.
We can similarly construct confidence bands for logistic regression models. See the previous post for details about the data and the model, which I’ll skip here.
Data, model and effect display
Note that the predicted values are expressed
in log-odds, not in probabilities (
type = "link").
Constructing confidence bands using
For generalised linear models, such as logistic regression
predict() doesn’t return confidence intervals
for predicted values, but it can return standard errors.
We can use these to construct the confidence bands ourselves.
These standard errors are expressed in log-odds. To construct confidence intervals, we need to find the appropriate multiplier. For ordinary least-squares regression (cf. above), there is an exact solution to this problem, namely by referring to a t-distribution with the appropriate degrees of freedom. For logistic regression, we have to make do with an approximation based on the normal distribution.
Now multiply the standard errors by this multiplier and add/subtract them from the predicted outcomes:
This yields confidence intervals in log-odds. Probabilities are easier to interpret, though, so we run these through the logistic function to transform the log-odds to probabilities:
Stringing the confidence intervals together, we get the 90% pointwise confidence band:
Mixed-effects logistic regression
Once you get the hang of it, constructing confidence bands for the fixed effects in a mixed-effects model isn’t really that much more difficult than for the other models.
For details about the data, the model specification and the basic effect display, please refer to the previous blog post.
Data, model and effect display
Note that this time, the predictions
are generated in log-odds, not in probabilities
type = "link").
Construct confidence band manually
predict() doesn’t generate confidence intervals
or standard errors for mixed-effects models,
so we’ll have to compute them manually.
The method is the same as for the ordinary least-squares
model above; see http://glmm.wikidot.com/faq#predconf
for a summary.
First, construct the model matrix for the combinations of predictor variables we generated the outcomes for.
Then extract the variance–covariance matrix of the model and use it to compute the standard errors for the predictions (in log-odds):
Now find the multiplier appropriate for 90% confidence intervals and carry out the same calculation as the previous two times:
Run the predicted values and their confidence bands through the logistic function to transform them to probabilities:
Note that this confidence band is based solely on the fixed-effects. As such, it should be taken to reflect the uncertainty about the average participant’s regression curve; the regression curves of individual participants can differ wildly from this average.
This post hardly makes for captivating reading, but I hope it helps some of you who want to visualise the uncertainty in their regression models.