Interactions between continuous variables
Splitting up continuous variables is generally a bad idea. In terms of statistical efficiency, the popular practice of dichotomising continuous variables at their median is comparable to throwing out a third of the dataset. Moreover, statistical models based on split-up continuous variables are prone to misinterpretation: threshold effects are easily read into the results when, in fact, none exist. Splitting up, or ‘binning’, continuous variables, then, is something to avoid. But what if you’re interested in how the effect of one continuous predictor varies according to the value of another continuous predictor? In other words, what if you’re interested in the interaction between two continuous predictors? Binning one of the predictors seems appealing since it makes the model easier to interpret. However, as I’ll show in this blog post, it’s fairly straightforward to fit and interpret interactions between continuous predictors.
To keep things easy, I will use a simulated dataset with two continuous predictors (pred1 and pred2) and one continuous outcome. The dataset is available here.
First, let’s draw some quick graphs to get a sense of how the predictors and the outcome are related.
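As a sketch, these quick graphs could be drawn in base R along the following lines. The CSV file name is a placeholder (the actual download link is above), and I'm assuming the columns are named pred1, pred2 and outcome, with the data read into a data frame called d:

```r
# Hypothetical file name; substitute the actual dataset.
d <- read.csv("simulated_data.csv")

par(mfrow = c(1, 3))
plot(pred2 ~ pred1, data = d)    # left: predictors against each other
plot(outcome ~ pred1, data = d)  # middle: pred1 vs. outcome
plot(outcome ~ pred2, data = d)  # right: pred2 vs. outcome
par(mfrow = c(1, 1))
```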
Left: The predictors are negatively correlated with each other. Middle: pred1 is negatively correlated with the outcome. Right: pred2 seems to be nonlinearly related to the outcome.
When we suspect there’s an interaction between two predictors, it’s useful to draw conditioning plots as well. For these plots, the dataset is split up into a number of overlapping equal-sized regions defined by a conditioning variable, and the relationship between the predictor of interest and the outcome within each region is plotted.
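Base R's coplot() function draws exactly this kind of conditioning plot. A minimal sketch, assuming the data frame d from above, with four overlapping regions per conditioning variable:

```r
# Effect of pred1 on the outcome in four overlapping regions of pred2.
coplot(outcome ~ pred1 | pred2, data = d,
       number = 4, overlap = 0.25, panel = panel.smooth)

# And vice versa: the effect of pred2 conditioned on pred1.
coplot(outcome ~ pred2 | pred1, data = d,
       number = 4, overlap = 0.25, panel = panel.smooth)
```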
Top: The effect of pred1 on the outcome in four regions defined by pred2. The negative relationship grows stronger as pred2 increases (from left to right). Bottom: The effect of pred2 on the outcome in four regions defined by pred1. The relationship may be positive for small values of pred1, but becomes negative for larger values (from left to right).
Fitting the model
I find it easiest to fit the interaction between two continuous variables as a wiggly regression surface. A wiggly regression surface is the generalisation of a wiggly curve, such as the one in Figure 3 in this earlier blog post, into two dimensions. The advantage of fitting a wiggly surface rather than a plane is that we don’t have to assume that the interaction is linear. Rather, the shape of the surface will be estimated from the data.
To fit such wiggly surfaces, we need the gam() function in the mgcv package. The wiggly regression surface is fitted by feeding both predictors to a single te() function within the gam() call.
A numerical summary of the model can be obtained using summary(), but as you can see, this doesn’t offer much in the way of interpretable results.
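In code, the model fit and its summary might look like this; the model name m1 is the one referred to later in the post:

```r
library(mgcv)

# One te() term containing both predictors fits the wiggly surface.
m1 <- gam(outcome ~ te(pred1, pred2), data = d)

# Numerical summary; the smooth-term table is hard to interpret directly.
summary(m1)
```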
To understand what the wiggly regression surface looks like, you need to plot it. This can be done using the vis.gam() function. The two plots below show the same regression surface: once as a three-dimensional plot and once as a contour plot. The latter can be read like a topographic map of hilly terrain: the contour lines connect points on the surface with the same height, and the closer the contour lines are to each other, the steeper the surface is.
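A sketch of the two plots, assuming the model m1 from above; the viewing angle theta is an arbitrary choice:

```r
# Three-dimensional perspective plot of the fitted surface.
vis.gam(m1, plot.type = "persp", theta = 30, color = "terrain")

# The same surface as a contour plot ('topographic map').
vis.gam(m1, plot.type = "contour", color = "terrain")
```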
What can be gleaned from these plots? First of all, the outcome variable is largest when both pred1 and pred2 are low. Furthermore, pred1 and pred2 both have strong negative effects when the respective other variable is large, but their effect is quite weak when the other variable is low.
The contour lines are useful for quantifying this. Going along the y-axis for a low pred1 value (say, 2.5), we only cross one contour line, and we seem to climb slightly. Going along the y-axis for a large pred1 value (say, 12.5), we cross eight contour lines and descend considerably, from around 100 on the outcome variable to below 30. Similarly, going along the x-axis for a low pred2 value (say, 0.2), we descend only slightly (perhaps 20 units). Going along the x-axis for a large pred2 value (say, 0.8), we cross ten contour lines and descend steeply.
The effect of pred1, then, depends on the value of pred2 and vice versa. Or in other words, pred1 and pred2 interact with each other.
I find it useful to draw a contour plot of the wiggly regression surface with the interaction alongside a contour plot of a regression surface without an interaction; this helps me to better understand what the interaction consists of. It is particularly helpful when one or both of the predictors are nonlinearly related to the outcome. To draw the second contour plot, first fit a model without an interaction. By feeding each predictor to a separate te() function, potentially nonlinear regression curves are estimated for both predictors, but no interaction between them.
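A sketch of this no-interaction model; the name m0 is a hypothetical choice:

```r
# Separate te() terms: a potentially nonlinear curve per predictor,
# but no interaction between them.
m0 <- gam(outcome ~ te(pred1) + te(pred2), data = d)
```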
Then plot both contour plots alongside each other. Here I also set the too.far parameter, so that regions of the graph for which no combinations of predictor values are available are not plotted.
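For instance, assuming the models m1 (with interaction) and m0 (without) from above; the too.far value of 0.1 is an illustrative choice:

```r
par(mfrow = c(1, 2))
# too.far blanks out regions far from any observed predictor combination.
vis.gam(m0, plot.type = "contour", too.far = 0.1, main = "no interaction")
vis.gam(m1, plot.type = "contour", too.far = 0.1, main = "with interaction")
par(mfrow = c(1, 1))
```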
Decomposing a regression surface into main effects and an interaction
It may be useful to ‘decompose’ the regression surface into the main effects associated with each predictor and the interaction itself.
This can be done by fitting the main effects and the interaction in separate ti() (rather than te()) functions. This fits nonlinear main effects, which is what we want: this way, the interaction term doesn’t absorb any nonlinearities not modelled by the linear main effects.
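A sketch of the decomposed model, with the hypothetical name m2:

```r
# Main effects as separate ti() smooths, plus a ti() interaction term
# that excludes those main effects.
m2 <- gam(outcome ~ ti(pred1) + ti(pred2) + ti(pred1, pred2), data = d)
summary(m2)
```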
The wiggly regression surface of m1 has here been decomposed into the main effects of pred1 and pred2 and an interaction between them. Both main effects and the interaction are significant.
The edf values in the table above, incidentally, express how nonlinear the effects are estimated to be. Values close to 1 indicate that the effect is linear, whereas a value of 2 suggests that the effect can be modelled using a linear and a quadratic term, and so on. Here, both main effects could sensibly be modelled linearly, but the interaction couldn’t.
(A linear interaction means that you multiply both predictors together and use this product as an additional predictor.)
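For comparison, such a strictly linear interaction could be fitted with lm(); m_lin is a hypothetical name:

```r
# pred1 * pred2 expands to pred1 + pred2 + pred1:pred2,
# where pred1:pred2 is the product of the two predictors.
m_lin <- lm(outcome ~ pred1 * pred2, data = d)
```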
There’s much more to fitting nonlinearities and interactions using gam() than discussed here, but hopefully this was useful enough to dissuade you from carving up continuous variables when you’re interested in their interaction.