Splitting up continuous variables is generally a bad idea. In terms of statistical efficiency, the popular practice of dichotomising continuous variables at their median is comparable to throwing out a third of the dataset. Moreover, statistical models based on split-up continuous variables are prone to misinterpretation: threshold effects are easily read into the results when, in fact, none exist. Splitting up, or ‘binning’, continuous variables, then, is something to avoid. But what if you’re interested in how the effect of one continuous predictor varies according to the value of another continuous predictor? In other words, what if you’re interested in the interaction between two continuous predictors? Binning one of the predictors seems appealing since it makes the model easier to interpret. However, as I’ll show in this blog post, it’s fairly straightforward to fit and interpret interactions between continuous predictors.

## Data

To keep things easy, I will use a simulated dataset with two continuous predictors (`pred1` and `pred2`) and one continuous `outcome`. The dataset is available here.

## Exploratory plots

First, let’s draw some quick graphs to get a sense of how the predictors and the outcome are related. Left: The predictors are negatively correlated with each other. Middle: `pred1` is negatively correlated with the outcome. Right: `pred2` seems to be nonlinearly correlated with the outcome..

When we suspect there’s an interaction between two predictors, it’s useful to draw conditioning plots as well. For these plots, the dataset is split up into a number of overlapping equal-sized regions defined by a conditioning variable, and the relationship between the predictor of interest and the outcome within each region is plotted.  Top: The effect of `pred1` on the outcome in four regions defined by pred2. The negative relationship grows stronger as `pred2` increases (from left to right). Bottom: The effect of `pred2` on the outcome in four regions defined by pred1. The relationship may be positive for small values of `pred1`, but becomes negative for larger values (from left to right).

## Fitting the model

I find it easiest to fit the interaction between two continuous variables as a wiggly regression surface. A wiggly regression surface is the generalisation of a wiggly curve, such as the one in Figure 3 in this earlier blog post, into two dimensions. The advantage of fitting a wiggly surface rather than a plane is that we don’t have to assume that the interaction is linear. Rather, the shape of the surface will be estimated from the data. To fit such wiggly surfaces, we need the `gam()` function in the `mgcv` package. The wiggly regression surface is fitted by feeding the two predictors to a `te()` function within the `gam()` call.

A numerical summary of the model can be obtained using `summary()`, but as you can see, this doesn’t offer much in the way of interpretable results.

To understand what the wiggly regression surface looks like, you need to plot it. This can be done using the `vis.gam()` function. The two plots below show the same regression surface: once as a three-dimensional plot and once as a contour plot. The latter can be read like a topographic map of hilly terrain: the contour lines connect points on the surface with the same height, and the closer the contour lines are to each other, the steeper the surface is.  What can be gleaned from these plots? First of all, the outcome variable is largest for low `pred1` and `pred2` values. Second, `pred1` and `pred2` both have strong negative effects when the respective other variable is large, but their effect is quite weak when the other variable is low.

The contour lines are useful to quantify this. Going along the y-axis for a low `pred1` value (say, 2.5), we only cross one contour line, and we seem to climb slightly. Going along the y-axis for a large `pred1` value (say, 12.5), we cross 8 contour lines and we descend considerably, from around 100 on the outcome variable to below 30. Similarly, going along the x-axis for a low `pred2` value (say, 0.2), we descend slightly (perhaps 20 units). Going alongside the x-axis for a large `pred2` value (say, 0.8), we cross 10 contour lines. The effect of `pred1`, then, depends on the value of `pred2` and vice versa. Or in other words, `pred1` and `pred2` interact with each other.

## Improved visualisation

I find it useful to plot a contour plot with the wiggly regression surface with the interaction alongside a contour plot with a regression surface without an interaction. I find this helps me to better understand what the interaction consists of. This is particularly helpful when the one or two of the predictors are nonlinearly related to the outcome. To draw the second contour plot, first fit a model without an interaction. By feeding each predictor to a separate `te()` function, potentially nonlinear regression curves are estimated for both predictors.

Then plot both contour plots alongside each other. Here I also set the `too.far` parameter. This way, region of the graph for which no combinations of predictor variables are available are not plotted. ## Decomposing a regression surface into main effects and an interaction

It may be useful to ‘decompose’ the regression surface into the main effects associated with each predictor and the interaction itself. This can be done by fitting the main effects and the interaction in separate `ti()` (not `te()`) functions. This fits non-linear main effects, which is what we want: this way, the interaction term doesn’t absorb any nonlinearities not modelled by the linear main effects.

The wiggly regression surface of `m1` has here been decomposed into the main effects of `pred1` and `pred2` and an interaction between them. Both main effects and the interaction are significant.

The `edf` values in the table above, incidentally, express how nonlinear the effects are estimated to be. Values close to 1 indicate that the effect is linear, whereas a value of 2 suggests that the effect can be modelled using a linear and a quadratic term etc. Here both main effects could sensibly be modelled linearly, but the interaction couldn’t. (A linear interaction means that you multiply both predictors together and use this product as an additional predictor.)

There’s much more to fitting nonlinearities and interactions using `gam()` than discussed here, but hopefully this was useful enough to dissuade you from carving up continuous variables when you’re interested in their interaction.

26 June 2017