Drawing scatterplot matrices
This is just a quick blog post to share a function with which you can draw scatterplot matrices.
Scatterplot matrices are useful for displaying the intercorrelations among
several continuous variables. Based on the help page for the pairs() function in
R, I put together a function for quickly drawing scatterplot matrices
that also show the Pearson correlation coefficients for the bivariate relationships
as well as the number of observations that go into them.
Here’s how you’d use it:
x contains the vectors (= columns in a dataset) containing the continuous variables you want to plot, in the order you want to plot them. To the extent possible, arrange the variables so that the earlier variables are more likely to be influenced by the later variables than vice versa. I’m hardly an expert on meteorology, so the order in which I put the variables may not be optimal, but Month is evidently more likely to influence Temperature than vice versa, so put Month after Temp. Similarly, if you collected your participants’ age, L1 vocabulary skills, and their results on a cognate translation test in L2, put age last, L1 vocabulary skills second and the translation test results first.
labels contains the readable names of the variables, in that same order. If you leave out this argument, the column names will serve as labels.
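As a small illustration of the two arguments, here is one way to prepare them from the airquality data that ships with R. The particular column selection and the label texts are assumptions made for this sketch, not necessarily the ones used for the plot in this post.

```r
# airquality is part of R's built-in datasets package.
# Columns are ordered so that likely influences come later:
# Month can plausibly affect Temp, but not the other way round.
vars <- c("Ozone", "Wind", "Temp", "Month")   # illustrative selection
x <- airquality[, vars]

# Readable names, in the same order as the columns of x.
labels <- c("Ozone (ppb)", "Wind (mph)", "Temperature (F)", "Month")

# One label per column; otherwise the labels would be misaligned.
stopifnot(length(labels) == ncol(x))
str(x)
```

Keeping the column order and the label order in a single place like this makes it harder for the two to drift apart.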
Main diagonal: Histograms for each variable as well as the number of available data points for that variable.
Upper triangle: Scatterplots for the bivariate relationships between the variables, with a nonlinear scatterplot smoother. In the example above, the scatterplot in the first row, third column, shows the relationship between Temperature (the third variable, on the x-axis) and Ozone (the first variable, on the y-axis). The scatterplot in the second row, fifth column, shows the relationship between Month (the fifth variable, on the x-axis) and Wind (the second variable, on the y-axis).
Lower triangle: Pearson correlation coefficients for the bivariate relationships between the variables and the number of observations on which they are based. In the example above, the correlation coefficient in the third row, first column, concerns the relationship between Temperature (the third variable) and Ozone (the first variable). The correlation coefficient in the fifth row, second column, concerns the relationship between Month (the fifth variable) and Wind (the second variable).
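The three-part layout described above can be approximated with base R's pairs() and a few panel functions. What follows is a minimal sketch in the spirit of the pairs() help page, not the function from this post: the names scatterplot_matrix_sketch, panel_hist and panel_cor, the column selection, and the choice to show each variable's n inside the diagonal label are all assumptions made for illustration.

```r
# Hypothetical helper, for illustration only.
scatterplot_matrix_sketch <- function(x, labels = colnames(x)) {
  # Diagonal: histogram, rescaled to leave room for the label above it.
  panel_hist <- function(x, ...) {
    usr <- par("usr")
    op <- par(usr = c(usr[1:2], 0, 1.8))
    on.exit(par(op))
    h <- hist(x, plot = FALSE)
    rect(head(h$breaks, -1), 0, h$breaks[-1],
         h$counts / max(h$counts), col = "grey80")
  }

  # Lower triangle: Pearson r and the number of complete pairs.
  panel_cor <- function(x, y, ...) {
    op <- par(usr = c(0, 1, 0, 1))
    on.exit(par(op))
    ok <- complete.cases(x, y)
    text(0.5, 0.6, sprintf("r = %.2f", cor(x[ok], y[ok])))
    text(0.5, 0.4, paste("n =", sum(ok)))
  }

  # Number of non-missing values per variable, appended to the labels.
  n_obs <- colSums(!is.na(x))
  diag_labels <- paste0(labels, "\n(n = ", n_obs, ")")

  # Upper triangle: scatterplot with a lowess smoother (graphics::panel.smooth).
  pairs(x, labels = diag_labels,
        upper.panel = panel.smooth,
        lower.panel = panel_cor,
        diag.panel = panel_hist)

  # Return the numbers shown in the plot so they can be inspected.
  invisible(list(n = n_obs, r = cor(x, use = "pairwise.complete.obs")))
}

res <- scatterplot_matrix_sketch(airquality[, c("Ozone", "Wind", "Temp", "Month")])
```

In pairs(), the panel in row i and column j plots variable j on the x-axis and variable i on the y-axis, which matches the reading of the upper triangle given above. Correlations are computed on pairwise complete observations, so the n in each lower-triangle cell can differ from cell to cell when the data contain missing values.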
Functions you can use instead of this one are
pairscor.fnc() in the languageR package
ggpairs() in the GGally package
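For a quick comparison, both alternatives can be called on the same data frame. This is a hedged sketch: it assumes the languageR and GGally packages may or may not be installed (hence the guards), uses default arguments only, and restricts the data to complete cases so that both functions see the same rows. The column selection is again illustrative.

```r
# Keep only rows without missing values in the selected columns.
vars <- c("Ozone", "Wind", "Temp", "Month")   # illustrative selection
aq <- na.omit(airquality[, vars])

# Each call is skipped if the corresponding package is not installed.
if (requireNamespace("languageR", quietly = TRUE)) {
  languageR::pairscor.fnc(aq)
}
if (requireNamespace("GGally", quietly = TRUE)) {
  # ggpairs() returns a plot object; print() draws it.
  print(GGally::ggpairs(aq))
}
```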