Common-language effect sizes
The goal of this blog post is to share with you a simple R function that may help you to better communicate the extent to which two groups differ and overlap by computing common-language effect sizes.
What is the ‘common-language effect size’?
In 1992, McGraw and Wong introduced the common-language effect size, which they defined as
the probability that a score sampled at random from one distribution will be greater than a score sampled from some other distribution.
For instance, if you have scores on an English reading comprehension task for both French- and German-speaking learners, you can compute the probability that a randomly chosen French-speaking learner will have a higher score than a randomly chosen German-speaking learner. This gives you an idea of how much the groups’ scores overlap, and the number can more easily be communicated to an audience that has no firm notion of what quantiles are or of what standardised effect sizes such as d = 0.3 mean.
Computing common-language effect sizes in R
Below I first generate some data:
40 data points in a group creatively called A vs. 30 data points in a group called B.
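The simulation below is a minimal sketch of such a dataset; the seed, means and standard deviations are my own assumptions, chosen so that the two groups overlap substantially:

```r
# Simulate two overlapping groups (hypothetical parameters)
set.seed(2016)
dat <- data.frame(
  Group = rep(c("A", "B"), times = c(40, 30)),
  Score = c(rnorm(40, mean = 50, sd = 10),   # 40 observations in group A
            rnorm(30, mean = 54, sd = 10))   # 30 observations in group B
)
str(dat)
```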
A couple of boxplots to show the spread and central tendencies:
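With base R graphics, this can be done as follows (assuming a data frame dat with columns Group and Score; it's recreated here so the snippet runs on its own):

```r
# Recreate the hypothetical data so this snippet is self-contained
set.seed(2016)
dat <- data.frame(
  Group = rep(c("A", "B"), times = c(40, 30)),
  Score = c(rnorm(40, 50, 10), rnorm(30, 54, 10))
)

# One boxplot per group
boxplot(Score ~ Group, data = dat,
        xlab = "Group", ylab = "Score")
```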
And the key summary statistics:
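One way to get the per-group means, standard deviations and sample sizes in one call (again recreating the hypothetical data so the snippet stands alone):

```r
# Recreate the hypothetical data so this snippet is self-contained
set.seed(2016)
dat <- data.frame(
  Group = rep(c("A", "B"), times = c(40, 30)),
  Score = c(rnorm(40, 50, 10), rnorm(30, 54, 10))
)

# Mean, standard deviation and sample size per group
aggregate(Score ~ Group, data = dat,
          FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
```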
On the basis of the group means and standard deviations, McGraw and Wong’s common-language effect size can be computed as follows:
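Their formula is the normal-distribution function applied to the standardised mean difference, Φ((x̄₁ − x̄₂)/√(s₁² + s₂²)). A sketch in R, using the hypothetical data from above (the exact value you get depends on the simulated data):

```r
# Recreate the hypothetical data so this snippet is self-contained
set.seed(2016)
dat <- data.frame(
  Group = rep(c("A", "B"), times = c(40, 30)),
  Score = c(rnorm(40, 50, 10), rnorm(30, 54, 10))
)

mean_A <- mean(dat$Score[dat$Group == "A"])
mean_B <- mean(dat$Score[dat$Group == "B"])
sd_A   <- sd(dat$Score[dat$Group == "A"])
sd_B   <- sd(dat$Score[dat$Group == "B"])

# Probability that a random A observation exceeds a random B observation,
# assuming both groups are normally distributed (McGraw & Wong, 1992)
pnorm((mean_A - mean_B) / sqrt(sd_A^2 + sd_B^2))
```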
That is, there’s a 38% chance that if you randomly draw one observation from Group A and one from Group B, the observation from Group A will be the greater of the two.
Strictly speaking, McGraw and Wong’s method assumes normally distributed, continuous data. While they point out that their measure is quite robust with respect to this assumption, you can use a brute-force method that doesn’t make this assumption to see if that yields different results.
Edit: On Twitter, Guillaume Rousselet suggested a quicker and more exhaustive brute-force method for computing common-language effect sizes. I’ve updated the code and post to implement his suggestion.
I provide a function, cles.fnc(), that pairs each observation from the first group with each observation from the second group and then checks how often the observation from the first group is larger than the one from the second group. Ties are also taken into account.
Here’s how the cles.fnc() function works:
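Since the function body isn’t reproduced above, here’s a sketch of what such a function could look like, using outer() to form all pairwise comparisons in one go; the argument names (variable, group, baseline, data, print) are assumptions on my part:

```r
# Sketch of a brute-force common-language effect size function.
# The interface (variable, group, baseline, data, print) is assumed.
cles.fnc <- function(variable, group, baseline, data, print = TRUE) {
  x <- data[data[[group]] == baseline, variable]  # scores in the baseline group
  y <- data[data[[group]] != baseline, variable]  # scores in the other group
  diffs <- outer(x, y, "-")                       # every pairwise difference
  # Proportion of pairs in which x is larger; ties count for half
  cles <- mean(diffs > 0) + 0.5 * mean(diffs == 0)
  if (print) {
    cat("P(", baseline, "> other) =", round(cles, 2), "\n")
  }
  invisible(list(cles = cles, n1 = length(x), n2 = length(y)))
}

# Usage on the hypothetical data from above (recreated here)
set.seed(2016)
dat <- data.frame(
  Group = rep(c("A", "B"), times = c(40, 30)),
  Score = c(rnorm(40, 50, 10), rnorm(30, 54, 10))
)
cles <- cles.fnc(variable = "Score", group = "Group", baseline = "A", data = dat)
```

The function returns its result invisibly, so you can assign it to an object while only the one-line summary is printed.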
The results of the two methods aren’t identical (38% vs. 40%), but they’re in the same ballpark; more often than not, that’s the case.
You can turn off the printed output by setting the relevant parameter to FALSE. You can also extract information from the cles object if you want to pass it on to other functions:
An example with non-overlapping distributions
The code below generates a dataset with two non-overlapping groups.
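One way to construct such a dataset (an assumption on my part; the post’s original simulation may differ, so the exact percentages will too) is to give one group bounded scores and the other heavily skewed scores, so that the normal-theory formula overstates the overlap:

```r
# Hypothetical non-overlapping groups
set.seed(2016)
dat2 <- data.frame(
  Group = rep(c("A", "B"), times = c(40, 30)),
  Score = c(runif(40, 0, 1),             # group A: bounded in [0, 1]
            1.5 + rexp(30, rate = 0.5))  # group B: skewed, always above 1.5
)

# The samples don't overlap at all:
max(dat2$Score[dat2$Group == "A"]) < min(dat2$Score[dat2$Group == "B"])  # TRUE

# Normal-theory (McGraw & Wong) estimate vs. the sample-level proportion:
with(dat2, {
  x <- Score[Group == "A"]
  y <- Score[Group == "B"]
  c(normal_theory = pnorm((mean(x) - mean(y)) / sqrt(sd(x)^2 + sd(y)^2)),
    brute_force   = mean(outer(x, y, ">")))
})
```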
McGraw &amp; Wong’s (1992) method suggests that there’s a 6% chance that a random observation in A will be higher than one in B. This may well be true at the population level, but it’s clearly not true at the sample level. The brute-force method pegs this probability at 0%, which may be wrong at the population level but is clearly correct at the sample level.
For use with more complex datasets
Let’s say you have data from a longitudinal study in which you collected data for Groups A and B at Times 1, 2 and 3, and you want to compare the groups at each time. Using the by() function, you can run cles.fnc() separately for each time; for more complex datasets, you can include more variables in the INDICES argument.
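A sketch of this approach on hypothetical longitudinal data; since cles.fnc() isn’t restated here, the pairwise comparison is inlined in the function passed to by():

```r
# Hypothetical longitudinal data (structure assumed for illustration)
set.seed(2016)
long <- data.frame(
  Group = rep(rep(c("A", "B"), times = c(40, 30)), 3),
  Time  = rep(1:3, each = 70),
  Score = rnorm(210, mean = 50, sd = 10)
)

# Compare the groups separately at each time point;
# ties count for half, as in the brute-force method above
cles_by_time <- by(long, long$Time, function(d) {
  x <- d$Score[d$Group == "A"]
  y <- d$Score[d$Group == "B"]
  mean(outer(x, y, ">")) + 0.5 * mean(outer(x, y, "=="))
})
cles_by_time
```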