Tutorial: Drawing a dot plot
In the fourth tutorial on drawing useful plots with ggplot2, we're taking a closer look at dot plots: a useful and more flexible alternative to bar and pie charts.
What’s a dot plot?
The three panels below show the same data in a pie chart, a bar chart and a dot plot. For data like these, the bar chart and the dot plot allow us to compare the sales of different kinds of pie about equally well. The dot plot has a higher data-ink ratio, but I don't think that's too decisive a factor.
Dot plots really come into their own when you want to display data with more than two dimensions. In the plots above, the data had two dimensions: the kind of pie and the proportion of sales. The dot plot below adds a third dimension: the year (2015 vs. 2016). You couldn't display this additional dimension in a single pie chart, and in a bar chart you'd need side-by-side bars, which usually look cluttered.
Tutorial: Drawing a dotchart in ggplot2
What you’ll need
- The free program R, the graphical user interface RStudio, and the add-on package ggplot2.
- A dataset. The data we’ll use were collected in a project on language transfer (download).
About 200 native speakers of Dutch from The Netherlands and Belgium (Country) were asked to pick a German gender-marked definite article (der, die or das) for 44 German nouns (Stimulus). These nouns all had cognates in Dutch (DutchCognate), which had either common or neuter gender (DutchGender). The expectation is that Dutch speakers from either country will assign the neuter German article (das) more often to German words with neuter Dutch cognates than to words with common-gender Dutch cognates. The dataset also lists the German words' actual gender.
I don't like ggplot2's default grey background, so let's change the default theme to black and white.
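In code, this takes a single call: theme_set() changes the theme for every plot drawn afterwards in the session, and theme_bw() is ggplot2's built-in black-and-white theme.

```r
library(ggplot2)

# Use the black-and-white theme (white panel background) for all plots
# drawn from here on in this R session
theme_set(theme_bw())
```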
A first attempt
Let's plot the proportion of neuter article (das) choices by both the Belgian and the Dutch participants for each German noun. Dot plots show the numeric information along the x-axis and the categorical information (labels) along the y-axis, so we map the proportions to x and the nouns to y. We then specify that the data be drawn as points or dots (a geom_point layer), and lastly we customise the axis labels.
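A minimal sketch of such a first attempt, assuming the responses have been aggregated into one row per noun and country, and that the proportion of das choices sits in a column called ProportionNeuter (a hypothetical name; Stimulus and Country are the columns described above). Mapping Country to the point shape keeps the Belgian and Dutch responses apart; the axis titles are only illustrative.

```r
library(ggplot2)

# d: one row per noun (Stimulus) and country (Country).
# ProportionNeuter is an ASSUMED column name for the proportion of das choices.
plot_first_attempt <- function(d) {
  ggplot(d, aes(x = ProportionNeuter,   # numeric information on the x-axis
                y = Stimulus,           # labels (nouns) on the y-axis
                shape = Country)) +     # a different symbol per country
    geom_point() +                      # draw each observation as a dot
    xlab("Proportion of neuter (das) choices") +
    ylab("German noun")
}
```

Calling plot_first_attempt(d) returns a ggplot object, which is drawn when printed.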
The main comparison is between German words that have neuter Dutch cognates and those that have common-gender Dutch cognates.
To highlight this comparison, we can plot the data for both word categories in different panels.
By adding a facet_grid layer, we can specify that the words with common and with neuter Dutch gender are to be plotted in different rows of a grid (x ~ .). (Writing . ~ x would've plotted them in different columns, but having them in different rows of the same column makes for an easier comparison.) Setting the scales and space arguments to "free_y" ensures that items for which data are available in only one panel aren't shown in the other panels as well (scales) and that the size of each panel is proportionate to the number of items in it (space). If you set these arguments to "fixed", you'll see what I mean.
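A sketch of the facetted plot, again assuming a hypothetical ProportionNeuter column alongside the Stimulus, Country and DutchGender columns described above. DutchGender goes on the left of the tilde so that each Dutch gender gets its own row.

```r
library(ggplot2)

plot_by_dutch_gender <- function(d) {
  ggplot(d, aes(x = ProportionNeuter, y = Stimulus, shape = Country)) +
    geom_point() +
    # One row of panels per Dutch cognate gender. scales = "free_y" drops
    # nouns without data in a panel; space = "free_y" makes each panel's
    # height proportional to the number of nouns it contains.
    facet_grid(DutchGender ~ ., scales = "free_y", space = "free_y") +
    xlab("Proportion of neuter (das) choices") +
    ylab("German noun")
}
```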
This plot strongly suggests that the gender of the German words’ Dutch cognates has a major effect on how often Dutch speakers pick das as their article: with the exception of one word, Boot, the ranges in the two panels don’t even overlap.
However, the German words are ordered alphabetically. While we’re at it, we might as well sort them more meaningfully – for instance, according to the average proportion of das responses per word. Additionally, I don’t find the default filled circle and triangle symbols that represent the Belgian and Dutch responses very distinctive, so we’ll change these, too.
Sorting the items by their average value
In my previous post, I introduced a custom function for sorting the levels of a factor according to the average value of another variable per level. Here we use this function to sort the levels of Stimulus according to their average proportion of das responses. We also use another custom function to put the words with neuter cognates in the top panel instead of the bottom one.
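The custom functions from that post aren't reproduced here. As a swapped-in alternative, base R's reorder() and relevel() achieve the same two effects. The sketch below assumes the hypothetical ProportionNeuter column and that DutchGender is coded with the values "common" and "neuter" (also an assumption).

```r
# Returns a copy of d with re-ordered factor levels (base R only)
sort_for_dotplot <- function(d) {
  # Order the nouns by their mean proportion of das responses across countries.
  # In a dot plot the first level is drawn at the bottom of the y-axis,
  # so the nouns with the most das responses end up on top.
  d$Stimulus <- reorder(factor(d$Stimulus), d$ProportionNeuter, FUN = mean)
  # Make "neuter" the first level so facet_grid() draws its panel on top
  d$DutchGender <- relevel(factor(d$DutchGender), ref = "neuter")
  d
}
```

Because the nouns are sorted once, globally, the order within each free_y panel follows the same ranking.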
To change the default symbols, we use scale_shape_manual. For black and white plots, I prefer empty circles and crosses, which are known internally as symbols 1 and 3, respectively.
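A sketch of the plot with these symbols, under the same assumptions as before (hypothetical ProportionNeuter column). scale_shape_manual() assigns the values to the levels of Country in order, so which country gets the circles depends on how those levels are ordered.

```r
library(ggplot2)

plot_with_open_symbols <- function(d) {
  ggplot(d, aes(x = ProportionNeuter, y = Stimulus, shape = Country)) +
    geom_point() +
    facet_grid(DutchGender ~ ., scales = "free_y", space = "free_y") +
    # Shape 1 = empty circle, shape 3 = cross (plus sign),
    # matched to the levels of Country in order
    scale_shape_manual(values = c(1, 3)) +
    xlab("Proportion of neuter (das) choices") +
    ylab("German noun")
}
```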
The difference between responses to words with neuter cognates and to those with common-gender cognates is now particularly clear. Nevertheless, there is a substantial degree of variation between the items, particularly among words with neuter cognates. Aficionados of the German language may've noticed, however, that the top words within each panel all have neuter gender in German, i.e., the article das is the correct choice for these words. The bottom words, by contrast, all have masculine or feminine gender in German. As this factor (whether the word actually is neuter in German or not) can straightforwardly account for some of the variation within each panel, since some participants will have learnt the correct gender, it makes sense to include this information in the plot, too.
Adding another facetting variable
First we create a new variable that specifies whether the German word actually has neuter gender or not.
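A sketch of this step, assuming the German gender is stored in a column called GermanGender with the value "neuter" for neuter nouns. Both the column name and its coding are assumptions, and the name of the new variable, GermanNeuter, is hypothetical as well.

```r
# Add a two-level factor: is the German noun actually neuter or not?
# GermanGender and GermanNeuter are ASSUMED column names.
add_german_neuter <- function(d) {
  d$GermanNeuter <- factor(
    ifelse(d$GermanGender == "neuter", "neuter", "non-neuter"),
    levels = c("neuter", "non-neuter")  # neuter first, so its panels come first
  )
  d
}
```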
Then we add this new variable to the facet_grid layer.
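One way to do this, sketched under the same assumptions (hypothetical ProportionNeuter and GermanNeuter columns), is to put both variables on the left of the tilde, which gives one row of panels per combination of Dutch cognate gender and German gender.

```r
library(ggplot2)

plot_two_facets <- function(d) {
  ggplot(d, aes(x = ProportionNeuter, y = Stimulus, shape = Country)) +
    geom_point() +
    # Rows for each combination of Dutch cognate gender and German gender;
    # free_y again drops nouns without data and sizes panels by their nouns
    facet_grid(DutchGender + GermanNeuter ~ ., scales = "free_y", space = "free_y") +
    scale_shape_manual(values = c(1, 3)) +
    xlab("Proportion of neuter (das) choices") +
    ylab("German noun")
}
```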
Adding this facetting variable may help make it immediately clear to the casual reader that the study featured a mixture of congruent (neuter–neuter and non-neuter–common) and incongruent (neuter–common and non-neuter–neuter) cognates.
Additionally, it shows that while the Dutch participants consistently and correctly choose more neuter responses than the Belgians for neuter–neuter cognates, they don’t pick the correct neuter article more often for neuter–common cognates, nor do they choose the neuter article less often than the Belgians for non-neuter words. To me, this suggests that the actual knowledge of German gender didn’t greatly differ between the Belgian and the Dutch participants.
Lastly, the word standing out in all of this is Boot, for which most participants correctly picked neuter das even though its highly transparent cognate in Dutch, boot, is common-gender.
Finishing touches: facet labels
Finally, as a courtesy to the reader, we’ll give the facet labels more transparent titles.
For this, we first need to map the current default labels to more descriptive ones. Then, we add these labels to the facet_grid layer.
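A sketch of this step with ggplot2's labeller() function, which accepts named character vectors as lookup tables from the values in the data to the texts shown on the facet strips. The column names and their values ("common"/"neuter", "neuter"/"non-neuter") are the same assumptions as before, and the label texts are only suggestions.

```r
library(ggplot2)

# Lookup tables: names are the values in the data, entries are the strip labels
dutch_labels <- c(common = "Dutch cognate: common gender",
                  neuter = "Dutch cognate: neuter gender")
german_labels <- c("neuter" = "German: neuter",
                   "non-neuter" = "German: not neuter")

plot_labelled <- function(d) {
  ggplot(d, aes(x = ProportionNeuter, y = Stimulus, shape = Country)) +
    geom_point() +
    facet_grid(DutchGender + GermanNeuter ~ ., scales = "free_y", space = "free_y",
               labeller = labeller(DutchGender = dutch_labels,
                                   GermanNeuter = german_labels)) +
    scale_shape_manual(values = c(1, 3)) +
    xlab("Proportion of neuter (das) choices") +
    ylab("German noun")
}
```

Values missing from a lookup table come out as NA on the strip, so the names must match the data exactly.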