A blog about statistics and research design geared towards researchers in bilingualism, multilingualism, and applied linguistics.

Subscribe to new blog posts.

Subscribe to new academic publications.

Latest blog posts


R tip: Ordering factor levels more easily

18 August 2016

By default, R sorts the levels of a factor alphabetically. When drawing graphs, this results in ‘Alabama First’ graphs, and it’s usually better to sort the elements of a graph by more meaningful principles than alphabetical order. This post illustrates three convenience functions you can use to sort factor levels in R according to another covariate, their frequency of occurrence, or manually.
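As a taster, here is a minimal base-R sketch of the three orderings (the post's own convenience functions may differ); `x` and `y` are toy data:

```r
x <- factor(c("pear", "apple", "apple", "cherry", "pear", "apple"))
levels(x)  # alphabetical by default: "apple" "cherry" "pear"

# 1. According to another covariate: order levels by their mean y value
y <- c(3, 1, 2, 5, 4, 1)
x_by_y <- reorder(x, y, FUN = mean)
levels(x_by_y)  # "apple" "pear" "cherry"

# 2. By frequency of occurrence (most frequent level first)
x_by_freq <- factor(x, levels = names(sort(table(x), decreasing = TRUE)))
levels(x_by_freq)  # "apple" first (it occurs three times)

# 3. Manually
x_manual <- factor(x, levels = c("pear", "cherry", "apple"))
```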

Read more...


Classifying second-language learners as native- or non-nativelike: Don't neglect classification error rates

5 July 2016

I’d promised to write another installment on drawing graphs, but instead I’m going to write about something that I had to exclude, for reasons of space, from a recently published book chapter on age effects in second language (L2) acquisition: classifying observations (e.g., L2 learners) and estimating error rates.

I’m going to illustrate the usefulness of classification algorithms for addressing some problems in L2 acquisition research, but my broader aim is to show that there’s more to statistics than running significance tests and to encourage you to explore—even if superficially—what else is out there.

Read more...


Tutorial: Drawing a boxplot

21 June 2016

In the two previous blog posts, you learnt to draw simple but informative scatterplots and line charts. This time, you’ll learn how to draw boxplots.

Read more...


Tutorial: Drawing a line chart

13 June 2016

Graphs are incredibly useful both for understanding your own data and for communicating your insights to your audience. This is why the next few blog posts will consist of tutorials on how to draw four kinds of graphs that I find most useful: scatterplots, line charts, boxplots and some variations, and Cleveland dotplots. These tutorials are aimed primarily at the students in our MA programme. Today’s graph: the line chart.

Read more...


Tutorial: Drawing a scatterplot

2 June 2016

Graphs are incredibly useful both for understanding your own data and for communicating your insights to your audience. This is why the next few blog posts will consist of tutorials on how to draw four kinds of graphs that I find most useful: scatterplots, line charts, boxplots and some variations, and Cleveland dotplots. These tutorials are aimed primarily at the students in our MA programme. Today’s graph: the scatterplot.

Read more...


Surviving the ANOVA onslaught

18 May 2016

At a workshop last week, we were asked to bring along examples of good and bad academic writing. Several of the bad examples were papers in which the number of significance tests was so large that the workshop participants felt they couldn’t make sense of the Results section. It’s not that they didn’t understand each test separately, but rather that they couldn’t see the forest for the trees. I, too, wish researchers would stop inundating their readers with t, F and p-values (especially in the running text), but until that changes, readers need to learn how to survive the ANOVA onslaught. Below, I present a list of guidelines to help them do so.

Read more...


Why reported R² values are often too high

22 April 2016

After reading a couple of papers whose conclusions were heavily based on R² (“variance explained”) values, I thought I’d summarise why I’m often skeptical of such conclusions. The reason, in a nutshell, is that reported R² values tend to overestimate how much of the variance in the outcome variable the model can actually “explain”. To dyed-in-the-wool quantitative researchers, none of this blog post will be new, but I hope that it will make some readers think twice before focusing heavily on R² values.
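A quick illustration of the problem (a toy simulation of my own, not an analysis from the post): fit a model with a handful of pure-noise predictors and the in-sample R² comes out well above zero, even though the predictors explain nothing by construction.

```r
set.seed(1)
n <- 30; k <- 10
y <- rnorm(n)                        # outcome with no true relationship to the predictors
X <- matrix(rnorm(n * k), ncol = k)  # ten pure-noise predictors
m <- lm(y ~ X)
summary(m)$r.squared                 # substantially above zero purely through overfitting
```

With no true effects, the expected in-sample R² is roughly k/(n − 1), i.e. about 0.34 here.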

Read more...


On correcting for multiple comparisons: Five scenarios

1 April 2016

Daniël Lakens recently blogged about a topic that crops up every now and then: Do you need to correct your p-values when you’ve run several significance tests? The blog post is worth a read, and I feel this quote sums it up well:

We … need to make a statement about which tests relate to a single theoretical inference, which depends on the theoretical question. I believe many of the problems researchers have in deciding how to correct for multiple comparisons is actually a problem in deciding what their theoretical question is.

I think so, too, which is why in this blog post, I present five scenarios and discuss how I feel about correcting for multiple comparisons in each of them.
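For tests that do belong to a single theoretical inference, the mechanics of correcting are simple in R; the p-values below are hypothetical:

```r
p <- c(0.01, 0.02, 0.03, 0.04)      # hypothetical p-values from four related tests
p.adjust(p, method = "holm")        # 0.04 0.06 0.06 0.06
p.adjust(p, method = "bonferroni")  # 0.04 0.08 0.12 0.16
```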

Read more...


Silly significance tests: The main effects no one is interested in

23 February 2016

We are often more interested in interaction effects than in main effects. In a given study, we may not be interested so much in whether high-proficiency second-language (L2) learners react more quickly to target-language stimuli than low-proficiency L2 learners, nor in whether L2 learners react more quickly to L1–L2 cognates than to non-cognates. Rather, what we may be interested in is whether the latency difference between cognates and non-cognates differs between high- and low-proficiency learners. When running an ANOVA with this two-way interaction, we should include the main effects, too, and our software package will dutifully report the F-tests for these main effects (i.e., for proficiency and cognacy).

But if it is the interaction that is of specific interest, I do not think that we have to actually care about the significance of the main effects – or that we have to clutter the text by reporting them. Similarly, if a three-way interaction is what is of actual interest, the significance tests for the three main effects and two two-way interactions are not directly relevant to the research question, and the five corresponding F-tests can be omitted.
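In practice, this just means extracting the one row you need from the ANOVA table. A hedged sketch using a built-in dataset (ToothGrowth stands in for the proficiency-by-cognacy design):

```r
tg <- transform(ToothGrowth, dose = factor(dose))  # two factors: supp and dose
m  <- lm(len ~ supp * dose, data = tg)             # fit the full factorial model
anova(m)["supp:dose", ]  # report only the interaction F-test the question needs
```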

Read more...


Experiments with intact groups: Spurious significance with improperly weighted t-tests

16 February 2016

When analysing experiments in which intact groups (clusters) were assigned to the experimental conditions, t-tests on cluster means that weight these means by cluster size are occasionally used. In fact, I too endorsed this approach as a straightforward and easily implemented way to account for clustering. It seems, however, that these weighted t-tests are still anti-conservative, i.e., they find too many significant differences when there is in fact no effect. In this post, I present simulation results to illustrate this, and I also correct another published error of mine.
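The gist of such a simulation can be sketched as follows (my own construction, not necessarily the post's exact setup): under a true null with cluster-level variability, a size-weighted comparison of cluster means (run here as a weighted regression, which is equivalent to the weighted t-test) rejects more often than the nominal 5%.

```r
set.seed(2016)
sims   <- 2000
p_vals <- numeric(sims)
for (i in 1:sims) {
  sizes <- sample(5:50, 20, replace = TRUE)  # 20 clusters of varying size
  cond  <- rep(c("A", "B"), each = 10)       # 10 clusters per condition, no true effect
  # Each cluster mean combines cluster-level noise (sd 1) and sampling error
  means <- sapply(sizes, function(s) rnorm(1) + mean(rnorm(s)))
  wfit  <- lm(means ~ cond, weights = sizes) # size-weighted comparison of cluster means
  p_vals[i] <- summary(wfit)$coefficients[2, 4]
}
mean(p_vals < 0.05)  # exceeds the nominal .05 in this setup
```

The intuition: weighting by cluster size assumes the variance of a cluster mean is proportional to 1/size, but the cluster-level noise puts a floor under that variance, so large clusters get more weight than their actual precision warrants.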

Read more...