A blog about statistics and research design geared towards researchers in bilingualism, multilingualism, and applied linguistics.
Latest blog posts
18 August 2016
R sorts the levels of a factor alphabetically.
When drawing graphs, this results in ‘Alabama First’ graphs,
and it’s usually better to sort the elements of a graph by more meaningful principles than alphabetical order.
This post illustrates three convenience functions you can use to sort factor levels
according to another covariate, their frequency of occurrence, or manually.
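The post's own convenience functions aside, the three reordering strategies can be sketched in base R alone (the data frame below is purely illustrative):

```r
# Illustrative data: a factor whose default (alphabetical) level order
# is rarely the most informative one for plotting.
d <- data.frame(
  state = c("Utah", "Ohio", "Iowa", "Utah", "Ohio", "Ohio"),
  score = c(3, 9, 5, 4, 8, 6)
)

# (1) By another covariate: order the levels by their mean score.
d$state_by_score <- reorder(d$state, d$score, FUN = mean)

# (2) By frequency of occurrence, most frequent level first.
d$state_by_freq <- factor(d$state,
                          levels = names(sort(table(d$state), decreasing = TRUE)))

# (3) Manually, in whatever order suits the graph.
d$state_manual <- factor(d$state, levels = c("Ohio", "Iowa", "Utah"))

levels(d$state_by_score)
```

Any of these reordered factors can then be passed straight to a plotting function, and the graph's elements will follow the new level order.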
5 July 2016
I’d promised to write another installment on drawing graphs, but instead I’m going to write about something that I had to exclude, for reasons of space, from a recently published book chapter on age effects in second language (L2) acquisition: classifying observations (e.g., L2 learners) and estimating error rates.
I’m going to illustrate the usefulness of classification algorithms for addressing some problems in L2 acquisition research, but my broader aim is to show that there’s more to statistics than running significance tests and to encourage you to explore—even if superficially—what else is out there.
13 June 2016
Graphs are incredibly useful both for understanding your own data and for communicating your insights to your audience. This is why the next few blog posts will consist of tutorials on how to draw four kinds of graphs that I find most useful: scatterplots, line charts, boxplots and some variations, and Cleveland dotplots. These tutorials are aimed primarily at the students in our MA programme. Today’s graph: the line chart.
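As a taste of what the tutorial covers, a basic line chart in ggplot2 takes only a few lines (the data frame below is made up for illustration; the tutorial uses its own dataset and settings):

```r
library(ggplot2)

# Illustrative longitudinal data: two groups measured over five weeks.
d <- data.frame(
  week  = rep(1:5, times = 2),
  score = c(10, 14, 15, 19, 22, 8, 9, 13, 14, 18),
  group = rep(c("L2 learners", "controls"), each = 5)
)

# One line per group, with points marking the observations.
ggplot(d, aes(x = week, y = score, colour = group)) +
  geom_line() +
  geom_point()
```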
2 June 2016
Graphs are incredibly useful both for understanding your own data and for communicating your insights to your audience. This is why the next few blog posts will consist of tutorials on how to draw four kinds of graphs that I find most useful: scatterplots, line charts, boxplots and some variations, and Cleveland dotplots. These tutorials are aimed primarily at the students in our MA programme. Today’s graph: the scatterplot.
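For readers who want a head start, a minimal ggplot2 scatterplot — here with a scatterplot smoother added — looks like this (using R's built-in `mtcars` data rather than the tutorial's):

```r
library(ggplot2)

# Scatterplot of fuel economy against car weight,
# with a loess smoother to bring out the trend.
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "loess")
```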
18 May 2016
At a workshop last week, we were asked to bring along examples of good and bad academic writing. Several of the bad examples were papers where the number of significance tests was so large that the workshop participants felt that they couldn’t make sense of the Results section. It’s not that they didn’t understand each test separately but rather that they couldn’t see the forest for the trees. I, too, wish researchers would stop inundating their readers with t, F and p-values (especially in the running text), but until then, readers need to learn how to survive the ANOVA onslaught. Below I present a list of guidelines to help them do so.
22 April 2016
After reading a couple of papers whose conclusions were heavily based on R² (“variance explained”) values, I thought I’d summarise why I’m often skeptical of such conclusions. The reason, in a nutshell, is that reported R² values tend to overestimate how much of the variance in the outcome variable the model can actually “explain”. To dyed-in-the-wool quantitative researchers, none of this blog post will be new, but I hope that it will make some readers think twice before focusing heavily on R² values.
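The overestimation is easy to demonstrate by simulation: fit a model in which the outcome is, by construction, unrelated to the predictors, and the in-sample R² still comes out well above zero (sample size and predictor count below are illustrative):

```r
# Pure-noise demonstration: y has no relationship whatsoever with the
# ten predictors, yet the fitted model "explains" a sizeable share of
# the variance in this sample.
set.seed(1)
n <- 30
k <- 10
d <- data.frame(y = rnorm(n), matrix(rnorm(n * k), ncol = k))

m <- lm(y ~ ., data = d)
summary(m)$r.squared  # substantial despite y being pure noise
```

With n = 30 observations and 10 noise predictors, the expected in-sample R² is roughly k/(n − 1) ≈ 0.34 — none of which reflects anything the model could "explain" in new data.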
1 April 2016
Daniël Lakens recently blogged about a topic that crops up every now and then: Do you need to correct your p-values when you’ve run several significance tests? The blog post is worth a read, and I feel this quote sums it up well:
We … need to make a statement about which tests relate to a single theoretical inference, which depends on the theoretical question. I believe many of the problems researchers have in deciding how to correct for multiple comparisons is actually a problem in deciding what their theoretical question is.
I think so, too, which is why in this blog post, I present five scenarios and discuss how I feel about correcting for multiple comparisons in each of them.
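Once you have decided that a set of tests does belong to a single theoretical inference, the mechanics of correcting the p-values in R are trivial — `p.adjust()` handles the common methods (the p-values below are made up):

```r
# Four illustrative p-values treated as one family of tests.
p <- c(0.003, 0.012, 0.041, 0.20)

p.adjust(p, method = "holm")        # 0.012 0.036 0.082 0.200
p.adjust(p, method = "bonferroni")  # 0.012 0.048 0.164 0.800
```

The hard part, as the quote above stresses, is the decision about which tests form a family — not the arithmetic.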
23 February 2016
We are often more interested in interaction effects than in main effects. In a given study, we may not be so much interested in whether high-proficiency second-language (L2) learners react more quickly to target-language stimuli than low-proficiency L2 learners, or in whether L2 learners react more quickly to L1–L2 cognates than to non-cognates. Rather, what we may be interested in is whether the latency difference between cognates and non-cognates differs between high- and low-proficiency learners. When running an ANOVA of the two-way interaction, we should include the main effects, too, and our software package will dutifully report the F-tests for these main effects (i.e., for proficiency and cognacy).
But if it is the interaction that is of specific interest, I do not think that we have to actually care about the significance of the main effects – or that we have to clutter the text by reporting them. Similarly, if a three-way interaction is what is of actual interest, the significance tests for the three main effects and two two-way interactions are not directly relevant to the research question, and the five corresponding F-tests can be omitted.
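In practice this means fitting the full factorial model but reporting only the term that answers the research question. A sketch with simulated data and hypothetical variable names:

```r
# Simulated 2 x 2 dataset: proficiency and cognacy crossed, 20 "subjects"
# per cell, with reaction times drawn from pure noise around 600 ms.
set.seed(1)
d <- expand.grid(
  proficiency = c("low", "high"),
  cognacy     = c("cognate", "non-cognate"),
  id          = 1:20
)
d$rt <- rnorm(nrow(d), mean = 600, sd = 50)

# The main effects stay in the model, but only the interaction row
# needs to be reported.
m <- lm(rt ~ proficiency * cognacy, data = d)
anova(m)["proficiency:cognacy", ]
```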
16 February 2016
When analysing experiments in which intact groups (clusters) were assigned to the experimental conditions, t-tests on cluster means that weight these means for cluster size are occasionally used. In fact, I too endorsed this approach as a straightforward and easily implemented way to account for clustering. It seems, however, that these weighted t-tests are still anti-conservative, i.e., they find too many significant differences when there is in fact no effect. In this post, I present simulation results to illustrate this, and I also correct another published error of mine.
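The simulations in the post concern the weighted cluster-means t-test; the skeleton below instead shows the clearest case of the same general problem — the naive observation-level t-test that ignores clustering entirely — under the null hypothesis (all parameter values are illustrative):

```r
# Each of 2000 runs: 10 intact clusters, 15 observations each, clusters
# assigned wholesale to conditions A and B, and NO true condition effect.
# A t-test on the raw observations ignores the cluster-level noise.
set.seed(2016)
rejections <- replicate(2000, {
  n_clusters   <- 10
  cluster_size <- 15
  cl_effect <- rnorm(n_clusters, sd = 1)                  # cluster-level noise
  condition <- rep(c("A", "B"), each = n_clusters / 2)    # intact groups assigned
  cond_long <- rep(condition, each = cluster_size)
  y <- rnorm(n_clusters * cluster_size) +
       rep(cl_effect, each = cluster_size)                # no condition effect
  t.test(y[cond_long == "A"], y[cond_long == "B"])$p.value < 0.05
})

mean(rejections)  # Type I error rate, well above the nominal 0.05
```

The same simulation logic — generate clustered data under the null, apply the test, tally how often p falls below .05 — carries over to the weighted cluster-means test examined in the post.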