# Blog

A blog about statistics and research design geared towards researchers in bilingualism, multilingualism, and applied linguistics.

## Latest blog posts

### R tip: Ordering factor levels more easily

18 August 2016

By default, `R` sorts the levels of a factor alphabetically.
When drawing graphs, this results in ‘Alabama First’ graphs,
and it’s usually better to sort the elements of a graph by more meaningful principles than alphabetical order.
This post illustrates three convenience functions you can use to sort factor levels in `R`
according to another covariate, their frequency of occurrence, or manually.
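The post’s own convenience functions aren’t reproduced here, but base R can already express all three orderings; a minimal sketch using `reorder()` and `factor()`:

```r
# Base-R sketch of the three orderings (the post's own convenience
# functions may differ).

# 1. By another covariate: order states by income
state  <- factor(c("Alabama", "Utah", "Ohio"))
income <- c(480, 510, 460)
levels(reorder(state, income))                  # "Ohio" "Alabama" "Utah"

# 2. By frequency of occurrence (ascending)
x <- factor(c("b", "a", "b", "c", "b", "a"))
levels(reorder(x, seq_along(x), FUN = length))  # "c" "a" "b"

# 3. Manually
levels(factor(x, levels = c("c", "b", "a")))    # "c" "b" "a"
```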

### Classifying second-language learners as native- or non-nativelike: Don't neglect classification error rates

5 July 2016

I’d promised to write another installment on drawing graphs,
but instead I’m going to write about something that
I had to exclude, for reasons of space,
from a recently published book chapter
on age effects in second language (L2) acquisition:
**classifying** observations (e.g., L2 learners) and **estimating error rates**.

I’m going to illustrate the usefulness of classification algorithms for addressing some problems in L2 acquisition research, but my broader aim is to show that there’s more to statistics than running significance tests and to encourage you to explore—even if superficially—what else is out there.
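As a hedged illustration of the idea (simulated data, not the chapter’s actual analysis): the error rate you get by classifying the same data the model was fitted on is optimistic, and cross-validation gives a more honest estimate.

```r
# Sketch: classify simulated speakers as native or non-native from a
# single score, and compare the optimistic resubstitution error rate
# with a leave-one-out cross-validated estimate.
set.seed(1)
n <- 100
group <- factor(rep(c("learner", "native"), each = n / 2))
score <- c(rnorm(n / 2, mean = 60, sd = 8),   # learners
           rnorm(n / 2, mean = 70, sd = 8))   # natives
d <- data.frame(group, score)

# Resubstitution: classify the data the model was fitted on
fit <- glm(group ~ score, data = d, family = binomial)
resub_pred  <- ifelse(fitted(fit) > 0.5, "native", "learner")
resub_error <- mean(resub_pred != d$group)

# Leave-one-out cross-validation: classify each observation with a
# model fitted on the remaining n - 1 observations
loo_pred <- vapply(seq_len(n), function(i) {
  f <- glm(group ~ score, data = d[-i, ], family = binomial)
  p <- predict(f, newdata = d[i, ], type = "response")
  if (p > 0.5) "native" else "learner"
}, character(1))
loo_error <- mean(loo_pred != d$group)
```

The cross-validated error rate is typically at least as high as the resubstitution one, which is the point of estimating it.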

### Tutorial: Drawing a boxplot

21 June 2016

In the two previous blog posts,
you learnt to draw simple but informative **scatterplots**
and **line charts**.
This time, you’ll learn how to draw **boxplots**.

### Tutorial: Drawing a line chart

13 June 2016

Graphs are incredibly useful
both for understanding your own data
and for communicating your insights to your audience.
This is why the next few blog posts
will consist of tutorials on how to draw
four kinds of graphs that I find most useful:
**scatterplots**, **line charts**,
**boxplots** and some variations, and **Cleveland dotplots**.
These tutorials are aimed primarily at the students in our MA programme.
Today’s graph: the line chart.

### Tutorial: Drawing a scatterplot

2 June 2016

Graphs are incredibly useful
both for understanding your own data
and for communicating your insights to your audience.
This is why the next few blog posts
will consist of tutorials on how to draw
four kinds of graphs that I find most useful:
**scatterplots**, **line charts**,
**boxplots** and some variations, and **Cleveland dotplots**.
These tutorials are aimed primarily at the students in our MA programme.
Today’s graph: the scatterplot.

### Surviving the ANOVA onslaught

18 May 2016

At a workshop last week, we were asked to bring along examples of good and bad academic writing.
Several of the bad examples were papers where the number of significance tests was so large
that the workshop participants felt that they couldn’t make sense of the Results section.
It’s not that they didn’t understand each test separately
but rather that they couldn’t see the forest for the trees.
I, too, wish researchers would stop inundating their readers with *t*, *F* and *p*-values (especially in the running text),
but until such time, readers need to learn how to survive the ANOVA onslaught.
Below I present a list of guidelines to help them with that.

### Why reported R² values are often too high

22 April 2016

After reading a couple of papers whose conclusions were heavily based on *R²* (“variance explained”) values,
I thought I’d summarise why I’m often skeptical of such conclusions.
The reason, in a nutshell, is that reported *R²* values tend to overestimate how much of the variance in the outcome variable the model can actually “explain”.
To dyed-in-the-wool quantitative researchers,
none of this blog post will be new,
but I hope that it will make some readers think twice before focusing heavily on *R²* values.
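The overestimation is easy to demonstrate in a couple of lines (a sketch, not from the post itself): fit a model with several predictors to pure noise and *R²* still looks respectable.

```r
# Sketch: with pure noise, R-squared still looks impressive when the
# model has many predictors relative to the sample size.
set.seed(2)
n <- 30
y <- rnorm(n)                          # outcome: pure noise
x <- matrix(rnorm(n * 10), ncol = 10)  # 10 predictors, unrelated to y
fit <- lm(y ~ x)
summary(fit)$r.squared      # sizeable despite no true relationship
summary(fit)$adj.r.squared  # the adjusted value is far lower
```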

### On correcting for multiple comparisons: Five scenarios

1 April 2016

Daniël Lakens recently blogged about a topic that crops up every now and then:
Do you need to correct your *p*-values when you’ve run several significance tests?
The blog post is worth a read,
and I feel this quote sums it up well:

> We … need to make a statement about which tests relate to a single theoretical inference, which depends on the theoretical question. I believe many of the problems researchers have in deciding how to correct for multiple comparisons is actually a problem in deciding what their theoretical question is.

I think so, too, which is why in this blog post, I present five scenarios and discuss how I feel about correcting for multiple comparisons in each of them.
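For reference, once you have decided that a set of tests does need correcting, base R’s `p.adjust()` implements the common adjustments:

```r
# Base R's p.adjust() applies common multiple-comparison corrections
# to a vector of p-values.
p <- c(0.004, 0.020, 0.049, 0.300)
p.adjust(p, method = "bonferroni")  # 0.016 0.080 0.196 1.000
p.adjust(p, method = "holm")        # 0.016 0.060 0.098 0.300
```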

### Silly significance tests: The main effects no one is interested in

23 February 2016

We are often more interested in interaction effects than in main effects. In a given study, we may not be so much interested in whether high-proficiency second-language (L2) learners react more quickly to target-language stimuli than low-proficiency L2 learners, nor in whether L2 learners react more quickly to L1–L2 cognates than to non-cognates. Rather, what we may be interested in is whether the latency difference between cognates and non-cognates differs between high- and low-proficiency learners. When running an ANOVA with the two-way interaction, we should include the main effects, too, and our software package will dutifully report the *F*-tests for these main effects (i.e., for proficiency and cognacy).

But if it is the interaction that is of specific interest, I do not think that we have to actually *care* about the significance of the main effects – or that we have to clutter the text by reporting them. Similarly, if a three-way interaction is what is of actual interest, the significance tests for the three main effects and three two-way interactions are not directly relevant to the research question, and the six corresponding *F*-tests can be omitted.
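In R terms, the point is simply which row of the ANOVA table you report; a sketch on simulated data:

```r
# Sketch (simulated data): in a 2 x 2 design where the research question
# concerns the interaction, only the proficiency:cognacy row of the
# ANOVA table answers it; the main-effect rows needn't be reported.
set.seed(4)
d <- expand.grid(proficiency = c("low", "high"),
                 cognacy     = c("cognate", "noncognate"),
                 rep         = 1:20)
d$rt <- rnorm(nrow(d), mean = 600, sd = 50)   # simulated reaction times

fit <- aov(rt ~ proficiency * cognacy, data = d)
summary(fit)  # the proficiency:cognacy row is the one of interest
```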

### Experiments with intact groups: Spurious significance with improperly weighted t-tests

16 February 2016

When analysing experiments in which intact groups (clusters) were assigned to the experimental conditions,
*t*-tests on cluster means that weight these means for cluster size are occasionally used.
In fact, I too endorsed this approach as a straightforward and easily implemented way to account for clustering.
It seems, however, that these weighted *t*-tests are still anti-conservative, i.e., they find too many significant differences when in fact there is no effect.
In this post, I present simulation results to illustrate this, and I also correct another published error of mine.
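The post’s own simulations aren’t reproduced here, but a minimal sketch (my own, simplified setup) shows why clustering matters at all: a *t*-test that ignores the clusters is badly anti-conservative under the null, whereas a *t*-test on (unweighted) cluster means holds the nominal Type I error rate.

```r
# Sketch: simulate a null effect with intact clusters, then compare the
# Type I error rate of a naive observation-level t-test with that of a
# t-test on unweighted cluster means.
set.seed(3)
sim_once <- function() {
  k <- 10                                   # clusters per condition
  sizes <- sample(5:25, 2 * k, replace = TRUE)
  cluster_effect <- rnorm(2 * k, sd = 1)    # cluster-level noise
  cond <- rep(c(0, 1), each = k)            # no true condition effect
  y <- unlist(mapply(function(m, s) rnorm(s, mean = m),
                     cluster_effect, sizes, SIMPLIFY = FALSE))
  cl       <- rep(seq_len(2 * k), sizes)
  cond_obs <- rep(cond, sizes)
  p_naive   <- t.test(y ~ cond_obs)$p.value          # ignores clustering
  p_cluster <- t.test(tapply(y, cl, mean) ~ cond)$p.value  # cluster means
  c(naive = p_naive, cluster = p_cluster)
}
res <- replicate(2000, sim_once())
rowMeans(res < 0.05)  # naive rate well above 0.05; cluster-means near 0.05
```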