# Blog

I blog about statistics and research design for researchers in bilingualism, multilingualism, and applied linguistics.

## Latest blog posts

### Some illustrations of bootstrapping

20 December 2016

This post illustrates a statistical technique
that becomes particularly useful when you want to calculate the sampling variation of some custom statistic
or when you start to dabble in mixed-effects models.
This technique is called **bootstrapping**
and I will first illustrate its use in constructing confidence intervals around
a custom summary statistic.
Then I’ll illustrate three bootstrapping approaches when constructing
confidence intervals around a regression coefficient,
and finally, I will show how bootstrapping can be used to compute *p*-values.

The goal of this post is *not* to argue that bootstrapping is superior to the traditional alternatives—in the examples
discussed, they are pretty much on par—but merely to illustrate how it works.
The main advantage of bootstrapping, as I understand it,
is that it can be applied in situations where the traditional alternatives
are not available,
where you don’t understand how to use them
or where their assumptions are questionable,
but I think it’s instructive to see how its results compare to those of traditional approaches where both can readily be applied.
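
To give a flavour of the first use case, here is a minimal sketch of a percentile bootstrap for a custom statistic (the median). The data are simulated and the variable names (`x`, `n_boot`, `boot_medians`) are my own assumptions for illustration; this is not the code from the post itself.

```r
# Percentile bootstrap for a custom statistic (here: the median).
# The data are simulated purely for illustration.
set.seed(2016)
x <- rexp(50, rate = 1 / 10)   # hypothetical sample of 50 observations

n_boot <- 2000                 # number of bootstrap resamples (arbitrary choice)

# Resample the data with replacement and recompute the statistic each time
boot_medians <- replicate(n_boot, {
  resample <- sample(x, size = length(x), replace = TRUE)
  median(resample)
})

# The 2.5th and 97.5th percentiles of the bootstrap distribution
# form a 95% percentile confidence interval
ci <- quantile(boot_medians, probs = c(0.025, 0.975))
ci
```

Swapping `median()` for any other function of the resampled data gives an interval for that statistic instead.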

### What data patterns can lie behind a correlation coefficient?

21 November 2016

In this post, I want to, first, help you to improve your intuition of what data patterns correlation coefficients can represent and, second, hammer home the point that to sensibly interpret a correlation coefficient, you need the corresponding scatterplot.

### Common-language effect sizes

16 November 2016

The goal of this blog post is to share with you a simple `R` function
that may help you to better communicate the extent to which two groups differ and overlap
by computing *common-language effect sizes*.
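
As a rough sketch of the idea, a common-language effect size can be read as the probability that a randomly chosen observation from one group is larger than a randomly chosen observation from the other. The function below, `cles_sketch()`, is a hypothetical name of my own and is not the function shared in the post; it compares all possible pairs and counts ties as half. The two groups are made-up numbers.

```r
# Probability that a random observation from x exceeds a random one from y.
# NOTE: cles_sketch() is an illustrative helper, not the post's function.
cles_sketch <- function(x, y) {
  # Differences between every value in x and every value in y
  diffs <- outer(x, y, "-")
  # Proportion of pairs in which x is larger, counting ties as half
  mean(diffs > 0) + 0.5 * mean(diffs == 0)
}

# Made-up example data
group_a <- c(12, 15, 17, 20, 22)
group_b <- c(10, 11, 15, 16, 18)

cles_sketch(group_a, group_b)
```

A value of 0.5 would mean the groups overlap completely in this sense; values closer to 0 or 1 indicate stronger separation.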

### The Center for Open Science's Preregistration Challenge: Why it's relevant and some recommended background reading

31 October 2016

*This blog post is an edited version of a mail I sent round to my colleagues at the various language and linguistics departments in Fribourg. Nothing in this post is new per se, but I haven’t seen much discussion of these issues among linguists, applied linguists and bilingualism researchers.*

I’d like to point you to an initiative of the Center for Open Science: the $1,000,000 Preregistration Challenge. The basic idea is to foster research transparency by offering a monetary reward to researchers who’ve outlined their study design and planned analyses in advance and who present the results of these analyses in the final report.

I’m not affiliated with this organisation, but I do think both it and its initiative are important developments. For those interested in knowing why I think so, I’ve written a brief text below that includes links to more detailed articles or examples; if you prefer reference lists, there’s one of those down below. Most of the articles were written by and for psychologists, but I reckon pretty much all of them apply equally to research in linguistics and language learning.

### Tutorial: Drawing a dot plot

30 August 2016

In the fourth tutorial on drawing useful plots with `ggplot2`, we’re taking a closer look at **dot plots** – a useful and more flexible alternative to bar and pie charts.

### R tip: Ordering factor levels more easily

18 August 2016

By default, `R` sorts the levels of a factor alphabetically.
When drawing graphs, this results in ‘Alabama First’ graphs,
and it’s usually better to sort the elements of a graph by more meaningful principles than alphabetical order.
This post illustrates three convenience functions you can use to sort factor levels in `R`
according to another covariate, their frequency of occurrence, or manually.
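
For orientation, the three kinds of ordering can also be done in base `R`. The sketch below uses only base functions (`reorder()`, `table()`, `factor()`) rather than the convenience functions from the post, and the data frame `d` is made up for illustration.

```r
# Made-up data: a grouping variable and a numeric covariate
d <- data.frame(
  state = c("Texas", "Alabama", "Ohio", "Texas", "Ohio", "Texas"),
  score = c(3, 9, 5, 4, 6, 2)
)
d$state <- factor(d$state)
levels(d$state)   # default: alphabetical order

# 1. Order levels by another variable (here: mean score per state)
by_score <- reorder(d$state, d$score, FUN = mean)
levels(by_score)

# 2. Order levels by frequency of occurrence (most frequent first)
by_freq <- factor(d$state,
                  levels = names(sort(table(d$state), decreasing = TRUE)))
levels(by_freq)

# 3. Order levels manually
manual <- factor(d$state, levels = c("Ohio", "Texas", "Alabama"))
levels(manual)
```

Only the order of the levels changes; the underlying observations stay the same, so the reordered factor can be passed straight to a plotting function.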

### Classifying second-language learners as native- or non-nativelike: Don't neglect classification error rates

5 July 2016

I’d promised to write another installment on drawing graphs,
but instead I’m going to write about something that
I had to exclude, for reasons of space,
from a recently published book chapter
on age effects in second language (L2) acquisition:
**classifying** observations (e.g., L2 learners) and **estimating error rates**.

I’m going to illustrate the usefulness of classification algorithms for addressing some problems in L2 acquisition research, but my broader aim is to show that there’s more to statistics than running significance tests and to encourage you to explore—even if superficially—what else is out there.

### Tutorial: Drawing a boxplot

21 June 2016

In the two previous blog posts,
you learnt to draw simple but informative **scatterplots**
and **line charts**.
This time, you’ll learn how to draw **boxplots**.

### Tutorial: Drawing a line chart

13 June 2016

Graphs are incredibly useful
both for understanding your own data
and for communicating your insights to your audience.
This is why the next few blog posts
will consist of tutorials on how to draw
four kinds of graphs that I find most useful:
**scatterplots**, **line charts**,
**boxplots** and some variations, and **Cleveland dotplots**.
These tutorials are aimed primarily at the students in our MA programme.
Today’s graph: the line chart.

### Tutorial: Drawing a scatterplot

2 June 2016

Graphs are incredibly useful
both for understanding your own data
and for communicating your insights to your audience.
This is why the next few blog posts
will consist of tutorials on how to draw
four kinds of graphs that I find most useful:
**scatterplots**, **line charts**,
**boxplots** and some variations, and **Cleveland dotplots**.
These tutorials are aimed primarily at the students in our MA programme.
Today’s graph: the scatterplot.