I blog about statistics and research design with an audience consisting of researchers in bilingualism, multilingualism, and applied linguistics in mind.

Latest blog posts

Tutorial: Adding confidence bands to effect displays

12 May 2017

In the previous blog post, I demonstrated how you can draw effect displays to render regression models more intelligible to yourself and to your audience. These effect displays did not contain information about the uncertainty inherent to estimating regression models, however. To that end, this blog post demonstrates how you can add confidence bands to effect displays for multiple regression, logistic regression, and logistic mixed-effects models, and explains how these confidence bands are constructed.


Tutorial: Plotting regression models

23 April 2017

The results of regression models, particularly fairly complex ones, can be difficult to appreciate and hard to communicate to an audience. One useful technique is to plot the effect of each predictor variable on the outcome while holding constant any other predictor variables. Fox (2003) discusses how such effect displays are constructed and provides an implementation in the effects package for R.

Since I think it’s both instructive to see how effect displays are constructed from the ground up and useful to be able to tweak them yourself in R, this blog post illustrates how to draw such plots for three increasingly complex statistical models: ordinary multiple regression, logistic regression, and mixed-effects logistic regression. The goal in each of these three examples is to visualise the effects of the predictor variables without factoring in the uncertainty about these effects; visualising such uncertainty will be the topic of a future blog post.


Confidence intervals for standardised mean differences

22 February 2017

Standardised effect sizes express patterns found in the data in terms of the variability found in the data. For instance, a mean difference in body height could be expressed in the metric in which the data were measured (e.g., a difference of 4 centimetres) or relative to the variation in the data (e.g., a difference of 0.9 standard deviations). The latter is a standardised effect size known as Cohen’s d.
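
To make the distinction concrete, here is a minimal Python sketch of both ways of expressing a mean difference. The body heights are made up for illustration, and the function name `cohens_d` and the use of the pooled sample standard deviation as the standardiser are assumptions; other standardisers exist.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Standardised mean difference, using the pooled sample standard deviation."""
    n_x, n_y = len(x), len(y)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


# Hypothetical body heights in centimetres (invented for this example)
group_a = [176, 180, 172, 184, 178]
group_b = [172, 176, 168, 180, 174]

raw_difference = mean(group_a) - mean(group_b)  # in centimetres
d = cohens_d(group_a, group_b)                  # in standard deviations

print(f"Raw difference: {raw_difference} cm; Cohen's d: {d:.2f}")
```

With these invented numbers, the raw difference is 4 centimetres and the pooled standard deviation is about 4.47 centimetres, so d comes out at about 0.89.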

As I’ve written before, I don’t particularly like standardised effect sizes. Nonetheless, I wondered how confidence intervals around standardised effect sizes (more specifically: standardised mean differences) are constructed. Until recently, I hadn’t really thought about it and sort of assumed you would compute them the same way as confidence intervals around raw effect sizes. But unlike raw (unstandardised) mean differences, standardised mean differences are a combination of two estimates subject to sampling error: the mean difference itself and the sample standard deviation. Moreover, the sample standard deviation is a biased estimate of the population standard deviation (it tends to be too low), which causes Cohen’s d to be an upwardly biased estimate of the population standardised mean difference. Surely both of these factors must affect how the confidence intervals around standardised effect sizes are constructed?

It turns out that indeed they do. When I computed confidence intervals around a standardised effect size using a naive approach that assumed that the standard deviation wasn’t subject to sampling error and wasn’t biased, I got different results than when I used specialised R functions.

But these R functions all produced different results, too.

Obviously, there may well be more than one way to skin a cat, but this caused me to wonder if the different procedures for computing confidence intervals all covered the true population parameter with the nominal probability (e.g., in 95% of cases for a 95% confidence interval). I ran a simulation to find out, which I’ll report in the remainder of this post. If you spot any mistakes, please let me know.


Which predictor is most important? Predictive utility vs. construct importance

15 February 2017

Every so often, I’m asked for my two cents on a correlational study in which the researcher wants to find out which of a set of predictor variables is the most important one. For instance, they may have the results of an intelligence test, of a working memory task and of a questionnaire probing their participants’ motivation for learning French, and they want to find out which of these three is the most important factor in acquiring a nativelike French accent, as measured using a pronunciation task. As I will explain below, research questions such as these can be interpreted in two ways, and whether they can be answered sensibly depends on the interpretation intended.


Automatise repetitive tasks

31 January 2017

Research often involves many repetitive tasks. For an ongoing project, for instance, we needed to replace all stylised apostrophes (’) with straight apostrophes (') in some 3,000 text files when preparing the texts for the next step. As another example, you may need to split up a bunch of files into different directories depending on, say, the character in the file name just before the extension. When done by hand, such tasks are as mind-numbing and time-consuming as they sound – perhaps you would do them on a Friday afternoon while listening to music or outsource them to a student assistant. My advice, though, is this: Try to automatise repetitive tasks.

Doing repetitive tasks is what computers are for, so rather than spending several hours learning nothing, I suggest you spend that time writing a script or putting together a command line call that does the task for you. If you have little experience doing this, this will take time at first. In fact, I reckon I often spend roughly the same amount of time trying to automatise menial tasks as it would have taken me to do them by hand. But in the not-so-long run, automatisation is a time-saver: Once you have a working script, you can tweak and reuse it. Additionally, while you’re figuring out how to automatise a menial chore, you’re actually learning something useful. The chores become more of a challenge and less mind-numbing. I’m going to present an example or two of what I mean and I will conclude by giving some general pointers.
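
As a rough idea of what such a script could look like for the apostrophe task, here is a minimal Python sketch. It is not the script from the post: the function name `straighten_apostrophes`, the `*.txt` file pattern and the UTF-8 encoding are all assumptions made for the sake of the example.

```python
import tempfile
from pathlib import Path


def straighten_apostrophes(directory, pattern="*.txt"):
    """Replace stylised apostrophes (U+2019) with straight ones in matching files.

    Returns the number of files that were actually changed.
    """
    changed = 0
    for path in Path(directory).glob(pattern):
        text = path.read_text(encoding="utf-8")
        fixed = text.replace("\u2019", "'")
        if fixed != text:
            # Only rewrite files that contained a stylised apostrophe
            path.write_text(fixed, encoding="utf-8")
            changed += 1
    return changed


# Demonstration on a throwaway directory containing two made-up files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_text("I\u2019m here", encoding="utf-8")
    (Path(tmp) / "b.txt").write_text("nothing to fix", encoding="utf-8")
    n_changed = straighten_apostrophes(tmp)
    result = (Path(tmp) / "a.txt").read_text(encoding="utf-8")

print(n_changed, result)
```

Because the function overwrites files in place, it would be prudent to run it on a copy of the data first.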


Some illustrations of bootstrapping

20 December 2016

This post illustrates a statistical technique that becomes particularly useful once you start to dabble in mixed-effects models and want to calculate the sampling variation of some custom statistic. This technique is called bootstrapping and I will first illustrate its use in constructing confidence intervals around a custom summary statistic. Then I’ll illustrate three bootstrapping approaches when constructing confidence intervals around a regression coefficient, and finally, I will show how bootstrapping can be used to compute p-values.

The goal of this post is not to argue that bootstrapping is superior to the traditional alternatives (in the examples discussed, they are pretty much on par) but merely to illustrate how it works. The main advantage of bootstrapping, as I understand it, is that it can be applied in situations where the traditional alternatives are not available, where you don’t understand how to use them or where their assumptions are questionable. Still, I think it’s instructive to see how its results compare to those of traditional approaches where both can readily be applied.
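
The basic idea can be sketched in a few lines of Python. This is a percentile bootstrap, which is only one of several bootstrap variants and not necessarily the one used in the post; the function name `bootstrap_ci`, its parameters and the data are hypothetical.

```python
import random
from statistics import median


def bootstrap_ci(data, statistic, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time
    estimates = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    # Read the interval bounds off the sorted bootstrap estimates
    lower_idx = round((1 - level) / 2 * n_boot)
    upper_idx = round((1 + level) / 2 * n_boot) - 1
    return estimates[lower_idx], estimates[upper_idx]


# Made-up sample; the median stands in for "some custom statistic"
sample = [12, 15, 9, 21, 17, 14, 30, 11, 16, 13, 19, 22, 10, 18, 25]
lower, upper = bootstrap_ci(sample, median)

print(f"Sample median: {median(sample)}; 95% bootstrap CI: [{lower}, {upper}]")
```

Any function that maps a list of observations to a single number can be passed as `statistic`, which is what makes the approach convenient for custom summary statistics.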


What data patterns can lie behind a correlation coefficient?

21 November 2016

In this post, I want to, first, help you to improve your intuition of what data patterns correlation coefficients can represent and, second, hammer home the point that to sensibly interpret a correlation coefficient, you need the corresponding scatterplot.


Common-language effect sizes

16 November 2016

The goal of this blog post is to share with you a simple R function that may help you to better communicate the extent to which two groups differ and overlap by computing common-language effect sizes.
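
For readers unfamiliar with the concept, here is a minimal Python sketch of one common way to define such an effect size: the probability that a randomly chosen observation from one group is larger than a randomly chosen observation from the other. This is not the R function from the post; the name `common_language_es`, the exhaustive pairwise comparison and the half-weighting of ties are assumptions.

```python
from itertools import product


def common_language_es(x, y):
    """Proportion of (x, y) pairs in which the x value is larger; ties count as half."""
    score = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a, b in product(x, y)
    )
    return score / (len(x) * len(y))


# Made-up scores for two groups
group_1 = [5, 6, 7]
group_2 = [4, 5, 6]

cles = common_language_es(group_1, group_2)
print(f"Common-language effect size: {cles:.2f}")
```

For these invented scores, 7 of the 9 possible pairings (counting each tie as half) favour the first group, giving a value of about 0.78.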


The Center for Open Science's Preregistration Challenge: Why it's relevant and some recommended background reading

31 October 2016

This blog post is an edited version of a mail I sent round to my colleagues at the various language and linguistics departments in Fribourg. Nothing in this post is new per se, but I haven’t seen much discussion of these issues among linguists, applied linguists and bilingualism researchers.

I’d like to point you to an initiative of the Center for Open Science: the $1,000,000 Preregistration Challenge. The basic idea is to foster research transparency by offering a monetary reward to researchers who’ve outlined their study design and planned analyses in advance and who report the results of these analyses in the resulting article.

I’m not affiliated with this organisation, but I do think both it and its initiative are important developments. For those interested in knowing why I think so, I’ve written a brief text below that includes links to more detailed articles or examples; if you prefer reference lists, there’s one of those down below. Most of the articles were written by and for psychologists, but I reckon pretty much all of it applies equally to research in linguistics and language learning.


Tutorial: Drawing a dot plot

30 August 2016

In the fourth tutorial on drawing useful plots with ggplot2, we’re taking a closer look at dot plots – a useful and more flexible alternative to bar and pie charts.