I blog about statistics and research design, with researchers in bilingualism, multilingualism, and applied linguistics in mind.


Latest blog posts

In research, don't do things you don't see the point of

18 February 2022

When I started reading quantitative research reports, I hadn’t taken any methods or statistics classes, so small wonder that I didn’t understand why certain background variables on the participants were collected, why it was reported how many of them were women and how many of them were men, and what all those numbers in the results sections meant. However, I was willing to assume that these reports had been written by some fairly intelligent people and that, by the Gricean maxim of relevance, these bits and bobs must be relevant — why else report them?


An R function for computing Levenshtein distances between texts using the word as the unit of comparison

17 February 2022

For a new research project, we needed a way to tabulate the changes that were made to a text when correcting it. Since we couldn’t find a suitable tool, I wrote an R function that uses the Levenshtein algorithm to determine both the smallest number of words that need to be changed to transform one version of a text into another and what these changes are.
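As a rough illustration of the idea (a Python sketch, not the R function described in the post), the standard Levenshtein dynamic programme can be run over words instead of characters, followed by a backtrace that lists the changes. The function name `word_levenshtein`, the whitespace tokenisation, and the operation labels are assumptions made for this example.

```python
def word_levenshtein(original, corrected):
    """Return (distance, operations) for turning `original` into `corrected`,
    comparing whole words rather than characters.
    Tokenisation by whitespace is an assumption of this sketch."""
    a, b = original.split(), corrected.split()
    n, m = len(a), len(b)

    # dist[i][j] = minimal number of word edits turning a[:i] into b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a word
                dist[i][j - 1] + 1,         # insert a word
                dist[i - 1][j - 1] + cost,  # keep or substitute a word
            )

    # Walk back through the table to recover one minimal set of changes.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                if cost == 1:
                    ops.append(("substitute", a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", a[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, b[j - 1]))
            j -= 1
    ops.reverse()
    return dist[n][m], ops


distance, changes = word_levenshtein("the cat sat on the mat",
                                     "the dog sat on a mat")
```

When several minimal edit scripts exist, the backtrace returns only one of them; here it prefers substitutions over deletions and deletions over insertions.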


The consequences of controlling for a post-treatment variable

29 June 2021

Let’s say you want to find out if a pedagogical intervention boosts learners’ conversational skills in L2 French. You’ve learnt that including a well-chosen control variable in your analysis can work wonders in terms of statistical power and precision, so you decide to administer a French vocabulary test to your participants in order to include their score on this test in your analyses as a covariate. But if you administer the vocabulary test after the intervention, it’s possible that the vocabulary scores are themselves affected by the intervention as well. If this is indeed the case, you may end up doing more harm than good. In this blog post, I will take a closer look at four general cases where controlling for such a ‘post-treatment’ variable is harmful, and one case where it improves matters.
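To make the worry concrete, here is a small simulation sketch in Python. The data-generating model is an assumption chosen for illustration and is not necessarily one of the cases covered in the post: the intervention raises vocabulary scores, and vocabulary in turn raises conversational skills, so the intervention's total effect is 0.5. Controlling for the post-treatment vocabulary score then removes most of that effect from the estimate.

```python
import random
from statistics import mean


def slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


random.seed(2021)
n = 5000

# Assumed model: treatment -> vocabulary -> conversational skills.
treatment = [random.randint(0, 1) for _ in range(n)]
vocabulary = [1.0 * t + random.gauss(0, 1) for t in treatment]
conversation = [0.5 * v + random.gauss(0, 1) for v in vocabulary]

# Estimate without the covariate: targets the total effect (0.5 here).
unadjusted = slope(treatment, conversation)

# Estimate with the post-treatment covariate. The coefficient for treatment
# in 'conversation ~ treatment + vocabulary' equals the slope between the two
# sets of residuals (Frisch-Waugh-Lovell theorem).
adjusted = slope(residuals(vocabulary, treatment),
                 residuals(vocabulary, conversation))

print(f"unadjusted: {unadjusted:.2f}, adjusted: {adjusted:.2f}")
```

Under this assumed model, the unadjusted estimate should land near 0.5 and the adjusted one near 0, even though the intervention works.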


Quantitative methodology: An introduction

16 December 2020

I’ve taught my last class for the semester and I thought I’d make available the booklet that I wrote for teaching my class on quantitative methodology. You can download it here.

It contains seven reading assignments (mostly empirical studies that serve as examples) and ten chapters with lectures:

  1. Association and causality.
  2. Constructing a control group.
  3. Alternative explanations.
  4. Inferential statistics 101. (The course is not a statistics course, but there’s no avoiding talking about p-values given their omnipresence.)
  5. Increasing precision.
  6. Pedagogical interventions.
  7. Within-subjects experiments.
  8. Quasi-experiments and correlational studies.
  9. Constructs and indicators.
  10. Questionable research practices.

I’ve also included two appendices:

  • Reading difficult results sections.
  • Reporting research transparently.

Hopefully some of you find it useful, and feel free to let me know what you think.


Capitalising on covariates in cluster-randomised experiments

2 September 2020

In cluster-randomised experiments, participants are assigned to the conditions randomly but not on an individual basis. Instead, entire batches (‘clusters’) of participants are assigned in such a way that each participant in the same cluster is assigned to the same condition. A typical example would be an educational experiment in which all pupils in the same class get assigned to the same experimental condition. Crucially, the analysis should take into account the fact that the random assignment took place at the cluster level rather than at the individual level.

Also typical of educational experiments is that researchers have some information about the participants’ performance before the intervention took place. This information can come in the form of a covariate, for instance the participants’ performance on a pretest or some self-assessment of their skills. Even in experiments that use random assignment, including such covariates in the analysis is useful as they help to reduce the error variance. Lots of different methods for including covariates in the analysis of cluster-randomised experiments are discussed in the literature, but I couldn’t find any discussion about the merits and drawbacks of these different methods.

In order to provide such a discussion, I ran a series of simulations to compare 25 (!) different ways of including a covariate in the analysis of a cluster-randomised experiment in terms of their Type-I error and their power. The article outlining these simulations and the findings is available from PsyArXiv; the R code used for the simulations as well as its output is available from the Open Science Framework. In the remainder of this post, I’ll discuss how these simulations may be useful to you if you’re planning to run a cluster-randomised experiment.
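One basic way of respecting cluster-level assignment is to average the outcome within each cluster and then compare the cluster means between conditions. The Python sketch below does just that, without any covariate. The data, the class labels, and the function names are made up for illustration, and no claim is made about how this relates to the approaches compared in the article.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Hypothetical data: (class, condition, score). All pupils in a class share
# the same condition because assignment happened at the class level.
pupils = [
    ("class1", "control", 10), ("class1", "control", 12), ("class1", "control", 14),
    ("class2", "control", 9), ("class2", "control", 11),
    ("class3", "control", 13), ("class3", "control", 15),
    ("class4", "intervention", 14), ("class4", "intervention", 16),
    ("class5", "intervention", 12), ("class5", "intervention", 14),
    ("class5", "intervention", 16),
    ("class6", "intervention", 17), ("class6", "intervention", 19),
]


def cluster_means(rows):
    """Return {condition: [mean score of each cluster in that condition]}."""
    scores = defaultdict(list)
    condition_of = {}
    for cluster, condition, score in rows:
        scores[cluster].append(score)
        condition_of[cluster] = condition
    by_condition = defaultdict(list)
    for cluster, values in scores.items():
        by_condition[condition_of[cluster]].append(mean(values))
    return dict(by_condition)


def welch_t(a, b):
    """Welch t statistic for mean(b) - mean(a)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


means = cluster_means(pupils)
t_clusters = welch_t(means["control"], means["intervention"])
```

The test statistic is based on six cluster means rather than on fourteen pupils, which is the point: the number of independent units is the number of clusters.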


Tutorial: Visualising statistical uncertainty using model-based graphs

29 June 2020

I wrote a tutorial about visualising the statistical uncertainty in statistical models for a conference that took place a couple of months ago, and I’ve just realised that I’ve never advertised this tutorial on this blog. You can find the tutorial here: Visualising statistical uncertainty using model-based graphs.


Interpreting regression models: a reading list

12 June 2020

Last semester I taught a class for PhD students and collaborators that focused on how the output of regression models is to be interpreted. Most participants had at least some experience with fitting regression models, but I had noticed that they were often unsure about the precise statistical interpretation of the output of these models (e.g., What does this parameter estimate of 1.2 correspond to in the data?). Moreover, they were usually a bit too eager to move from the model output to a subject-matter interpretation (e.g., What does this parameter estimate of 1.2 tell me about language learning?). I suspect that the same goes for many applied linguists, and social scientists more generally, so below I provide an overview of the course contents as well as the reading list.


Nonparametric tests aren't a silver bullet when parametric assumptions are violated

23 May 2020

Some researchers adhere to a simple strategy when comparing data from two or more groups: when they think that the data in the groups are normally distributed, they run a parametric test ($t$-test or ANOVA); when they suspect that the data are not normally distributed, they run a nonparametric test (e.g., Mann–Whitney or Kruskal–Wallis). Rather than follow such an automated approach to analysing data, I think researchers ought to consider the following points:

  • The $t$-test and ANOVA compare means; the Mann–Whitney and Kruskal–Wallis don’t.
  • The Mann–Whitney and Kruskal–Wallis do not in general compare medians, either. I’ll illustrate these first two points in this blog post.
  • The main problem with parametric tests when you have nonnormal data is that these tests compare means, but that these means don’t necessarily capture a relevant aspect of the data. But even if the data aren’t normally distributed, comparing means can sometimes be reasonable, depending on what the data look like and what it is you’re actually interested in. And if you do want to compare means, parametric tests or bootstrapping are more sensible than running a nonparametric test. See also my blog post Before worrying about model assumptions, think about model relevance.
  • If you want to compare medians, look into bootstrapping or quantile regression.
  • Above all, make sure that you know what you’re comparing when you run a test and that this comparison makes sense in light of the data and your research question.

In this blog post, I’ll share the results of some simulations that demonstrate that the Mann–Whitney (a) picks up on differences in the variance between two distributions, even if they have the same mean and median; (b) picks up on differences in the median between two distributions, even if they have the same mean and variance; and (c) picks up on differences in the mean between two distributions, even if they have the same median and variance. These points aren’t new (see Zimmerman 1998), but since the automated strategy (‘parametric when normal, otherwise nonparametric’) is pretty widespread, they bear repeating.
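A toy example (not one of the simulations from the post) already hints at why the Mann–Whitney isn't a test of medians. Its U statistic, divided by the number of between-group pairs, estimates the probability that a random observation from one group exceeds a random observation from the other, counting ties as one half. The Python sketch below uses two made-up samples with identical medians for which this proportion is nonetheless well away from 0.5.

```python
from statistics import median


def prob_superiority(x, y):
    """Proportion of (x, y) pairs in which x exceeds y, with ties counted
    as one half. This equals the Mann-Whitney U statistic for x divided
    by len(x) * len(y)."""
    wins = sum((xi > yi) + 0.5 * (xi == yi) for xi in x for yi in y)
    return wins / (len(x) * len(y))


# Made-up samples: same median, very different shapes.
group_a = [1, 2, 5, 6, 7]
group_b = [4, 4.5, 5, 100, 101]

same_median = median(group_a) == median(group_b)
p_a_beats_b = prob_superiority(group_a, group_b)
```

With only five observations per group this says nothing about statistical significance; it merely shows that the quantity the test is built on can differ between groups whose medians coincide.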


Tutorial: Obtaining directly interpretable regression coefficients by recoding categorical predictors

5 May 2020

The output of regression models is often difficult to parse, especially when categorical predictors and interactions between them are being modelled. The goal of this tutorial is to show you how you can obtain estimated coefficients that you can interpret directly in terms of your research question. I’ve learnt about this technique thanks to Schad et al. (2020), and I refer to them for a more detailed discussion. What I will do is go through three examples of increasing complexity that should enable you to apply the technique in your own analyses.
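The simplest instance of the idea can be sketched in a few lines of Python. With a two-level predictor coded 0/1, the intercept estimates the mean of the reference group. Coding the same predictor as -0.5/+0.5 leaves the slope (the group difference) unchanged but turns the intercept into the mean of the two group means. The data and names below are made up, and this is only the two-group special case, not the general procedure discussed by Schad et al. (2020).

```python
from statistics import mean


def simple_ols(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


# Made-up data: group A has mean 5, group B has mean 8.
groups = ["A", "A", "A", "B", "B", "B"]
scores = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

# Treatment coding: A = 0, B = 1.
treatment_coded = [0.0 if g == "A" else 1.0 for g in groups]
# Centred coding: A = -0.5, B = +0.5.
centred_coded = [-0.5 if g == "A" else 0.5 for g in groups]

intercept_t, slope_t = simple_ols(treatment_coded, scores)
intercept_c, slope_c = simple_ols(centred_coded, scores)

print(f"treatment coding: intercept {intercept_t:.1f}, slope {slope_t:.1f}")
print(f"centred coding:   intercept {intercept_c:.1f}, slope {slope_c:.1f}")
```

Which coding is preferable depends on which quantity answers the research question: the mean of a reference group or the average across groups.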


Baby steps in Bayes: Incorporating reliability estimates in regression models

18 February 2020

Researchers sometimes calculate reliability indices such as Cronbach’s $\alpha$ or Revelle’s $\omega_T$, but their statistical models rarely take these reliability indices into account. Here I want to show you how you can incorporate information about the reliability of your measurements in a statistical model so as to obtain more honest and more readily interpretable parameter estimates.
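Why reliability matters for parameter estimates can be shown with a short simulation. The Python sketch below is not the Bayesian model from the post; it assumes classical measurement error in a single predictor, in which case the regression slope on the observed scores is expected to shrink to the true slope multiplied by the predictor's reliability. All values and names are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean


def slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))


random.seed(2020)
n = 5000
true_slope = 1.0
reliability = 0.8  # assumed: var(true score) / var(observed score)

# True scores have variance 1, so the error variance that yields the
# assumed reliability is 1 / reliability - 1.
error_sd = sqrt(1 / reliability - 1)

true_x = [random.gauss(0, 1) for _ in range(n)]
observed_x = [x + random.gauss(0, error_sd) for x in true_x]
y = [true_slope * x + random.gauss(0, 1) for x in true_x]

slope_true_scores = slope(true_x, y)          # close to 1.0
slope_observed_scores = slope(observed_x, y)  # close to 1.0 * 0.8

# Simple classical correction for attenuation, valid under these assumptions.
corrected = slope_observed_scores / reliability

print(f"true scores: {slope_true_scores:.2f}, "
      f"observed scores: {slope_observed_scores:.2f}, "
      f"corrected: {corrected:.2f}")
```

The division by the reliability at the end is a point correction only; unlike a full model, it does not carry the uncertainty about the reliability estimate through to the result.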