R tip: Ordering factor levels more easily

R
graphics
Author: Jan Vanhove

Published: August 18, 2016

By default, `R` sorts the levels of a factor alphabetically. When drawing graphs, this results in ‘Alabama First’ graphs, and it’s usually better to sort the elements of a graph by more meaningful principles than alphabetical order. This post illustrates three convenience functions you can use to sort factor levels in `R` according to another covariate, their frequency of occurrence, or manually.

Update (2023-08-08): The `reorder()` function has pretty much the same functionality as the functions introduced in this blog post. In the original blog post, I loaded some packages that are now part of the `tidyverse` suite separately; now I just use the `tidyverse`.

First you’ll need the `tidyverse`, plus `gridExtra` for arranging the plots side by side:

``install.packages(c("tidyverse", "gridExtra"))``

You can download the convenience functions from my GitHub page or read them directly into `R`:

``source("https://janhove.github.io/RCode/sortLvls.R")``

Sorting factor levels by another variable

The code below creates an example dataset with a factor and a covariate:

``````# Load packages
library(tidyverse)

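# Set the seed so that the same data are generated each time
# (R evaluates 18-08-2016 as arithmetic, so the seed is actually -2006)
set.seed(18-08-2016)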

# Create data frame
df <- data.frame(factorBefore = factor(rep(letters[1:5], 3)),
covariate = rnorm(15, 50, 30))

# Current order of factor levels (alphabetically)
levels(df$factorBefore)``````
``[1] "a" "b" "c" "d" "e"``
``````# Covariate mean per factor level
df |> group_by(factorBefore) |> summarise(mean(covariate))``````
``````# A tibble: 5 × 2
factorBefore `mean(covariate)`
<fct>                    <dbl>
1 a                         8.04
2 b                        36.9
3 c                        45.0
4 d                        71.6
5 e                        41.9 ``````

What we want is to sort the levels of the factor by the covariate mean per factor level (i.e., a-b-e-c-d). The function `sortLvlsByVar.fnc` accomplishes this:

``````# Reorder
df$factorAfter1 <- sortLvlsByVar.fnc(df$factorBefore, df$covariate)``````
``Loading required package: magrittr``
``````
Attaching package: 'magrittr'``````
``````The following object is masked from 'package:purrr':

set_names``````
``````The following object is masked from 'package:tidyr':

extract``````
``````# New order of factor levels
levels(df$factorAfter1)``````
``[1] "a" "b" "e" "c" "d"``

By setting the `ascending` parameter to `FALSE`, the factor levels are sorted in descending order of the covariate mean:

``````# Reorder descendingly
df$factorAfter2 <- sortLvlsByVar.fnc(df$factorBefore, df$covariate, ascending = FALSE)
levels(df$factorAfter2)``````
``[1] "d" "c" "e" "b" "a"``
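To make the idea concrete, here is a minimal base-`R` sketch of sorting levels by a per-level mean. It is not the code behind `sortLvlsByVar.fnc()`; the helper name `sort_levels_by_mean()` and the toy numbers are made up for illustration, and the sketch does not handle missing values in the covariate.

```r
# Hypothetical helper (NOT the author's sortLvlsByVar.fnc()):
# reorder the levels of factor `f` by the mean of `x` within each level.
sort_levels_by_mean <- function(f, x, ascending = TRUE) {
  level_means <- tapply(x, f, mean)  # one mean per level, named by level
  new_order <- order(level_means, decreasing = !ascending)
  factor(f, levels = names(level_means)[new_order])  # same values, new level order
}

# Small deterministic example with made-up numbers
f <- factor(c("a", "b", "c", "a", "b", "c"))
x <- c(10, 50, 30, 20, 70, 40)  # level means: a = 15, b = 60, c = 35

levels(sort_levels_by_mean(f, x))                     # "a" "c" "b"
levels(sort_levels_by_mean(f, x, ascending = FALSE))  # "b" "c" "a"
```

Only the order of the levels changes; the values stored in the factor stay the same, which is why the data frame itself does not need to be re-sorted before plotting.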

Here’s what this looks like when graphed:

``````# Alphabetical order
p1 <- ggplot(df, aes(x = factorBefore, y = covariate)) +
geom_boxplot()
# Sorted ascendingly
p2 <- ggplot(df, aes(x = factorAfter1, y = covariate)) +
geom_boxplot()
# Sorted descendingly
p3 <- ggplot(df, aes(x = factorAfter2, y = covariate)) +
geom_boxplot()
gridExtra::grid.arrange(p1, p2, p3, ncol = 3)``````

You can change the `R` code from the GitHub page so that the levels are sorted by another summary statistic, e.g., the covariate median per factor level.
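If you would rather not edit the sourced functions, base `R`'s `reorder()` (mentioned in the update note above) accepts the summary function through its `FUN` argument. The sketch below uses made-up numbers chosen so that the mean and the median give different orderings; negating the covariate is one simple way to get a descending order.

```r
# Toy data (made-up numbers); level "a" has an outlier that inflates its mean
f <- factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c"))
x <- c(1, 2, 30, 8, 9, 10, 4, 5, 6)
# means:   a = 11, b = 9, c = 5
# medians: a = 2,  b = 9, c = 5

by_mean        <- reorder(f, x, FUN = mean)     # ascending by mean
by_median      <- reorder(f, x, FUN = median)   # ascending by median
by_median_desc <- reorder(f, -x, FUN = median)  # descending by median

levels(by_mean)         # "c" "b" "a"
levels(by_median)       # "a" "c" "b"
levels(by_median_desc)  # "b" "c" "a"
```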

Sorting factor levels by their frequency of occurrence

Again we’ll first create some data:

``````df2 <- data.frame(factorBefore = factor(rep(letters[1:5], c(7, 3, 80, 15, 107))),
covariate = rnorm(sum(c(7, 3, 80, 15, 107)), 50, 30))
table(df2$factorBefore)``````
``````
a   b   c   d   e
7   3  80  15 107 ``````

We want to order these factor levels by their frequency of occurrence in the dataset (i.e., b-a-d-c-e). `sortLvlsByN.fnc()` accomplishes this:

``````df2$factorAfter1 <- sortLvlsByN.fnc(df2$factorBefore)
table(df2$factorAfter1)``````
``````
b   a   d   c   e
3   7  15  80 107 ``````

Or descendingly:

``````df2$factorAfter2 <- sortLvlsByN.fnc(df2$factorBefore, ascending = FALSE)
table(df2$factorAfter2)``````
``````
e   c   d   a   b
107  80  15   7   3 ``````
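The same result can be approximated in a few lines of base `R` by counting the levels with `table()` and rebuilding the factor. This is only a sketch under that assumption, not the implementation of `sortLvlsByN.fnc()`; the helper name `sort_levels_by_n()` is hypothetical.

```r
# Hypothetical helper (NOT the author's sortLvlsByN.fnc()):
# reorder the levels of factor `f` by how often each level occurs.
sort_levels_by_n <- function(f, ascending = TRUE) {
  counts <- table(f)  # frequency per level, named by level
  new_order <- order(as.vector(counts), decreasing = !ascending)
  factor(f, levels = names(counts)[new_order])
}

# Same level frequencies as in the example above
f <- factor(rep(letters[1:5], c(7, 3, 80, 15, 107)))

levels(sort_levels_by_n(f))                     # "b" "a" "d" "c" "e"
levels(sort_levels_by_n(f, ascending = FALSE))  # "e" "c" "d" "a" "b"
```

The `forcats` package, which is loaded with the `tidyverse`, offers `fct_infreq()` for the same purpose.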

When plotted:

``````p4 <- ggplot(df2, aes(x = factorBefore, y = covariate)) +
geom_boxplot(varwidth = TRUE)
p5 <- ggplot(df2, aes(x = factorAfter1, y = covariate)) +
geom_boxplot(varwidth = TRUE)
p6 <- ggplot(df2, aes(x = factorAfter2, y = covariate)) +
geom_boxplot(varwidth = TRUE)
gridExtra::grid.arrange(p4, p5, p6, ncol = 3)``````

Customising the order of factor levels

If you want to put the factor levels in a custom order, you can use the `sortLvls.fnc()` function.

``````# Create data
df3 <- data.frame(factorBefore = factor(rep(letters[1:5], 3)),
covariate = rnorm(15, 50, 30))
levels(df3$factorBefore)``````
``[1] "a" "b" "c" "d" "e"``

Let’s say we, for some reason, want to put the current 5th level (e) first, the current 3rd level (c) second, the 4th (d) third, the 2nd (b) fourth, and the 1st (a) last:

``````df3$factorAfter1 <- sortLvls.fnc(df3$factorBefore, c(5, 3, 4, 2, 1))
levels(df3$factorAfter1)``````
``[1] "e" "c" "d" "b" "a"``

You can also just specify which factor levels need to go up front; the order of the other ones stays the same:

``````# Put the current 3rd and 2nd in front; leave the rest as they were:
df3$factorAfter2 <- sortLvls.fnc(df3$factorBefore, c(3, 2))
levels(df3$factorAfter2)``````
``[1] "c" "b" "a" "d" "e"``
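For readers curious how such a reordering can work, here is a minimal base-`R` sketch: pick the levels at the requested positions, then append whatever is left in its original order. The helper name `move_levels_to_front()` is hypothetical and this is not the code of `sortLvls.fnc()`.

```r
# Hypothetical helper (NOT the author's sortLvls.fnc()):
# move the levels at `positions` to the front, keep the rest in their old order.
move_levels_to_front <- function(f, positions) {
  old_levels <- levels(f)
  front <- old_levels[positions]           # levels requested up front
  rest  <- setdiff(old_levels, front)      # remaining levels, original order
  factor(f, levels = c(front, rest))
}

f <- factor(rep(letters[1:5], 3))

levels(move_levels_to_front(f, c(5, 3, 4, 2, 1)))  # "e" "c" "d" "b" "a"
levels(move_levels_to_front(f, c(3, 2)))           # "c" "b" "a" "d" "e"
```

If you prefer naming levels over counting positions, `forcats::fct_relevel()` moves the named levels to the front in a similar way.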

Software versions

``devtools::session_info()``
``````─ Session info ───────────────────────────────────────────────────────────────
setting  value
version  R version 4.3.1 (2023-06-16)
os       Ubuntu 22.04.2 LTS
system   x86_64, linux-gnu
ui       X11
language en_US
collate  en_US.UTF-8
ctype    en_US.UTF-8
tz       Europe/Zurich
date     2023-08-08
pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
package     * version date (UTC) lib source
cachem        1.0.6   2021-08-19 [2] CRAN (R 4.2.0)
callr         3.7.3   2022-11-02 [1] CRAN (R 4.3.1)
cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
crayon        1.5.2   2022-09-29 [1] CRAN (R 4.3.1)
devtools      2.4.5   2022-10-11 [1] CRAN (R 4.3.1)
digest        0.6.29  2021-12-01 [2] CRAN (R 4.2.0)
dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
ellipsis      0.3.2   2021-04-29 [2] CRAN (R 4.2.0)
evaluate      0.15    2022-02-18 [2] CRAN (R 4.2.0)
fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.1)
farver        2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
fastmap       1.1.0   2021-01-25 [2] CRAN (R 4.2.0)
forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
fs            1.5.2   2021-12-08 [2] CRAN (R 4.2.0)
generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
ggplot2     * 3.4.2   2023-04-03 [1] CRAN (R 4.3.0)
glue          1.6.2   2022-02-24 [2] CRAN (R 4.2.0)
gridExtra     2.3     2017-09-09 [1] CRAN (R 4.3.0)
gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.3.0)
htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.1)
httpuv        1.6.11  2023-05-11 [1] CRAN (R 4.3.1)
jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.1)
knitr         1.39    2022-04-26 [2] CRAN (R 4.2.0)
labeling      0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
later         1.3.1   2023-05-02 [1] CRAN (R 4.3.1)
lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
magrittr    * 2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
memoise       2.0.1   2021-11-26 [2] CRAN (R 4.2.0)
mime          0.10    2021-02-13 [2] CRAN (R 4.0.2)
miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.1)
munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
pkgbuild      1.4.2   2023-06-26 [1] CRAN (R 4.3.1)
pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.2.0)
pkgload       1.3.2.1 2023-07-08 [1] CRAN (R 4.3.1)
prettyunits   1.1.1   2020-01-24 [2] CRAN (R 4.2.0)
processx      3.8.2   2023-06-30 [1] CRAN (R 4.3.1)
profvis       0.3.8   2023-05-02 [1] CRAN (R 4.3.1)
promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.3.1)
ps            1.7.5   2023-04-18 [1] CRAN (R 4.3.1)
purrr       * 1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
R6            2.5.1   2021-08-19 [2] CRAN (R 4.2.0)
Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.3.1)
readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
remotes       2.4.2   2021-11-30 [2] CRAN (R 4.2.0)
rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
rmarkdown     2.21    2023-03-26 [1] CRAN (R 4.3.0)
rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.3.0)
scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.2.0)
shiny         1.7.4.1 2023-07-06 [1] CRAN (R 4.3.1)
stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.1)
stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.1)
timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.1)
usethis       2.2.2   2023-07-06 [1] CRAN (R 4.3.1)
utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.1)
vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
withr         2.5.0   2022-03-03 [2] CRAN (R 4.2.0)
xfun          0.39    2023-04-20 [1] CRAN (R 4.3.0)
xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.1)
yaml          2.3.5   2022-02-21 [2] CRAN (R 4.2.0)

[1] /home/jan/R/x86_64-pc-linux-gnu-library/4.3
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────``````