By default, R sorts the levels of a factor alphabetically.
When drawing graphs, this results in ‘Alabama First’ graphs,
and it’s usually better to sort the elements of a graph by more meaningful principles than alphabetical order.
This post illustrates three convenience functions you can use to sort factor levels in R
according to another covariate, their frequency of occurrence, or manually.
First you’ll need the dplyr and magrittr packages:
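A minimal sketch of that setup step, assuming both packages are already installed from CRAN (if not, uncomment the first line):

```r
# install.packages(c("dplyr", "magrittr"))  # run once if the packages are missing
library(dplyr)     # data manipulation verbs
library(magrittr)  # pipe operators such as %>%
```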
You can download the convenience functions from my GitHub page or read them directly into R:
Sorting factor levels by another variable
The code below creates an example dataset with a factor and a covariate:
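As an illustration, here is one small dataset of that shape. The column names `fac` and `covariate` and the values themselves are made up for this sketch; they were picked so that the covariate means per level come out in the order a-b-e-c-d.

```r
# Hypothetical example data: five factor levels, three observations each
dat <- data.frame(
  fac       = factor(rep(c("a", "b", "c", "d", "e"), each = 3)),
  covariate = c(1, 2, 3,     # a: mean 2
                3, 4, 5,     # b: mean 4
                7, 8, 9,     # c: mean 8
                9, 10, 11,   # d: mean 10
                5, 6, 7)     # e: mean 6
)

levels(dat$fac)                            # default order is alphabetical
tapply(dat$covariate, dat$fac, mean)       # covariate mean per factor level
```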
What we want is to sort the levels of the factor by the covariate mean per factor level (i.e., a-b-e-c-d).
The function sortLvlsByVar.fnc() accomplishes this:
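If you don't have the convenience function at hand, base R's `reorder()` gives the same kind of result. The sketch below is a stand-in, not the code of sortLvlsByVar.fnc() itself, and it uses made-up data with the hypothetical column names `fac` and `covariate`.

```r
# Hypothetical example data (means per level: a = 2, b = 4, c = 8, d = 10, e = 6)
dat <- data.frame(
  fac       = factor(rep(c("a", "b", "c", "d", "e"), each = 3)),
  covariate = c(1, 2, 3, 3, 4, 5, 7, 8, 9, 9, 10, 11, 5, 6, 7)
)

# reorder() sorts the levels by a summary of the second argument, ascending
dat$fac_sorted <- reorder(dat$fac, dat$covariate, FUN = mean)

levels(dat$fac_sorted)  # "a" "b" "e" "c" "d"
```

Only the order of the levels changes; the values stored in the factor stay the same.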
By setting the ascending parameter to FALSE, the factor levels are sorted in descending order of the covariate mean:
What this looks like when graphed:
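A descending sort can be sketched in base R by negating the covariate before handing it to `reorder()`. Again, this is a substitute technique on made-up data (hypothetical names `fac` and `covariate`), not the convenience function's own code.

```r
# Hypothetical example data (means per level: a = 2, b = 4, c = 8, d = 10, e = 6)
dat <- data.frame(
  fac       = factor(rep(c("a", "b", "c", "d", "e"), each = 3)),
  covariate = c(1, 2, 3, 3, 4, 5, 7, 8, 9, 9, 10, 11, 5, 6, 7)
)

# Sorting by the mean of the negated covariate reverses the ascending order
dat$fac_desc <- reorder(dat$fac, -dat$covariate, FUN = mean)

levels(dat$fac_desc)  # "d" "c" "e" "b" "a"
```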
You can change the R code from the GitHub page so that the levels are sorted by another summary statistic, e.g., the covariate median per factor level.
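To see why the choice of summary statistic can matter, here is a small sketch using base R's `reorder()` rather than the convenience function. The data are made up (hypothetical names `fac` and `covariate`), with one outlier in level a so that the mean and the median disagree.

```r
# Hypothetical data: level "a" contains an outlier (30)
dat <- data.frame(
  fac       = factor(rep(c("a", "b", "c"), each = 3)),
  covariate = c(1, 2, 30,   # a: mean 11, median 2
                3, 4, 5,    # b: mean 4,  median 4
                7, 8, 9)    # c: mean 8,  median 8
)

by_mean   <- reorder(dat$fac, dat$covariate, FUN = mean)
by_median <- reorder(dat$fac, dat$covariate, FUN = median)

levels(by_mean)    # "b" "c" "a"
levels(by_median)  # "a" "b" "c"
```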
Sorting factor levels by their frequency of occurrence
Again we’ll first create some data:
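For illustration, a factor like the following would do. The name `fac` and the counts are made up for this sketch; they were chosen so that the levels, ranked by frequency, come out as b-a-d-c-e.

```r
# Hypothetical data: a occurs 4 times, b 5, c 2, d 3, e 1
dat2 <- data.frame(
  fac = factor(rep(c("a", "b", "c", "d", "e"), times = c(4, 5, 2, 3, 1)))
)

table(dat2$fac)  # frequency of each level, in alphabetical order
```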
We want to order these factor levels by their frequency of occurrence in the dataset (i.e., b-a-d-c-e).
sortLvlsByN.fnc() accomplishes this:
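A base-R way to get a comparable result, sketched here on made-up data and not taken from sortLvlsByN.fnc() itself, is to count the levels with `table()`, sort the counts, and rebuild the factor with the sorted names as its levels.

```r
# Hypothetical data: a occurs 4 times, b 5, c 2, d 3, e 1
fac <- factor(rep(c("a", "b", "c", "d", "e"), times = c(4, 5, 2, 3, 1)))

# Level names ordered from most to least frequent
lvls_by_n <- names(sort(table(fac), decreasing = TRUE))

fac_by_n <- factor(fac, levels = lvls_by_n)

levels(fac_by_n)  # "b" "a" "d" "c" "e"
```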
Customising the order of factor levels
If you want to put the factor levels in a custom order, you can use the sortLvls.fnc() function.
Let’s say that, for some reason, we want to put the current 5th level (e) first, the current 3rd level (c) second, the 4th third, the 2nd fourth and the 1st last:
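In base R, the same reordering can be sketched by indexing the current levels with the desired positions. This is a stand-in on a made-up factor `fac`, not the sortLvls.fnc() code.

```r
# Hypothetical factor with levels a, b, c, d, e
fac <- factor(c("a", "b", "c", "d", "e"))

# New order: current 5th, 3rd, 4th, 2nd and 1st level
fac_custom <- factor(fac, levels = levels(fac)[c(5, 3, 4, 2, 1)])

levels(fac_custom)  # "e" "c" "d" "b" "a"
```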
You can also just specify which factor levels need to go up front; the order of the other ones stays the same:
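One way to sketch this in base R (again not the convenience function's own code) is to list the levels that go up front and append whatever is left with `setdiff()`, which keeps the remaining levels in their existing order. The factor `fac` and the choice of `front` are made up for the example.

```r
# Hypothetical factor with levels a, b, c, d, e
fac <- factor(c("a", "b", "c", "d", "e"))

# Levels to move to the front (hypothetical choice)
front <- c("e", "c")

# Remaining levels follow in their current order
fac_front <- factor(fac, levels = c(front, setdiff(levels(fac), front)))

levels(fac_front)  # "e" "c" "a" "b" "d"
```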