Tabyl: A Frequency Table for the Modern R User

Out with the old, in with the new!

Towards Data Science
Image created using Canva Image Generator

Anyone who has worked with categorical data has eventually come across the need to calculate the absolute number and proportion of a certain class. This article introduces the tabyl function for creating frequency tables through a series of hands-on examples.

What does tabyl bring to the table (no pun intended :D)?

The tabyl function is a feature of the janitor package in R. It’s a very convenient tool for creating contingency tables, also known as frequency tables or cross-tabulations. Here are some of the advantages of using tabyl:

1. Easy syntax: tabyl has an easy-to-use syntax. It can take one, two, or three variables, and it automatically returns a data frame of counts (and, for one-way tables, proportions).

2. Flexibility: tabyl can generate one-way (single variable), two-way (two variables), and three-way (three variables) contingency tables. This flexibility makes it suitable for a wide range of applications.

3. Automatic calculation of proportions: tabyl automatically calculates the proportions (percentages) for one-way contingency tables. For two-way and three-way tables, the same result can be achieved with the adorn_percentages function from the same package.

4. Compatibility with dplyr: The output of tabyl is a data frame, which makes it fully compatible with dplyr functions and the tidyverse ecosystem. This means you can easily pipe (%>%) the output into further data wrangling or visualization functions.

5. Neat and informative output: tabyl provides neat and informative output, which includes the variable names as row and column names, making it easier to interpret the results.

For all these reasons, tabyl is an excellent choice when you need to create frequency tables in R. It simplifies many steps and integrates well with the tidyverse approach to data analysis.
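As a quick illustration of the one-, two-, and three-variable forms listed above, here is a minimal sketch. The toy data frame and its cap column are made up for this example and are not part of the mushrooms dataset used later.

```r
library(janitor)

# Hypothetical toy data (not the article's mushrooms dataset)
toy <- data.frame(
  class = c("edible", "edible", "poisonous", "poisonous", "edible", "poisonous"),
  odor  = c("almond", "none", "foul", "foul", "none", "none"),
  cap   = c("flat", "bell", "flat", "bell", "flat", "flat")
)

# One-way: one row per class, with columns n and percent
one_way <- tabyl(toy, class)

# Two-way: one row per odor, one count column per class value
two_way <- tabyl(toy, odor, class)

# Three-way: a list of two-way tabyls, one per value of the third variable
three_way <- tabyl(toy, odor, class, cap)

one_way
two_way
names(three_way)
```

Note that the three-way call returns a list rather than a single data frame, so each element has to be accessed (or iterated over) separately.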

The dataset

Photo by Hans Veth on Unsplash

This post will demonstrate the advantages of the tabyl function from the janitor package using data on the edibility of different types of mushrooms depending on their odor. Here, I will be using a tidied dataset under the name mushrooms, but you can access the original data on Kaggle. Below is the code used for cleaning the data.

library(tidyverse)
library(janitor)

mushrooms <- read_csv("mushrooms.csv") %>%
  select(class, odor) %>%
  mutate(
    class = case_when(
      class == "p" ~ "poisonous",
      class == "e" ~ "edible"
    ),
    odor = case_when(
      odor == "a" ~ "almond",
      odor == "l" ~ "anise",
      odor == "c" ~ "creosote",
      odor == "y" ~ "fishy",
      odor == "f" ~ "foul",
      odor == "m" ~ "musty",
      odor == "n" ~ "none",
      odor == "p" ~ "pungent",
      odor == "s" ~ "spicy"
    )
  )

If you are unfamiliar with the above syntax, please check out the hands-on guide to using the tidyverse in one of my earlier articles.
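One thing worth knowing about case_when is that any value matching none of the conditions becomes NA. The sketch below, on a made-up data frame with a deliberately unknown code "z", shows one way to check a recoding step for such gaps; the names raw, recoded and odor_label are hypothetical.

```r
library(dplyr)

# Hypothetical raw codes; "z" is a made-up code with no mapping
raw <- data.frame(odor = c("a", "n", "f", "z"))

recoded <- raw %>%
  mutate(odor_label = case_when(
    odor == "a" ~ "almond",
    odor == "n" ~ "none",
    odor == "f" ~ "foul"
  ))

# Codes that matched no condition end up as NA; list them with their counts
unmatched <- recoded %>%
  filter(is.na(odor_label)) %>%
  count(odor)

unmatched
```

An empty result from this check would suggest that every code in the data was covered by the mapping.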

The old

To better understand which benefits tabyl offers, let’s first make a frequency table using the base R table function.

table(mushrooms$class)

edible poisonous
4208 3916

table(mushrooms$odor, mushrooms$class)

           edible poisonous
  almond      400         0
  anise       400         0
  creosote      0       192
  fishy         0       576
  foul          0      2160
  musty         0        36
  none       3408       120
  pungent       0       256
  spicy         0       576

Unsurprisingly, it turns out that odor is a great predictor of mushroom edibility, with anything “funny-smelling” probably being poisonous. Thanks evolution! Also, almost half of the mushrooms in this dataset (3916 of 8124) are poisonous, so it’s always important to be cautious when picking mushrooms on your own.

If we want to be able to use the variable names directly without the $ operator, we need to use the with function to make the dataset available to the table function.

mush_table <- with(mushrooms, table(odor, class))

Unfortunately, if we want proportions instead of absolute numbers, we cannot use the same function and need another one instead: prop.table.

prop.table(mush_table)

          class
odor            edible   poisonous
  almond   0.049236829 0.000000000
  anise    0.049236829 0.000000000
  creosote 0.000000000 0.023633678
  fishy    0.000000000 0.070901034
  foul     0.000000000 0.265878877
  musty    0.000000000 0.004431315
  none     0.419497784 0.014771049
  pungent  0.000000000 0.031511571
  spicy    0.000000000 0.070901034

By default, this gives us each cell as a proportion of the grand total. If we want row-wise or column-wise proportions instead, we can specify the margin argument (1 for row-wise and 2 for column-wise).

prop.table(mush_table, margin = 1)

          class
odor           edible  poisonous
  almond   1.00000000 0.00000000
  anise    1.00000000 0.00000000
  creosote 0.00000000 1.00000000
  fishy    0.00000000 1.00000000
  foul     0.00000000 1.00000000
  musty    0.00000000 1.00000000
  none     0.96598639 0.03401361
  pungent  0.00000000 1.00000000
  spicy    0.00000000 1.00000000
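To make the effect of the margin argument concrete, here is a small sketch on a made-up 2 x 2 table (the counts are invented for illustration). It checks what each variant sums to: the whole table, each row, or each column.

```r
# Hypothetical 2 x 2 table of counts (values invented for illustration)
counts <- matrix(
  c(3, 1, 2, 4),
  nrow = 2,
  dimnames = list(odor = c("foul", "none"), class = c("edible", "poisonous"))
)
tab <- as.table(counts)

overall <- prop.table(tab)              # each cell divided by the grand total
by_row  <- prop.table(tab, margin = 1)  # each cell divided by its row total
by_col  <- prop.table(tab, margin = 2)  # each cell divided by its column total

sum(overall)     # the whole table sums to 1
rowSums(by_row)  # every row sums to 1
colSums(by_col)  # every column sums to 1
```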

All these special functions can feel cumbersome and hard to remember, so a single function that covers all of the above functionality would be nice to have.

Moreover, if we check the type of the created object using class(mush_table), we see that it is of class table.

This creates a compatibility problem, since nowadays R users mostly work in the tidyverse ecosystem, which is centered around applying functions to data.frame objects and stringing the results together using the pipe (%>%) operator.
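For comparison, a table object can be converted with base R's as.data.frame, but the result is in long format with a Freq column, so extra reshaping is needed to get back to a cross-tabulation. A minimal sketch on made-up data (toy and its values are hypothetical):

```r
# Hypothetical toy data (not the article's mushrooms dataset)
toy <- data.frame(
  class = c("edible", "edible", "poisonous", "poisonous", "edible", "poisonous"),
  odor  = c("almond", "none", "foul", "foul", "none", "none")
)

toy_table <- with(toy, table(odor, class))
class(toy_table)  # "table"

# Long format: one row per odor/class combination, counts in a Freq column
long <- as.data.frame(toy_table)
long
```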

The new

Let’s do the same things with the tabyl function.

tabyl(mushrooms, class)

     class    n   percent
    edible 4208 0.5179714
 poisonous 3916 0.4820286

mush_tabyl <- tabyl(mushrooms, odor, class)
mush_tabyl

     odor edible poisonous
   almond    400         0
    anise    400         0
 creosote      0       192
    fishy      0       576
     foul      0      2160
    musty      0        36
     none   3408       120
  pungent      0       256
    spicy      0       576

Compared with the corresponding table output, the tables produced by tabyl are tidier, with variable names (class) being explicitly stated. Furthermore, for the one-way table, the percentages are automatically generated alongside the counts.

We can also notice that we didn’t have to use the with function to be able to specify the variable names directly. Moreover, running class(mush_tabyl) tells us that the resulting object is a data.frame (with an additional tabyl class), which ensures tidyverse compatibility!
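Because the result is a data frame, it can be passed straight into dplyr verbs. The sketch below uses a made-up toy data frame (not the mushrooms data) to keep only the odors with at least one poisonous mushroom and sort them by that count.

```r
library(janitor)
library(dplyr)

# Hypothetical toy data (not the article's mushrooms dataset)
toy <- data.frame(
  class = c("edible", "edible", "poisonous", "poisonous", "edible", "poisonous"),
  odor  = c("almond", "none", "foul", "foul", "none", "none")
)

risky_odors <- tabyl(toy, odor, class) %>%
  filter(poisonous > 0) %>%       # keep odors seen on poisonous mushrooms
  arrange(desc(poisonous))        # most frequent first

risky_odors
```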

The adorned janitor

Image created using Canva Image Generator

For additional tabyl functionality, the janitor package also contains a series of adorn functions. To get the percentages, we simply pipe the resulting frequency table to the adorn_percentages function. By default, the percentages are calculated row-wise.

mush_tabyl %>% adorn_percentages()

     odor    edible  poisonous
   almond 1.0000000 0.00000000
    anise 1.0000000 0.00000000
 creosote 0.0000000 1.00000000
    fishy 0.0000000 1.00000000
     foul 0.0000000 1.00000000
    musty 0.0000000 1.00000000
     none 0.9659864 0.03401361
  pungent 0.0000000 1.00000000
    spicy 0.0000000 1.00000000

If we want the column-wise percentages, we can set the denominator argument to "col".

mush_tabyl %>% adorn_percentages(denominator = "col")

     odor     edible   poisonous
   almond 0.09505703 0.000000000
    anise 0.09505703 0.000000000
 creosote 0.00000000 0.049029622
    fishy 0.00000000 0.147088866
     foul 0.00000000 0.551583248
    musty 0.00000000 0.009193054
     none 0.80988593 0.030643514
  pungent 0.00000000 0.065372829
    spicy 0.00000000 0.147088866
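Besides "row" and "col", adorn_percentages also accepts "all" as the denominator, which divides every cell by the grand total. The raw proportions can then be turned into readable percentages with adorn_pct_formatting. A minimal sketch on made-up toy data (the data frame and the choice of one decimal digit are assumptions for illustration):

```r
library(janitor)
library(dplyr)

# Hypothetical toy data (not the article's mushrooms dataset)
toy <- data.frame(
  class = c("edible", "edible", "poisonous", "poisonous", "edible", "poisonous"),
  odor  = c("almond", "none", "foul", "foul", "none", "none")
)

two_way <- tabyl(toy, odor, class)

# Each cell as a share of the grand total
all_pct <- two_way %>% adorn_percentages(denominator = "all")

# Column-wise shares, formatted as percentage strings with one decimal
formatted <- two_way %>%
  adorn_percentages(denominator = "col") %>%
  adorn_pct_formatting(digits = 1)

all_pct
formatted
```

Note that after adorn_pct_formatting the value columns are character strings, so any further numeric work should happen before that step.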

The tabyl-adorn combo even enables us to easily combine both the number and percentage in the same table cell…

mush_tabyl %>% adorn_percentages() %>% adorn_ns()

     odor           edible         poisonous
   almond 1.0000000  (400) 0.00000000    (0)
    anise 1.0000000  (400) 0.00000000    (0)
 creosote 0.0000000    (0) 1.00000000  (192)
    fishy 0.0000000    (0) 1.00000000  (576)
     foul 0.0000000    (0) 1.00000000 (2160)
    musty 0.0000000    (0) 1.00000000   (36)
     none 0.9659864 (3408) 0.03401361  (120)
  pungent 0.0000000    (0) 1.00000000  (256)
    spicy 0.0000000    (0) 1.00000000  (576)

… or add the totals to the rows and columns.

mush_tabyl %>% adorn_totals(c("row", "col"))

     odor edible poisonous Total
   almond    400         0   400
    anise    400         0   400
 creosote      0       192   192
    fishy      0       576   576
     foul      0      2160  2160
    musty      0        36    36
     none   3408       120  3528
  pungent      0       256   256
    spicy      0       576   576
    Total   4208      3916  8124
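The adorn functions can also be chained into a single report-style table. The sketch below, on made-up toy data, adds a totals row, converts to row-wise percentages, formats them, appends the raw counts, and labels the table with both variable names. The particular order and options are one reasonable choice, not the only one.

```r
library(janitor)
library(dplyr)

# Hypothetical toy data (not the article's mushrooms dataset)
toy <- data.frame(
  class = c("edible", "edible", "poisonous", "poisonous", "edible", "poisonous"),
  odor  = c("almond", "none", "foul", "foul", "none", "none")
)

report <- tabyl(toy, odor, class) %>%
  adorn_totals("row") %>%               # add a Total row of counts
  adorn_percentages("row") %>%          # convert counts to row-wise shares
  adorn_pct_formatting(digits = 1) %>%  # format as percentage strings
  adorn_ns() %>%                        # append the underlying counts in parentheses
  adorn_title("combined")               # header becomes "odor/class"

report
```

Adding the totals before computing percentages means the Total row shows the overall class split rather than a sum of percentages.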

Conclusion

The tabyl() function from the janitor package in R offers a user-friendly and flexible solution for creating one-way, two-way, or three-way contingency tables. It excels at automatically computing proportions and producing tidy data frames that integrate seamlessly with the tidyverse ecosystem, especially dplyr. Its outputs are well-structured and easy to interpret, and they can be further enhanced with the adorn functions, simplifying the overall process of generating informative frequency tables. This makes tabyl() a highly useful tool for data analysis in R.
