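I frequently need to create a formatted table with counts, percents and marginal totals. For example, I might have data with three classes and two categories. I want a table with one row per class, containing the within-class count and percent for each category and the total count for the class. Finally, a total row at the bottom sums each category, shows each category's percent of the grand total, and gives the grand total.
The code is always ugly :-) I'm hoping to find a better way. This is a very simple example; usually it is much more complicated than this.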
=====  ===  =======  ===  ======  =====
Class  Yes  Yes pct  No   No pct  Total
=====  ===  =======  ===  ======  =====
one    35   65%      19   35%     54
two    21   70%      9    30%     30
three  9    56%      7    44%     16
Total  65   65%      35   35%     100
=====  ===  =======  ===  ======  =====
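I'm aware of addmargins (doesn't work on a data.frame), prop.table (gives a separate table of proportions), and descr::CrossTable (puts all the values in one cell rather than spreading them along the row). Any suggestions on how to clean this up are welcome.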
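For reference, here is a minimal base R sketch of what addmargins and prop.table return, using the counts from the example table above. The variable names (counts, tab, with_totals, row_props) are only illustrative and not part of the original code.

```r
# Counts copied from the example table (rows: class, columns: category)
counts <- matrix(c(35, 21, 9, 19, 9, 7), nrow = 3,
                 dimnames = list(cls = c("one", "two", "three"),
                                 conf = c("yes", "no")))
tab <- as.table(counts)

# addmargins() appends a "Sum" row and a "Sum" column to a table object
with_totals <- addmargins(tab)

# prop.table() with margin = 1 returns within-row proportions,
# but as a separate table rather than next to the counts
row_props <- prop.table(tab, margin = 1)

print(with_totals)
print(round(100 * row_props))
```

Neither result puts counts and percents side by side in one row, which is the layout I'm after.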
Here is the code that creates the table above:
library(formattable) # For nice percents
library(tidyverse)
# Make up some data. Three classes with two categories within each class
# Order of cls is important so it is a factor
d = tibble(cls=sample(c('one', 'two', 'three'), 100,
                      replace=TRUE, prob=c(0.5, 0.3, 0.2)),
           conf=sample(c('yes', 'no'), 100,
                       replace=TRUE, prob=c(0.6, 0.4))) %>%
  mutate(cls = factor(cls, levels=c('one', 'two', 'three', 'Total')))
# Tabulate by cls and conf
d2 = d %>% group_by(cls, conf) %>%
  summarise(n=n()) %>% # Total per cls x conf
  spread(conf, n) # Spread to one row per cls
# Add a total row. Do this before calculating percents so we don't total
# the within-row percents. This is really ugly
d2 = d2 %>% bind_rows(as_data_frame(t(c(cls=NA, colSums(d2[,-1])))))
d2$cls[nrow(d2)] = 'Total'
d2 = d2 %>% mutate(total=no+yes, # Make percents and row totals
                   no_pct=percent(no/total, 0),
                   yes_pct=percent(yes/total, 0)) %>%
  select(Class=cls, Yes=yes, `Yes pct`=yes_pct, # Reorder and rename columns
         No=no, `No pct`=no_pct, Total=total)
formattable(d2) # Yay! Nice table.
knitr::kable(d2, format='rst') # For pasting above
Answer 0 (score: 3)
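The code below is still somewhat involved (and could perhaps be simplified further), but it seems more intuitive to me and takes advantage of tidyverse functions. I've included some comments to explain what the code does at each stage.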
# Tabulate by cls and conf
d2 = d %>% group_by(cls, conf) %>%
  tally %>%
  # Add a row with column totals (group only by conf, instead of by cls and conf)
  bind_rows(d %>% group_by(conf) %>%
              tally %>%
              mutate(cls="Total")) %>%
  # Add percent column by taking advantage of long format and pre-existing grouping
  mutate(pct = round(n/sum(n)*100)) %>%
  # Now spread to wide format
  gather(key, value, n, pct, -cls, -conf) %>%
  unite(conf_key, conf, key) %>%
  spread(conf_key, value) %>%
  # Add percent symbols
  mutate_at(vars(matches("pct")), funs(paste0(.,"%"))) %>%
  # Get cls values in the right order and add row totals
  ungroup %>%
  mutate(cls = factor(cls, levels=c("one","two","three","Total")),
         Total = no_n + yes_n) %>%
  arrange(cls) %>%
  select(Class=cls, Yes=yes_n, `Yes pct`=yes_pct, No=no_n, `No pct`=no_pct, Total)
  Class   Yes `Yes pct`    No `No pct` Total
1 one      32 73%          12 27%         44
2 two      21 52%          19 48%         40
3 three    10 62%           6 38%         16
4 Total    63 63%          37 37%        100
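The same ordering of steps (add the total row first, then compute within-row percents) can also be sketched in base R without any reshaping. This is only an illustrative alternative, not the code above: count_pct_table is a hypothetical helper name, and the sketch assumes exactly two categories labelled "yes" and "no". The data is built deterministically from the counts in the question's example table rather than sampled.

```r
# Hypothetical helper (an assumption, not from the question or answer):
# build a counts-plus-percents table from two vectors.
# Assumes the categories in `conf` are exactly "yes" and "no".
count_pct_table <- function(cls, conf) {
  counts <- table(cls, conf)                        # class x category counts
  counts <- rbind(counts, Total = colSums(counts))  # add the total row before percents
  total <- rowSums(counts)                          # row totals, including the total row
  pct <- counts
  # total has one entry per row, so recycling divides each row by its own total
  pct[] <- paste0(round(100 * counts / total), "%")
  data.frame(Class = rownames(counts),
             Yes = counts[, "yes"], `Yes pct` = pct[, "yes"],
             No = counts[, "no"], `No pct` = pct[, "no"],
             Total = total,
             check.names = FALSE, row.names = NULL)
}

# Deterministic data reproducing the counts in the question's example table
cls <- factor(rep(c("one", "two", "three"), times = c(54, 30, 16)),
              levels = c("one", "two", "three"))
conf <- c(rep(c("yes", "no"), times = c(35, 19)),
          rep(c("yes", "no"), times = c(21, 9)),
          rep(c("yes", "no"), times = c(9, 7)))

res <- count_pct_table(cls, conf)
print(res)
```

Because the total row is appended before the percents are computed, its percents are shares of the grand total rather than sums of the per-class percents. The result is a plain data.frame, so it could be passed to knitr::kable or formattable as in the question.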