Question

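I often need to create a formatted table with counts, percentages, and marginal totals. For example, I might have data with three classes and two categories. I want a table with one row per class, containing the within-class count and percentage for each category plus the total count for that class. Finally, a total row at the bottom sums each category, shows each category's percentage of the grand total, and gives the grand total.

The code is always ugly :-) and I'm hoping to find a better way. This is a very simple example; usually it is much more complicated than this.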

=====  ===  =======  ===  ======  =====
Class  Yes  Yes pct   No  No pct  Total
=====  ===  =======  ===  ======  =====
one     35      65%   19     35%     54
two     21      70%    9     30%     30
three    9      56%    7     44%     16
Total   65      65%   35     35%    100
=====  ===  =======  ===  ======  =====

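I know about addmargins (doesn't work on a data.frame), prop.table (gives a separate table of proportions), and descr::CrossTable (puts the values inside each cell rather than spreading them across the row). Any suggestions on how to clean this up are welcome.

Here is the code that creates the table above: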
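To make the limitation concrete, here is a minimal base-R sketch of what addmargins and prop.table return. The counts are hard-coded from the example table above, and the variable names (counts, with_totals, row_props) are only illustrative. The totals and the proportions come back as two separate objects, which is why they still have to be interleaved by hand.

```r
# Counts copied from the example table above (rows = class, columns = category)
counts <- as.table(matrix(c(35, 19,
                            21,  9,
                             9,  7),
                          nrow = 3, byrow = TRUE,
                          dimnames = list(Class = c("one", "two", "three"),
                                          conf  = c("Yes", "No"))))

# addmargins() appends the marginal sums as an extra "Sum" row and column
with_totals <- addmargins(counts)

# prop.table() with margin = 1 returns within-row proportions as a separate table
row_props <- prop.table(counts, margin = 1)

print(with_totals)
print(round(100 * row_props))
```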

library(formattable) # For nice percents
library(tidyverse)

# Make up some data. Three classes with two categories within each class
# Order of cls is important so it is a factor
d = tibble(cls=sample(c('one', 'two', 'three'), 100, 
                      replace=TRUE, prob=c(0.5, 0.3, 0.2)),
           conf=sample(c('yes', 'no'), 100, 
                       replace=TRUE, prob=c(0.6, 0.4))) %>% 
  mutate(cls = factor(cls, levels=c('one', 'two', 'three', 'Total')))

# Tabulate by cls and conf
d2 = d %>% group_by(cls, conf) %>% 
  summarise(n=n()) %>% # Total per cls x conf
  spread(conf, n)  # Spread to one row per cls

# Add a total row. Do this before calculating percents so we don't total
# the within-row percents. This is really ugly
d2 = d2 %>% bind_rows(as_data_frame(t(c(cls=NA, colSums(d2[,-1])))))
d2$cls[nrow(d2)] = 'Total'

d2 = d2 %>% mutate(total=no+yes, # Make percents and row totals
         no_pct=percent(no/total, 0),
         yes_pct=percent(yes/total, 0)) %>% 
  select(Class=cls, Yes=yes, `Yes pct`=yes_pct, # Reorder and rename columns
         No=no, `No pct`=no_pct, Total=total)

formattable(d2) # Yay! Nice table.
knitr::kable(d2, format='rst') # For pasting above

Answer 1

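The code below is still somewhat involved (and can probably be simplified further), but it seems more intuitive to me and takes advantage of tidyverse functionality. I've included comments to explain what the code is doing at each stage.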

# Tabulate by cls and conf
d2 = d %>% group_by(cls, conf) %>%             
  tally %>% 
  # Add a row with column totals (group only by conf, instead of by cls and conf)
  bind_rows(d %>% group_by(conf) %>%           
      tally %>%
      mutate(cls="Total")) %>%
  # Add percent column by taking advantage of long format and pre-existing grouping
  mutate(pct = round(n/sum(n)*100)) %>%
  # Now spread to wide format       
  gather(key, value, n, pct, -cls, -conf) %>%  
  unite(conf_key, conf, key) %>%
  spread(conf_key, value) %>%
  # Add percent symbols
  mutate_at(vars(matches("pct")), ~ paste0(., "%")) %>%  # lambda form; funs() is deprecated in newer dplyr
  # Get cls values in the right order and add row totals
  ungroup %>%
  mutate(cls = factor(cls, levels=c("one","two","three","Total")),
         Total = no_n + yes_n) %>%
  arrange(cls) %>%
  select(Class=cls, Yes=yes_n, `Yes pct`=yes_pct, No=no_n, `No pct`=no_pct, Total)

   Class   Yes `Yes pct`    No `No pct` Total
1    one    32       73%    12      27%    44
2    two    21       52%    19      48%    40
3  three    10       62%     6      38%    16
4  Total    63       63%    37      37%   100
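
For comparison, the same steps (count, add the total row before computing percents, then interleave counts and percents) can be sketched in base R without any reshaping. This is only an illustrative alternative, not part of the answer above: margin_table is a hypothetical helper name, and the input vectors are built deterministically to match the counts in the question's example table rather than sampled.

```r
# Hypothetical helper: counts, within-row percents, and marginal totals
# for a class vector and a category vector of equal length.
margin_table <- function(cls, conf) {
  counts <- unclass(table(cls, conf))               # plain count matrix
  counts <- rbind(counts, Total = colSums(counts))  # add total row before percents
  total  <- rowSums(counts)                         # row totals, incl. grand total
  pct    <- round(100 * counts / total)             # within-row percents

  out <- data.frame(Class = rownames(counts), stringsAsFactors = FALSE)
  for (k in colnames(counts)) {
    out[[k]] <- unname(counts[, k])                 # count column
    out[[paste(k, "pct")]] <- paste0(pct[, k], "%") # formatted percent column
  }
  out$Total <- unname(total)
  out
}

# Assumed data: same counts as the question's example table
cls  <- factor(rep(c("one", "two", "three"), times = c(54, 30, 16)),
               levels = c("one", "two", "three"))
conf <- factor(c(rep(c("yes", "no"), times = c(35, 19)),
                 rep(c("yes", "no"), times = c(21, 9)),
                 rep(c("yes", "no"), times = c(9, 7))),
               levels = c("yes", "no"))

res <- margin_table(cls, conf)
print(res)
```

Note that the class factor here has no unused "Total" level; if cls carried one (as d$cls does in the question), table() would add an all-zero row for it, so droplevels(cls) would be needed first.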

How to format a table with counts, percentages and marginal totals
