Consider the following example:
set.seed(5)
df <- data.frame(CATEGORY = rep(c("A", "B", "C", "D"), each = 2),
                 SUBCATEGORY = paste0(rep(c("A", "B", "C", "D"), each = 2), 1:2),
                 COUNT = sample(1:1000, size = 8, replace = TRUE),
                 SUBCOUNT = sample(1:200, size = 8, replace = TRUE),
                 stringsAsFactors = FALSE)
df$SUBCOUNT_PCT <- paste0(formatC(df$SUBCOUNT/df$COUNT * 100, digits = 2, format = 'f'), "%")
> df
  CATEGORY SUBCATEGORY COUNT SUBCOUNT SUBCOUNT_PCT
1        A          A1   201      192       95.52%
2        A          A2   686       23        3.35%
3        B          B1   917       55        6.00%
4        B          B2   285       99       34.74%
5        C          C1   105       64       60.95%
6        C          C2   702      112       15.95%
7        D          D1   528       53       10.04%
8        D          D2   808       41        5.07%
I would like to add a summary row for each CATEGORY that totals COUNT and SUBCOUNT, like this:
   CATEGORY SUBCATEGORY COUNT SUBCOUNT SUBCOUNT_PCT
1         A       TOTAL   887      215       24.24%
2         A          A1   201      192       95.52%
3         A          A2   686       23        3.35%
4         B       TOTAL  1202      154       12.81%
5         B          B1   917       55        6.00%
6         B          B2   285       99       34.74%
7         C       TOTAL   807      176       21.81%
8         C          C1   105       64       60.95%
9         C          C2   702      112       15.95%
10        D       TOTAL  1336       94        7.04%
11        D          D1   528       53       10.04%
12        D          D2   808       41        5.07%
Is there a way to do this without looping over each CATEGORY?
Answer 0 (score: 2)
Use dplyr to summarize the data by CATEGORY, then bind the summary rows back onto the original data:
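library(dplyr)

df %>%
  group_by(CATEGORY) %>%
  # One TOTAL row per CATEGORY; the percentage uses the sums computed just above it
  summarize(SUBCATEGORY = "TOTAL",
            COUNT = sum(COUNT),
            SUBCOUNT = sum(SUBCOUNT),
            SUBCOUNT_PCT = sprintf("%.2f%%", SUBCOUNT / COUNT * 100)) %>%
  # Stack the TOTAL rows on top of the original rows, then sort by CATEGORY
  bind_rows(., df) %>%
  arrange(CATEGORY)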
# A tibble: 12 x 5
   CATEGORY SUBCATEGORY COUNT SUBCOUNT SUBCOUNT_PCT
   <chr>    <chr>       <int>    <int> <chr>
 1 A        TOTAL         887      215 24.24%
 2 A        A1            201      192 95.52%
 3 A        A2            686       23 3.35%
 4 B        TOTAL        1202      154 12.81%
 5 B        B1            917       55 6.00%
 6 B        B2            285       99 34.74%
 7 C        TOTAL         807      176 21.81%
 8 C        C1            105       64 60.95%
 9 C        C2            702      112 15.95%
10 D        TOTAL        1336       94 7.04%
11 D        D1            528       53 10.04%
12 D        D2            808       41 5.07%
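If you would rather not depend on dplyr, the same summarize-then-bind idea can probably be done in base R as well. The following is only a sketch, not a tested drop-in replacement: the names totals and combined are made up for illustration, and COUNT and SUBCOUNT are hard-coded from the printed df above because sample() can return different values for the same seed across R versions. The explicit tie-breaker in order() is my own addition to make sure each TOTAL row lands ahead of its subcategories.

```r
# Rebuild the example data with the values printed in the question
df <- data.frame(CATEGORY = rep(c("A", "B", "C", "D"), each = 2),
                 SUBCATEGORY = paste0(rep(c("A", "B", "C", "D"), each = 2), 1:2),
                 COUNT = c(201L, 686L, 917L, 285L, 105L, 702L, 528L, 808L),
                 SUBCOUNT = c(192L, 23L, 55L, 99L, 64L, 112L, 53L, 41L),
                 stringsAsFactors = FALSE)
df$SUBCOUNT_PCT <- sprintf("%.2f%%", df$SUBCOUNT / df$COUNT * 100)

# Sum COUNT and SUBCOUNT within each CATEGORY (no explicit loop)
totals <- aggregate(cbind(COUNT, SUBCOUNT) ~ CATEGORY, data = df, FUN = sum)
totals$SUBCATEGORY <- "TOTAL"
totals$SUBCOUNT_PCT <- sprintf("%.2f%%", totals$SUBCOUNT / totals$COUNT * 100)

# Match the column order of df, stack, and sort so TOTAL rows come first per CATEGORY
combined <- rbind(totals[names(df)], df)
combined <- combined[order(combined$CATEGORY, combined$SUBCATEGORY != "TOTAL"), ]
rownames(combined) <- NULL

print(combined)
```

The percentage for each TOTAL row is recomputed from the summed columns rather than averaged from the per-row percentages, which is what the dplyr version above does too.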