Question

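I have a data frame "calls1" and would like to know how to create a new variable "PercCallsMo": the percentage of total calls (from the "CallsHandled" variable) that each call queue "QUEUE" accounts for in a given month "MON1_12". My sample data is as follows: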

structure(list(MON1_12 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), QUEUE = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L), .Label = c("APPLICATION_STATUS", "BENEFITS", "BILLING"
), class = "factor"), CallsHandled = c(9L, 3L, 10L, 27L, 64L, 
17L, 10L, 58L, 8L, 29L, 32L, 12L, 2L, 6L, 1L, 3L, 2L, 2L, 2L, 
2L)), .Names = c("MON1_12", "QUEUE", "CallsHandled"), class = "data.frame", row.names = c(NA, 
-20L))

The result I expect repeats the "PercCallsMo" value of each "QUEUE" on every row belonging to that queue within each month "MON1_12", and should look like this:

structure(list(MON1_12 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), QUEUE = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L), .Label = c("APPLICATION_STATUS", "BENEFITS", "BILLING"
), class = "factor"), CallsHandled = c(9L, 3L, 10L, 27L, 64L, 
17L, 10L, 58L, 8L, 29L, 32L, 12L, 2L, 6L, 1L, 3L, 2L, 2L, 2L, 
2L), PercCallsMo = c(0.362962963, 0.362962963, 0.362962963, 0.362962963, 
0.554878049, 0.554878049, 0.554878049, 0.488888889, 0.488888889, 
0.37195122, 0.37195122, 0.148148148, 0.148148148, 0.148148148, 
0.073170732, 0.073170732, 0.073170732, 0.073170732, 0.073170732, 
0.073170732)), .Names = c("MON1_12", "QUEUE", "CallsHandled", 
"PercCallsMo"), class = "data.frame", row.names = c(NA, -20L))

Answer 1

You can do it like this:

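library(dplyr)

calls1 <- calls1 %>%
  # total calls handled in each month
  group_by(MON1_12) %>%
  mutate(month_total = sum(CallsHandled)) %>%
  # share of that monthly total handled by each queue
  group_by(MON1_12, QUEUE) %>%
  mutate(PercCallsMo = sum(CallsHandled) / month_total) %>%
  # drop the grouping so later operations are not silently grouped
  ungroup() %>%
  select(-month_total)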

Answer 2

Using base R
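The code for this answer did not survive on this page, so what follows is only a minimal sketch of one possible base R approach, not necessarily what the answerer wrote. It assumes `ave()` is an acceptable tool here: `ave()` returns a vector the same length as its input, with each element replaced by the group summary. The helper names `month_total` and `queue_total` are made up for illustration, and the data frame is rebuilt inline only so the snippet runs on its own.

```r
# Rebuild the sample data from the question (same values as the dput output).
calls1 <- data.frame(
  MON1_12 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L,
              2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  QUEUE = factor(rep(c("APPLICATION_STATUS", "BENEFITS", "BILLING"),
                     times = c(7, 4, 9))),
  CallsHandled = c(9L, 3L, 10L, 27L, 64L, 17L, 10L, 58L, 8L, 29L,
                   32L, 12L, 2L, 6L, 1L, 3L, 2L, 2L, 2L, 2L)
)

# Total calls per month, repeated on every row of that month.
month_total <- ave(calls1$CallsHandled, calls1$MON1_12, FUN = sum)

# Total calls per month and queue, repeated on every row of that group.
queue_total <- ave(calls1$CallsHandled, calls1$MON1_12, calls1$QUEUE, FUN = sum)

# Share of the month's calls handled by each queue.
calls1$PercCallsMo <- queue_total / month_total
```

Note that `FUN` has to be passed by name, because every unnamed argument after the first is treated by `ave()` as a grouping variable.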
