R: percentage of total by group in a data frame

Time: 2016-10-28 20:39:39

Tags: r dataframe aggregate

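I have a data frame "calls1", and I would like to know how to create a new variable "PercCallsMo": the percentage of a given month's ("MON1_12") total calls, taken from the "CallsHandled" variable, that each call queue ("QUEUE") accounts for. My sample data is below: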

structure(list(MON1_12 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), QUEUE = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L), .Label = c("APPLICATION_STATUS", "BENEFITS", "BILLING"
), class = "factor"), CallsHandled = c(9L, 3L, 10L, 27L, 64L, 
17L, 10L, 58L, 8L, 29L, 32L, 12L, 2L, 6L, 1L, 3L, 2L, 2L, 2L, 
2L)), .Names = c("MON1_12", "QUEUE", "CallsHandled"), class = "data.frame", row.names = c(NA, 
-20L))

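The result I expect repeats, on every row for a given month "MON1_12", the "PercCallsMo" value of that row's "QUEUE" (expressed as a proportion between 0 and 1), and should look like this: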

structure(list(MON1_12 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), QUEUE = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L), .Label = c("APPLICATION_STATUS", "BENEFITS", "BILLING"
), class = "factor"), CallsHandled = c(9L, 3L, 10L, 27L, 64L, 
17L, 10L, 58L, 8L, 29L, 32L, 12L, 2L, 6L, 1L, 3L, 2L, 2L, 2L, 
2L), PercCallsMo = c(0.362962963, 0.362962963, 0.362962963, 0.362962963, 
0.554878049, 0.554878049, 0.554878049, 0.488888889, 0.488888889, 
0.37195122, 0.37195122, 0.148148148, 0.148148148, 0.148148148, 
0.073170732, 0.073170732, 0.073170732, 0.073170732, 0.073170732, 
0.073170732)), .Names = c("MON1_12", "QUEUE", "CallsHandled", 
"PercCallsMo"), class = "data.frame", row.names = c(NA, -20L))
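
As a quick check of the expected values, the month 1 figure for APPLICATION_STATUS can be recomputed by hand from the sample data. This is only an illustrative sketch; the variable names below are made up for the example and do not appear in the data frame.

```r
# Calls handled in month 1, read off the sample data above.
app_status_m1 <- sum(c(9, 3, 10, 27))  # APPLICATION_STATUS rows
benefits_m1   <- sum(c(58, 8))         # BENEFITS rows
billing_m1    <- sum(c(12, 2, 6))      # BILLING rows

# Total calls handled in month 1 across all queues.
month1_total <- app_status_m1 + benefits_m1 + billing_m1

# Share of month 1 handled by APPLICATION_STATUS (hypothetical helper name).
app_status_share_m1 <- app_status_m1 / month1_total
```

The resulting share, 49 / 135, agrees with the 0.362962963 shown for the first four rows of the expected output.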

2 answers:

Answer 0 (score: 2)

You can do it like this:

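library(dplyr)

calls1 <- calls1 %>%
  # Total calls handled in each month, repeated on every row of that month
  group_by(MON1_12) %>%
  mutate(month_total = sum(CallsHandled)) %>%
  # Share of the month's total handled by each queue
  group_by(MON1_12, QUEUE) %>%
  mutate(PercCallsMo = sum(CallsHandled) / month_total) %>%
  # Drop the grouping and the helper column
  ungroup() %>%
  select(-month_total)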

Answer 1 (score: 0)

Using base R

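
One possible base R approach, offered as a sketch and not necessarily the code this answer originally contained, relies on ave() to repeat group totals on every row. The data frame is rebuilt here from the question's sample data so the example runs on its own; month_total and queue_total are helper names introduced only for illustration.

```r
# Sample data from the question, rebuilt so the example is self-contained.
calls1 <- data.frame(
  MON1_12 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L,
              1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  QUEUE = factor(rep(c("APPLICATION_STATUS", "BENEFITS", "BILLING"),
                     times = c(7, 4, 9))),
  CallsHandled = c(9L, 3L, 10L, 27L, 64L, 17L, 10L, 58L, 8L, 29L, 32L,
                   12L, 2L, 6L, 1L, 3L, 2L, 2L, 2L, 2L)
)

# Total calls per month, repeated on every row of that month.
month_total <- ave(calls1$CallsHandled, calls1$MON1_12, FUN = sum)

# Total calls per queue within each month, repeated on every matching row.
queue_total <- ave(calls1$CallsHandled, calls1$MON1_12, calls1$QUEUE, FUN = sum)

# Share of each month's calls handled by the row's queue.
calls1$PercCallsMo <- queue_total / month_total
```

Because ave() returns a vector of the same length as its input, the new column lines up with the existing rows without any merge step.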