I am trying to group a data frame into 3 date ranges, split at "2016-04-10" and "2016-04-24".
df <- structure(list(date = structure(c(16803, 16810, 16817, 16824,
16831, 16838, 16845, 16852, 16859, 16866, 16873, 16880, 16887,
16894, 16901, 16908, 16915, 16922, 16929, 16936, 16943), class = "Date"),
new = c(1507L, 2851L, 3550L, 5329L, 7557L, 5546L, 6264L,
7160L, 9468L, 5789L, 5928L, 4642L, 8145L, 4867L, 4846L, 5231L,
7137L, 3938L, 3741L, 2937L, 194L), resolved = c(21, 27, 15,
16, 56, 2773, 8490, 8748, 9325, 7734, 10264, 6739, 6110,
9613, 10314, 10349, 7200, 9637, 10831, 11170, 5666), ost = c(1486,
2824, 3535, 5313, 7501, 2773, -2226, -1588, 143, -1945, -4336,
-2097, 2035, -4746, -5468, -5118, -63, -5699, -7090, -8233,
-5472)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-21L), .Names = c("date", "new", "resolved", "ost"))
I tried the following:
df1 <- df %>% group_by(dr=cut(date,breaks=as.Date(c("2016-04-10","2016-04-24")))) %>%
summarise(ost = sum(ost))
This gives the wrong result, shown below.
dr ost
2016-04-10 -10586
NA -17885
Any help is appreciated!
Answer 0 (score: 6)
You can create the grouping variable first:
library(dplyr)
df %>%
  # the counter increases by 1 at each row whose date matches a boundary
  mutate(group = cumsum(grepl('2016-04-10|2016-04-24', date))) %>%
  group_by(group) %>%
  summarise(ost = sum(ost))
#Source: local data frame [3 x 2]
# group ost
# (int) (dbl)
#1 0 8672
#2 1 -10586
#3 2 -26557
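Note that the `grepl` approach relies on both boundary dates occurring exactly in the `date` column, which happens to be true for the question's data. The following is a minimal sketch, not part of the original answer, of a variant that compares dates instead of matching strings; the `dates` vector here is hypothetical and deliberately chosen so that neither boundary appears in it.

```r
# Hypothetical weekly dates (Apr 1, 8, 15, 22, 29) that skip both boundaries.
dates  <- as.Date("2016-04-01") + seq(0, 28, by = 7)
bounds <- as.Date(c("2016-04-10", "2016-04-24"))

# String matching finds no boundary row, so every date lands in group 0.
by_grepl <- cumsum(grepl("2016-04-10|2016-04-24", dates))

# Counting how many boundaries each date has reached still gives 0, 1, or 2.
group <- (dates >= bounds[1]) + (dates >= bounds[2])

print(data.frame(dates, by_grepl, group))
```

With the question's `df`, the same expression could be used inside `mutate()` in place of the `cumsum(grepl(...))` call.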
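Answer 1 (score: 4)
We create a grouping variable 'dr' with cut. The breaks are the range of 'date' (i.e. its min and max values) concatenated (c) with the dates specified by the OP. We use the option include.lowest so that the last interval also contains the maximum date, and then get the sum of 'ost' by this grouping variable.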
df %>%
  group_by(dr = cut(date,
                    breaks = c(range(date), as.Date(c("2016-04-10", "2016-04-24"))),
                    include.lowest = TRUE)) %>%
  summarise(ost = sum(ost))
# dr ost
# <fctr> <dbl>
#1 2016-01-03 8672
#2 2016-04-10 -10586
#3 2016-04-24 -26557
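To see why the attempt in the question produced an NA group, it may help to compare the two sets of breaks directly. This is a small base R sketch, assuming the same 21 weekly dates as the question's df (rebuilt here as a plain vector named `dates`, which is not in the original): two breaks define only one interval, so every date outside it becomes NA.

```r
# The same 21 weekly dates as in the question, starting 2016-01-03.
dates  <- as.Date("2016-01-03") + 7 * (0:20)
bounds <- as.Date(c("2016-04-10", "2016-04-24"))

# Two breaks give a single interval [2016-04-10, 2016-04-24);
# dates before or after it are assigned NA.
two_breaks <- cut(dates, breaks = bounds)
print(table(two_breaks, useNA = "ifany"))

# Adding the min and max of the dates as outer breaks gives three intervals.
# include.lowest = TRUE keeps the maximum date inside the last interval.
four_breaks <- cut(dates, breaks = c(range(dates), bounds), include.lowest = TRUE)
print(table(four_breaks))
```

The NA row in the question's output is therefore the sum of the first and third groups combined.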
Another option is findInterval, which may be faster than cut.
df %>%
  group_by(dr = findInterval(date, as.Date(c("2016-04-10", "2016-04-24")))) %>%
  summarise(ost = sum(ost))
# dr ost
# <int> <dbl>
#1 0 8672
#2 1 -10586
#3 2 -26557
Note: the OP asked about cut, and the first solution in this answer uses it.
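findInterval returns integer codes (0, 1, 2) rather than labelled levels. If readable group names are wanted, one possible approach is to index into a label vector, as sketched below. The label strings and the `dates` vector are illustrative assumptions, not part of the original answer.

```r
# The same 21 weekly dates as in the question, starting 2016-01-03.
dates  <- as.Date("2016-01-03") + 7 * (0:20)
bounds <- as.Date(c("2016-04-10", "2016-04-24"))

# 0 = before the first boundary, 1 = between the two, 2 = second boundary onward.
idx <- findInterval(dates, bounds)

# Hypothetical labels; idx + 1 turns the 0-based codes into positions.
labels <- c("before 2016-04-10", "2016-04-10 to 2016-04-23", "2016-04-24 onward")
dr <- factor(labels[idx + 1L], levels = labels)

print(table(dr))
```

The resulting factor could be passed to `group_by()` in the same way as the integer codes.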