Cut a date vector with custom intervals

Asked: 2019-07-01 23:21:53

Tags: r group-by dplyr

I want to cut a date range according to defined breaks (days 0-7, 8-15, ..., 31-50) and then compute group means of the values.

library(dplyr)

date = seq(as.Date("2019/1/1"), by = "day", length.out = 50)
value = matrix(rnorm(200, 100, 50), nrow=50) %>% data.frame()
sample = cbind(date, value) %>% data.frame()

breaks = c(0, 7, 15, 30, 50)

sample %>%
  group_by(cutt = cut(date, breaks = breaks)) %>%
  summarise(m1 = mean(X1), m2 = mean(X2))

However, it seems that cut() can only cut dates by units like "day" or "week". Is there any way to do this?
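For reference, a minimal sketch of the failure with the sample data above: cut.Date() accepts Date cut points, a single number of intervals, or an interval string such as "week", but it rejects a numeric vector of day breaks.

# Numeric day breaks are not a valid 'breaks' specification for Dates:
cut(sample$date, breaks = c(0, 7, 15, 30, 50))
# Error in cut.Date(...) : invalid specification of 'breaks'

# Interval strings work, but only with fixed calendar units:
head(cut(sample$date, breaks = "week"))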

3 Answers:

Answer 0 (score: 2)

We could convert to "factor" and then back to "numeric":

library(dplyr)
sample %>%
  group_by(cutt=cut(as.numeric(factor(date)), breaks=breaks)) %>%
  summarise(m1=mean(X1), m2=mean(X2))
# # A tibble: 4 x 3
# cutt       m1    m2
# <fct>   <dbl> <dbl>
# 1 (0,7]   126.  120. 
# 2 (7,15]  123.   90.3
# 3 (15,30]  82.6 107. 
# 4 (30,50]  90.4 104. 
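As a side note, if more readable bin labels are wanted, cut() also takes a labels argument; a small sketch, with label strings of my own choosing:

sample %>%
  group_by(cutt = cut(as.numeric(factor(date)), breaks = breaks,
                      labels = c("days 1-7", "days 8-15",
                                 "days 16-30", "days 31-50"))) %>%
  summarise(m1 = mean(X1), m2 = mean(X2))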

Or in base R:

do.call(rbind, by(sample[2:3], cut(as.numeric(factor(sample$date)), breaks), colMeans))
#                X1        X2
# (0,7]   125.79941 120.01652
# (7,15]  122.82247  90.33681
# (15,30]  82.64698 107.13250
# (30,50]  90.39701 104.09779

Data

set.seed(42)
n <- 50
sample <- data.frame(date=seq(as.Date("2019/1/1"), by="day", length.out=n),
                  matrix(rnorm(4*n, 100, 50), ncol=4, 
                         dimnames=list(NULL, paste0("X", 1:4))))
breaks <- c(0, 7, 15, 30, 50)

Answer 1 (score: 2)

Since you want to divide date by the number of days, you can subtract the first date from each date. Using @jay.sf's data:

library(dplyr)

sample %>%
  mutate(new_date = as.integer(date - first(date)) + 1L) %>%
  group_by(cutt = cut(new_date, breaks = breaks)) %>%
  summarise_at(vars(X1, X2), mean)

# A tibble: 4 x 3
#  cutt     X1    X2
#  <fct>   <dbl> <dbl>
#1 (0,7]   126.  120. 
#2 (7,15]  123.   90.3
#3 (15,30]  82.6 107. 
#4 (30,50]  90.4 104. 

In your example the dates are consecutive, but if there were gaps between them this code would take that into account, though I am not sure whether that is intended; see the sketch below.
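For illustration, a sketch with a hypothetical gap (rows 2 to 6 removed, so the dates jump from Jan 1 to Jan 7): the day offsets still place each row in its calendar bin, whereas the row-index trick from answer 0 would shift everything.

gappy <- sample[-(2:6), ]  # dates are now Jan 1, Jan 7, Jan 8, ...

gappy %>%
  mutate(new_date = as.integer(date - first(date)) + 1L) %>%  # offsets 1, 7, 8, ...
  group_by(cutt = cut(new_date, breaks = breaks)) %>%
  summarise_at(vars(X1, X2), mean)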

Answer 2 (score: 0)

We can use a data.table approach:

library(data.table)
setDT(sample)[, lapply(.SD, mean),
              .(cutt = cut(as.numeric(factor(date)), breaks = breaks)),
              .SDcols = X1:X2]
#     cutt        X1        X2
#1:   (0,7] 125.79941 120.01652
#2:  (7,15] 122.82247  90.33681
#3: (15,30]  82.64698 107.13250
#4: (30,50]  90.39701 104.09779
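The same call can be combined with the day-offset idea from answer 1 (my own adaptation, not part of the original answer), so the bins follow calendar days rather than row indices:

setDT(sample)[, lapply(.SD, mean),
              .(cutt = cut(as.integer(date - min(date)) + 1L, breaks = breaks)),
              .SDcols = X1:X2]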