I want to cut a date range according to defined breaks (0-7 days, 8-15 days, ..., 31-50 days) and then compute the group means of the values.
library(dplyr)
date = seq(as.Date("2019/1/1"), by = "day", length.out = 50)
value = matrix(rnorm(200, 100, 50), nrow=50) %>% data.frame()
sample = cbind(date, value) %>% data.frame()
breaks = c(0, 7, 15, 30, 50)
sample %>%
  # the date column in this example is called `date`
  group_by(cutt = cut(date, breaks = breaks)) %>%
  summarise(m1 = mean(X1), m2 = mean(X2))
However, it seems that the cut function can only cut dates by "day", "week", etc. Is there any way to do this?
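Answer 0 (score: 2)
We could convert the dates to "factor" and then back to "numeric". This gives each date its position among the distinct dates (1 to 50 here), which cut can split at the numeric breaks.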
library(dplyr)
sample %>%
group_by(cutt=cut(as.numeric(factor(date)), breaks=breaks)) %>%
summarise(m1=mean(X1), m2=mean(X2))
# # A tibble: 4 x 3
# cutt m1 m2
# <fct> <dbl> <dbl>
# 1 (0,7] 126. 120.
# 2 (7,15] 123. 90.3
# 3 (15,30] 82.6 107.
# 4 (30,50] 90.4 104.
Or in base R:
do.call(rbind, by(sample[2:3], cut(as.numeric(factor(sample$date)), breaks), colMeans))
# X1 X2
# (0,7] 125.79941 120.01652
# (7,15] 122.82247 90.33681
# (15,30] 82.64698 107.13250
# (30,50] 90.39701 104.09779
Data:
set.seed(42)
n <- 50
sample <- data.frame(date = seq(as.Date("2019/1/1"), by = "day", length.out = n),
                     matrix(rnorm(4 * n, 100, 50), ncol = 4,
                            dimnames = list(NULL, paste0("X", 1:4))))
breaks <- c(0, 7, 15, 30, 50)
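As a small sketch of why the factor round trip works (the names `d`, `idx` and `bins` are illustrative and not part of the answer): for unique dates, `as.numeric(factor(d))` returns each date's position among the sorted distinct dates, and those positions can be cut at the numeric breaks.

```r
# Illustrative names; not part of the original answer.
d <- seq(as.Date("2019-01-01"), by = "day", length.out = 50)
breaks <- c(0, 7, 15, 30, 50)

# factor() sorts the distinct dates; as.numeric() returns the level index.
idx <- as.numeric(factor(d))
range(idx)   # 1 50

# The numeric indices can be cut at the numeric breaks.
bins <- cut(idx, breaks = breaks)
table(bins)  # 7, 8, 15 and 20 dates per bin
```

Because the intervals are right-closed by default, day 7 lands in (0,7] and day 8 in (7,15], which matches the 0-7 / 8-15 grouping asked for.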
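Answer 1 (score: 2)
Since you want to split date by the number of days, you can subtract the first date from each date (adding 1 so that the first day falls in the first interval). Using @jay.sf's data: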
library(dplyr)
sample %>%
mutate(new_date = as.integer(date - first(date)) + 1L) %>%
group_by(cutt = cut(new_date, breaks = breaks)) %>%
summarise_at(vars(X1, X2), mean)
# A tibble: 4 x 3
# cutt X1 X2
# <fct> <dbl> <dbl>
#1 (0,7] 126. 120.
#2 (7,15] 123. 90.3
#3 (15,30] 82.6 107.
#4 (30,50] 90.4 104.
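In your example the dates are consecutive, but if there are gaps between dates, this code takes them into account, since it counts calendar days from the first date. I am not sure whether that is what you intend.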
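To make the difference concrete, here is a minimal sketch with made-up dates that contain a gap (the vector `d` and the names `rank_idx` and `day_idx` are hypothetical). It uses base R `min(d)` in place of `first(date)`, which is equivalent only when the dates are sorted in ascending order.

```r
# Hypothetical dates with a gap: 1-3 January, then 20 January.
d <- as.Date(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-20"))
breaks <- c(0, 7, 15, 30, 50)

# Position among the distinct dates: the gap is ignored.
rank_idx <- as.numeric(factor(d))        # 1 2 3 4

# Calendar-day offset from the earliest date: the gap is kept.
day_idx <- as.integer(d - min(d)) + 1L   # 1 2 3 20

cut(rank_idx, breaks = breaks)  # all four dates fall in (0,7]
cut(day_idx, breaks = breaks)   # the last date falls in (15,30]
```

Which of the two is appropriate depends on whether the bins are meant to count observations or elapsed days.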
Answer 2 (score: 0)
We can use a data.table approach:
library(data.table)
# `sample` is the data frame defined in the first answer; setDT() converts it by reference
setDT(sample)[, lapply(.SD, mean),
              by = .(cutt = cut(as.numeric(factor(date)), breaks = breaks)),
              .SDcols = X1:X2]
# cutt X1 X2
#1: (0,7] 125.79941 120.01652
#2: (7,15] 122.82247 90.33681
#3: (15,30] 82.64698 107.13250
#4: (30,50] 90.39701 104.09779