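I am trying to calculate a geometric mean and a 90th percentile over a rolling 90-day window for each group ID. My code works in data.table but not in dplyr, which I am more comfortable with. DT_new shows what the output should look like. zoo's rollapply, tbrf, and the other rolling-window packages I tried did not work either, hence the manual rolling.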
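1) How can I do this in dplyr?
2) In data.table, how can I change the code so that the new geometric mean and quantile columns are both added in a single statement?
3) In data.table, how can I add a column named "Group" for each 90-day window?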
# Geometric mean; non-finite logs (e.g. from zeros) are dropped
geo_mean <- function(data) {
  log_data <- log(data)
  gm <- exp(mean(log_data[is.finite(log_data)]))
  return(gm)
}
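As a quick sanity check of the helper above, here is a small sketch (the input vectors are made up for illustration and are not part of the sample data). It shows that `geo_mean` matches the n-th root of the product, and that a zero is silently dropped because `log(0)` is `-Inf`.

```r
# Same helper as in the question, repeated so the snippet runs on its own
geo_mean <- function(data) {
  log_data <- log(data)
  exp(mean(log_data[is.finite(log_data)]))
}

x <- c(25, 100)            # illustrative values, not from the sample data
gm_plain <- geo_mean(x)    # sqrt(25 * 100), i.e. about 50

# A zero gives log(0) = -Inf, which is.finite() filters out,
# so the result is computed from 25 and 100 only
gm_with_zero <- geo_mean(c(25, 100, 0))

print(gm_plain)
print(gm_with_zero)
```

Dropping zeros this way changes the number of values being averaged, so whether that is acceptable depends on what a zero means in the data.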
### Sample data ###
Value <- c(50, 900, 25, 25, 125, 50, 25, 25, 2000, 25, 25,
           25, 25, 25, 25, 25, 25, 325, 25, 300, 475, 25)
Date <- as.Date(c("2015-02-23", "2015-04-20", "2015-06-17",
                  "2015-08-20", "2015-10-05", "2015-12-22",
                  "2016-01-19", "2016-03-29", "2016-05-03",
                  "2016-07-21", "2016-09-08", "2016-11-07",
                  "2017-02-27", "2017-04-19", "2017-06-29",
                  "2017-08-24", "2017-10-23", "2017-12-28",
                  "2018-01-16", "2018-03-14", "2018-05-29",
                  "2018-07-24"))
ID <- c(rep("A", 11), rep("B", 11))
df <- data.frame(Value, Date, ID)
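library(data.table)
DT <- setDT(df)  # converts df to a data.table by reference

### Works with data.table, but I have to run it twice ###
# := adds the column by reference, so DTnew, DT_new and DT are the same table
DTnew <- DT[, Rollquantile := {
  d <- DT$Date - Date  # day offsets of all rows relative to the current row
  quantile(DT$Value[ID == DT$ID & d <= 0 & d >= -90], 0.90)
}, by = list(Date, ID)]
DT_new <- DTnew[, Rollgeomean := {
  d <- DT$Date - Date
  geo_mean(DT$Value[ID == DT$ID & d <= 0 & d >= -90])
}, by = list(Date, ID)]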
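For question 2, one possible way to add both columns at once is to give `:=` a character vector of column names on the left and a list of values on the right. The sketch below keeps the same manual windowing as the code above; the intermediate name `w` is introduced here only for illustration.

```r
library(data.table)

geo_mean <- function(data) {
  log_data <- log(data)
  exp(mean(log_data[is.finite(log_data)]))
}

# Sample data from the question
DT <- data.table(
  Value = c(50, 900, 25, 25, 125, 50, 25, 25, 2000, 25, 25,
            25, 25, 25, 25, 25, 25, 325, 25, 300, 475, 25),
  Date = as.Date(c("2015-02-23", "2015-04-20", "2015-06-17",
                   "2015-08-20", "2015-10-05", "2015-12-22",
                   "2016-01-19", "2016-03-29", "2016-05-03",
                   "2016-07-21", "2016-09-08", "2016-11-07",
                   "2017-02-27", "2017-04-19", "2017-06-29",
                   "2017-08-24", "2017-10-23", "2017-12-28",
                   "2018-01-16", "2018-03-14", "2018-05-29",
                   "2018-07-24")),
  ID = c(rep("A", 11), rep("B", 11))
)

# Both columns are assigned in one statement: names on the left of :=,
# a list with one element per new column on the right
DT[, c("Rollquantile", "Rollgeomean") := {
  d <- DT$Date - Date                              # offsets from this row's date
  w <- DT$Value[DT$ID == ID & d <= 0 & d >= -90]   # same ID, previous 90 days
  list(quantile(w, 0.90, names = FALSE), geo_mean(w))
}, by = .(Date, ID)]

print(head(DT, 3))
```

Computing the window once and reusing it for both statistics also avoids scanning the table twice per row.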
### Does not work in dplyr ###
library(dplyr)
df_new <- df %>%
  group_by(ID, Date) %>%
  mutate(d = df$Date - Date,
         Geomean = geo_mean(df$Value[ID == df$ID & d <= 0 & d >= -90]),
         Quantile = quantile(df$Value[ID == df$ID & d <= 0 & d >= -90], 0.90))
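For question 1, a likely cause of the failure is that each `(ID, Date)` group holds one row while `d = df$Date - Date` has one value per row of the whole table, so `mutate()` cannot fit it into the group. A minimal sketch of one workaround follows: group by `ID` only and loop over the rows inside a helper. The helper name `roll_stat` and its `width` argument are made up for this example and do not come from any package.

```r
library(dplyr)

geo_mean <- function(data) {
  log_data <- log(data)
  exp(mean(log_data[is.finite(log_data)]))
}

# Hypothetical helper: applies fun to the values whose dates fall in the
# `width` days up to and including each row's date
roll_stat <- function(value, date, fun, width = 90) {
  vapply(seq_along(date), function(i) {
    in_window <- date <= date[i] & date >= date[i] - width
    fun(value[in_window])
  }, numeric(1))
}

# Sample data from the question
df <- data.frame(
  Value = c(50, 900, 25, 25, 125, 50, 25, 25, 2000, 25, 25,
            25, 25, 25, 25, 25, 25, 325, 25, 300, 475, 25),
  Date = as.Date(c("2015-02-23", "2015-04-20", "2015-06-17",
                   "2015-08-20", "2015-10-05", "2015-12-22",
                   "2016-01-19", "2016-03-29", "2016-05-03",
                   "2016-07-21", "2016-09-08", "2016-11-07",
                   "2017-02-27", "2017-04-19", "2017-06-29",
                   "2017-08-24", "2017-10-23", "2017-12-28",
                   "2018-01-16", "2018-03-14", "2018-05-29",
                   "2018-07-24")),
  ID = c(rep("A", 11), rep("B", 11))
)

df_new <- df %>%
  group_by(ID) %>%   # Value and Date now refer to one ID at a time
  mutate(
    Geomean  = roll_stat(Value, Date, geo_mean),
    Quantile = roll_stat(Value, Date,
                         function(x) quantile(x, 0.90, names = FALSE))
  ) %>%
  ungroup()

print(head(df_new, 3))
```

Because the grouping already restricts `Value` and `Date` to a single ID, the `ID == df$ID` comparison from the original attempt is no longer needed.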
### Failed attempt to add "Group" ###
paste0("Group_", 1 + c(0, cumsum((c(TRUE, lag(Date.Time) > 90)))))
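For question 3, the post does not say exactly how the 90-day windows should be cut, so the sketch below rests on an assumption: windows are non-overlapping 90-day bins counted from each ID's first date. If a new group should instead start whenever the gap to the previous observation exceeds 90 days, the labelling rule would need to change. Only a few rows of the sample data are used to keep it short.

```r
library(data.table)

# A subset of the sample data from the question
DT <- data.table(
  Date = as.Date(c("2015-02-23", "2015-04-20", "2015-06-17", "2015-08-20",
                   "2016-11-07", "2017-02-27")),
  ID = c("A", "A", "A", "A", "B", "B")
)

# Assumption: bins are [0, 90), [90, 180), ... days after the ID's first date
DT[, Group := {
  days_since_first <- as.integer(difftime(Date, min(Date), units = "days"))
  paste0("Group_", days_since_first %/% 90L + 1L)
}, by = ID]

print(DT)
```

The numbering restarts at `Group_1` for every ID because the bin index is computed inside `by = ID`.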