data.table vs. dplyr

Time: 2018-10-16 16:16:28

Tags: r dplyr data.table

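I am trying to compute the geometric mean and the 90th percentile over a rolling 90-day window for each group ID. It works in data.table but not in dplyr, which I am more comfortable with. DT_new shows what the output should look like. Several rolling-window packages, zoo's rollapply and tbrf among them, were unsuccessful, hence the manual rolling.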

1) How can this be done in dplyr?

2) In data.table, how can the code be modified so that the new geometric mean and quantile columns are both added in a single statement?

3) In data.table, how can I add a column named "Group" for each 90-day window?

    # Geometric mean that skips non-finite logs (e.g. log(0) = -Inf)
    geo_mean <- function(data) {
      log_data <- log(data)
      gm <- exp(mean(log_data[is.finite(log_data)]))
      return(gm)
    }

    ### Sample data
    Value <- c(50, 900, 25, 25, 125, 50, 25, 25, 2000, 25, 25,
               25, 25, 25, 25, 25, 25, 325, 25, 300, 475, 25)
    Date <- as.Date(c("2015-02-23", "2015-04-20", "2015-06-17",
                      "2015-08-20", "2015-10-05", "2015-12-22",
                      "2016-01-19", "2016-03-29", "2016-05-03",
                      "2016-07-21", "2016-09-08", "2016-11-07",
                      "2017-02-27", "2017-04-19", "2017-06-29",
                      "2017-08-24", "2017-10-23", "2017-12-28",
                      "2018-01-16", "2018-03-14", "2018-05-29",
                      "2018-07-24"))
    ID <- c(rep("A", 11), rep("B", 11))
    df <- data.frame(Value, Date, ID)

    library(data.table)
    DT <- setDT(df)

    ##### Works with data.table, but I have to run it twice
    DTnew <- DT[, Rollquantile := {
      d <- DT$Date - Date
      quantile(DT$Value[ID == DT$ID & d <= 0 & d >= -90], 0.90)
    }, by = list(Date, ID)]
    DT_new <- DTnew[, Rollgeomean := {
      d <- DT$Date - Date
      geo_mean(DT$Value[ID == DT$ID & d <= 0 & d >= -90])
    }, by = list(Date, ID)]
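
For question 2, one possible approach is to assign both columns in a single `:=` call by returning a list from the braces. The sketch below is self-contained and rebuilds the sample data; the intermediate name `w` (the values falling inside the window) is introduced here for illustration and is not part of the original code.

```r
library(data.table)

geo_mean <- function(x) {
  log_x <- log(x)
  exp(mean(log_x[is.finite(log_x)]))
}

DT <- data.table(
  Value = c(50, 900, 25, 25, 125, 50, 25, 25, 2000, 25, 25,
            25, 25, 25, 25, 25, 25, 325, 25, 300, 475, 25),
  Date = as.Date(c("2015-02-23", "2015-04-20", "2015-06-17", "2015-08-20",
                   "2015-10-05", "2015-12-22", "2016-01-19", "2016-03-29",
                   "2016-05-03", "2016-07-21", "2016-09-08", "2016-11-07",
                   "2017-02-27", "2017-04-19", "2017-06-29", "2017-08-24",
                   "2017-10-23", "2017-12-28", "2018-01-16", "2018-03-14",
                   "2018-05-29", "2018-07-24")),
  ID = rep(c("A", "B"), each = 11)
)

# Both columns are created by one `:=`; the window is computed once per row.
DT[, c("Rollquantile", "Rollgeomean") := {
  d <- as.numeric(DT$Date - Date)                   # day offsets to this row's date
  w <- DT$Value[DT$ID == ID & d <= 0 & d >= -90]    # same ID, previous 90 days
  list(quantile(w, 0.90, names = FALSE), geo_mean(w))
}, by = .(Date, ID)]

print(head(DT))
```

Because `:=` modifies `DT` by reference, no intermediate `DTnew` object should be needed with this form.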

    ############ Won't work in dplyr ########
    library(dplyr)
    df_new <- df %>% group_by(ID, Date) %>%
      mutate(d = df$Date - Date,
             Geomean = geo_mean(df$Value[ID == df$ID & d <= 0 & d >= -90]),
             Quantile = quantile(df$Value[ID == df$ID & d <= 0 & d >= -90], 0.90))
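
For question 1, the attempt above most likely fails because `d = df$Date - Date` yields one value per row of the whole table, while each (ID, Date) group holds a single row, and `mutate()` rejects a column of the wrong length. A minimal sketch of one way around this is to group by ID only and compute the window inside a helper. The helper `roll_stat` and its `days` argument are hypothetical names introduced here, not part of dplyr.

```r
library(dplyr)

geo_mean <- function(x) {
  log_x <- log(x)
  exp(mean(log_x[is.finite(log_x)]))
}

df <- data.frame(
  Value = c(50, 900, 25, 25, 125, 50, 25, 25, 2000, 25, 25,
            25, 25, 25, 25, 25, 25, 325, 25, 300, 475, 25),
  Date = as.Date(c("2015-02-23", "2015-04-20", "2015-06-17", "2015-08-20",
                   "2015-10-05", "2015-12-22", "2016-01-19", "2016-03-29",
                   "2016-05-03", "2016-07-21", "2016-09-08", "2016-11-07",
                   "2017-02-27", "2017-04-19", "2017-06-29", "2017-08-24",
                   "2017-10-23", "2017-12-28", "2018-01-16", "2018-03-14",
                   "2018-05-29", "2018-07-24")),
  ID = rep(c("A", "B"), each = 11)
)

# Hypothetical helper: for each date, apply f to the values observed
# in the preceding `days` days (inclusive) within the same group.
roll_stat <- function(dates, values, f, days = 90) {
  vapply(seq_along(dates), function(i) {
    d <- as.numeric(dates - dates[i])
    f(values[d <= 0 & d >= -days])
  }, numeric(1))
}

df_new <- df %>%
  group_by(ID) %>%
  mutate(
    Geomean  = roll_stat(Date, Value, geo_mean),
    Quantile = roll_stat(Date, Value,
                         function(v) quantile(v, 0.90, names = FALSE))
  ) %>%
  ungroup()

print(head(df_new))
```

Grouping by ID alone means the helper only ever sees dates and values from one ID, so the explicit `ID == df$ID` comparison is no longer required.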

    ## Failed attempt to add "Group"
    paste0("Group_", 1 + c(0, cumsum((c(TRUE, lag(Date.Time) > 90)))))
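
For question 3, the intended meaning of "Group" is not fully specified. The sketch below assumes one reading suggested by the `cumsum` in the failed attempt: within each ID, a new group starts whenever the gap to the previous observation exceeds 90 days. That interpretation is an assumption, and the data is rebuilt so the example runs on its own.

```r
library(data.table)

DT <- data.table(
  Value = c(50, 900, 25, 25, 125, 50, 25, 25, 2000, 25, 25,
            25, 25, 25, 25, 25, 25, 325, 25, 300, 475, 25),
  Date = as.Date(c("2015-02-23", "2015-04-20", "2015-06-17", "2015-08-20",
                   "2015-10-05", "2015-12-22", "2016-01-19", "2016-03-29",
                   "2016-05-03", "2016-07-21", "2016-09-08", "2016-11-07",
                   "2017-02-27", "2017-04-19", "2017-06-29", "2017-08-24",
                   "2017-10-23", "2017-12-28", "2018-01-16", "2018-03-14",
                   "2018-05-29", "2018-07-24")),
  ID = rep(c("A", "B"), each = 11)
)

# Sort so that consecutive rows within an ID are consecutive in time.
setorder(DT, ID, Date)

# Assumed rule: the first row of each ID opens Group_1, and every gap
# of more than 90 days to the previous row opens the next group.
DT[, Group := paste0("Group_", cumsum(c(TRUE, diff(as.numeric(Date)) > 90))),
   by = ID]

print(DT[, .(ID, Date, Group)])
```

If fixed, non-overlapping 90-day bins counted from each ID's first date are wanted instead, `as.integer(Date - min(Date)) %/% 90 + 1` evaluated by ID would be the analogous expression.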

0 answers:

No answers