Calculating the median over the past 6 months per ID in R

Time: 2018-04-03 07:17:48

Tags: r time dplyr zoo median

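I need to create a new column, based on the actual values, that holds the median over the past 6 months (180 days) for each ID. If there is no earlier information, or the previous record is more than 6 months old, the median must be the value of that row.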

Input data

I have this:

structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956, 
986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), date = structure(c(15601, 
17075, 10965, 11068, 11243, 14610, 15248, 15342, 15344, 15380, 
16079), class = "Date")), .Names = c("id", "value", "date"), row.names = c(NA, -11L), class = "data.frame")

What I need to achieve is:

structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956, 
986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), date = structure(c(15601, 
17075, 10965, 11068, 11243, 14610, 15248, 15342, 15344, 15380, 
16079), class = "Date"), median = c(956,986,995,995,990,700,600,797.5,956,975.5, 986)), .Names = c("id", "value", "date", "median"), row.names = c(NA, -11L), class = "data.frame")

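I tried using rollapplyr and rollmedian from the zoo package, following the answer given in this post: Finding Cumulative Sum In R Using Conditions

But I could not get the right results.

Thanks in advance.

1 answer:

Answer 0 (score: 1)

Try this solution: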

Split the data.frame by id using the split function:

list_df<-split(df,f=df$id)
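To see what split returns here, the following sketch rebuilds the question's data (dates entered as days since 1970-01-01, matching the dput output above) and inspects the result. The check lines at the end are only illustrative.

```r
# The question's data; dates are days since 1970-01-01, as in the dput output
df <- data.frame(
  id    = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986),
  date  = as.Date(c(15601, 17075, 10965, 11068, 11243, 14610,
                    15248, 15342, 15344, 15380, 16079),
                  origin = "1970-01-01")
)

# One data.frame per id, returned as a list named by id
list_df <- split(df, f = df$id)

names(list_df)         # "1" "2" "3" "4"
sapply(list_df, nrow)  # 1 1 3 6 rows for ids 1, 2, 3, 4
```

Each list element keeps all three columns, so the date window can then be evaluated inside one id at a time.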

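A function that, for row i of a single id's data, returns the median of the values dated within the 180 days up to and including that row's date: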

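f_median<-function(i,db)
{
  # rows dated within the 180 days up to and including row i's date
  # (Date objects compare directly, so no conversion is needed)
  in_window<-db[,"date"]>=db[i,"date"]-180 & db[,"date"]<=db[i,"date"]
  return(median(db[in_window,"value"]))
}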
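As a quick check of the window rule, here is a small sketch that runs the same logic on the rows for id 4 only. The data frame db is rebuilt by hand from the question's data, and f_median is restated with plain Date comparisons so the block runs on its own.

```r
# Rows for id 4 from the question (dates are days since 1970-01-01)
db <- data.frame(
  id    = 4,
  value = c(700, 600, 995, 956, 1000, 986),
  date  = as.Date(c(14610, 15248, 15342, 15344, 15380, 16079),
                  origin = "1970-01-01")
)

# Same logic as f_median above, written with plain Date comparisons
f_median <- function(i, db) {
  in_window <- db$date >= db$date[i] - 180 & db$date <= db$date[i]
  median(db$value[in_window])
}

f_median(2, db)  # 600: the previous record is more than 180 days older
f_median(3, db)  # 797.5: median of 600 and 995
f_median(5, db)  # 975.5: median of 600, 995, 956 and 1000
```

Because row i always falls inside its own window, a row with no earlier record in range simply returns its own value, which is the fallback the question asks for.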

A function that applies it to every row of one split data.frame:

f<-function(db)
{
   # seq_len() is safe even for an empty data.frame, unlike 1:nrow(db)
   return(sapply(seq_len(nrow(db)),f_median,db))
}

Your desired output:

 median<-unlist(lapply(list_df,f))
 cbind(df,median)
   id value       date median
1   1   956 2012-09-18  956.0
2   2   986 2016-10-01  986.0
31  3   995 2000-01-09  995.0
32  3   995 2000-04-21  995.0
33  3   986 2000-10-13  990.5
41  4   700 2010-01-01  700.0
42  4   600 2011-10-01  600.0
43  4   995 2012-01-03  797.5
44  4   956 2012-01-05  956.0
45  4  1000 2012-02-10  975.5
46  4   986 2014-01-09  986.0
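Note that cbind pairs the medians with the rows by position, so the approach above relies on df already being sorted by id. As a possible variant (not part of the original answer), the sketch below computes the same column in base R without splitting, which keeps each median attached to its own row whatever the row order. The shuffle is only there to demonstrate that.

```r
# The question's data; dates are days since 1970-01-01, as in the dput output
df <- data.frame(
  id    = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986),
  date  = as.Date(c(15601, 17075, 10965, 11068, 11243, 14610,
                    15248, 15342, 15344, 15380, 16079),
                  origin = "1970-01-01")
)

# Shuffle the rows on purpose to show the result does not depend on row order
df <- df[c(11, 3, 7, 1, 9, 5, 2, 10, 4, 8, 6), ]

# For each row: same id, and dated within the 180 days up to that row's date
df$median <- sapply(seq_len(nrow(df)), function(i) {
  in_window <- df$id == df$id[i] &
    df$date >= df$date[i] - 180 &
    df$date <= df$date[i]
  median(df$value[in_window])
})

# Sort back by id and date for display
df[order(df$id, df$date), ]
```

This compares every row with every other row, so it is quadratic in the number of rows. That is fine for small data like this; for large tables the split-based approach does less work per id.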