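I need to create a new column from the actual values that holds, for each ID, the median of the values recorded over the past 6 months (180 days). If there is no earlier information, or the previous record is more than 6 months old, the median must be the value of that row itself.
Input data
I have this: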
structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956,
986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), date = structure(c(15601,
17075, 10965, 11068, 11243, 14610, 15248, 15342, 15344, 15380,
16079), class = "Date")), .Names = c("id", "value", "date"), row.names = c(NA, -11L), class = "data.frame")
What I need to achieve is:
structure(list(id = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4), value = c(956,
986, 995, 995, 986, 700, 600, 995, 956, 1000, 986), date = structure(c(15601,
17075, 10965, 11068, 11243, 14610, 15248, 15342, 15344, 15380,
16079), class = "Date"), median = c(956,986,995,995,990,700,600,797.5,956,975.5, 986)), .Names = c("id", "value", "date", "median"), row.names = c(NA, -11L), class = "data.frame")
I tried using rollapplyr and rollmedian from the zoo package, following the answers given in this post: Finding Cumulative Sum In R Using Conditions
But I could not get the right results.
Thanks in advance.
Answer 0 (score: 1)
Try this solution:
Split the data.frame by id using the split function:
list_df<-split(df,f=df$id)
A function that returns the median for a single id, using the date condition:
# Median of value over the 180 days up to and including the date of row i
f_median<-function(i,db)
{
  in_window<-db[,"date"]>=db[i,"date"]-180 & db[,"date"]<=db[i,"date"]
  return(median(db[in_window,"value"]))
}
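To make the date condition concrete, here is a minimal sketch of the window for a single row. It assumes only the id 4 rows from the question's data; the names `in_window` and `window_median` are illustrative and not part of the original answer. Subtracting a number from a Date shifts it by that many days, so no conversion to POSIXct is needed for the comparison.

```r
# Rows for id 4, copied from the question's data
db <- data.frame(
  id    = rep(4, 6),
  value = c(700, 600, 995, 956, 1000, 986),
  date  = as.Date(c(14610, 15248, 15342, 15344, 15380, 16079),
                  origin = "1970-01-01")
)

i <- 3  # the 2012-01-03 row

# TRUE for rows dated within the 180 days up to and including row i
in_window <- db$date >= db$date[i] - 180 & db$date <= db$date[i]

db[in_window, ]  # the 2011-10-01 and 2012-01-03 rows

window_median <- median(db$value[in_window])  # 797.5
```

Only the values 600 and 995 fall inside the window, which is why the expected median for that row is 797.5.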
Iterate over the rows of each split data.frame:
f<-function(db)
{
  # seq_len() is also safe when db has zero rows
  return(sapply(seq_len(nrow(db)),f_median,db))
}
Your desired output:
median<-unlist(lapply(list_df,f))
cbind(df,median)
   id value       date median
1   1   956 2012-09-18  956.0
2   2   986 2016-10-01  986.0
31  3   995 2000-01-09  995.0
32  3   995 2000-04-21  995.0
33  3   986 2000-10-13  990.5
41  4   700 2010-01-01  700.0
42  4   600 2011-10-01  600.0
43  4   995 2012-01-03  797.5
44  4   956 2012-01-05  956.0
45  4  1000 2012-02-10  975.5
46  4   986 2014-01-09  986.0
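Binding the unlisted result with cbind lines up correctly here because df is already sorted by id. As a hedged variation, the sketch below runs the same split-and-apply idea end to end and uses base R's unsplit to put the per-group medians back in the original row order. The helper name `window_median` and its `days` argument are assumptions made for illustration, not part of the answer above.

```r
# Data from the question
df <- data.frame(
  id    = c(1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4),
  value = c(956, 986, 995, 995, 986, 700, 600, 995, 956, 1000, 986),
  date  = as.Date(c(15601, 17075, 10965, 11068, 11243, 14610, 15248,
                    15342, 15344, 15380, 16079), origin = "1970-01-01")
)

# Hypothetical helper: for each row of one id group, the median of value
# over the `days` days up to and including that row's date
window_median <- function(db, days = 180) {
  sapply(seq_len(nrow(db)), function(i) {
    in_window <- db$date >= db$date[i] - days & db$date <= db$date[i]
    median(db$value[in_window])
  })
}

# unsplit() reverses split(), so the result follows df's row order
# even if df is not sorted by id
df$median <- unsplit(lapply(split(df, df$id), window_median), df$id)
df$median
```

For the third id 3 row this gives 990.5 (the median of 995 and 986), which matches the output printed above rather than the 990 in the question's expected vector.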