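I want to compute a moving average for each participant in a dataset.
A participant may have multiple visit dates, and for each visit I want the average over the previous 3 days and over the previous 2 days (excluding the visit date itself).
For example, take id = 1 and date = 6/6/2017.
The 2-day average should be the mean of the values on 6/5/2017 and 6/4/2017.
A sample dataset is generated below. I am working with a much larger dataset with more participants, more visits, and more values, so I would like an efficient way to compute these averages.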
timeseries <- data.frame(id=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3), date=c("6/1/2017","6/2/2017","6/3/2017","6/4/2017","6/5/2017","6/6/2017",
"6/1/2017","6/2/2017","6/3/2017","6/4/2017","6/5/2017","6/6/2017",
"6/1/2017","6/2/2017","6/3/2017","6/4/2017","6/5/2017","6/6/2017"),
value=c(2,3,4,NA,6,7,
NA,9,5,NA,3,2,
5,7,3,8,3,5))
> timeseries
id date value
1 1 6/1/2017 2
2 1 6/2/2017 3
3 1 6/3/2017 4
4 1 6/4/2017 NA
5 1 6/5/2017 6
6 1 6/6/2017 7
7 2 6/1/2017 NA
8 2 6/2/2017 9
9 2 6/3/2017 5
10 2 6/4/2017 NA
...
visit <- data.frame(id=c(1,1,2,3,3,3),
date=c("6/6/2017","6/5/2017",
"6/6/2017",
"6/6/2017","6/5/2017","6/4/2017"))
> visit
id date
1 1 6/6/2017
2 1 6/5/2017
3 2 6/6/2017
4 3 6/6/2017
5 3 6/5/2017
6 3 6/4/2017
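The result table should look like this, where mean3 is the average over the previous 3 days and mean2 is the average over the previous 2 days: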
> result
id date mean3 mean2
1 1 6/6/2017
2 1 6/5/2017
3 2 6/6/2017
4 3 6/6/2017
5 3 6/5/2017
6 3 6/4/2017
Answer 0 (score: 0)
For each row of visit, I subset the corresponding data from timeseries and then compute the mean of value within n_days.
library(lubridate)
n_days = 2
# For each visit row, average `value` over the n_days days before the visit
sapply(1:NROW(visit), function(i)
    with(subset(x = timeseries,
                subset = timeseries$id == visit$id[i]),
         mean(x = value[difftime(time1 = mdy(visit$date[i]),
                                 time2 = mdy(date),
                                 units = "days") <= n_days &
                        difftime(time1 = mdy(visit$date[i]),
                                 time2 = mdy(date),
                                 units = "days") > 0],
              na.rm = TRUE)))
#[1] 6.0 4.0 3.0 5.5 5.5 5.0
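The question asks for both mean3 and mean2 in one table, so the same idea can be wrapped in a function and called once per window size. The following is only a sketch under a few assumptions: window_mean is a hypothetical helper name (not from any package), dates are parsed with base R's as.Date instead of lubridate, and the sample data from the question is recreated so the block runs on its own. It still loops over visit rows, so for a much larger dataset a join-based approach may be faster.

```r
# Sample data from the question.
timeseries <- data.frame(
  id = rep(1:3, each = 6),
  date = rep(c("6/1/2017", "6/2/2017", "6/3/2017",
               "6/4/2017", "6/5/2017", "6/6/2017"), times = 3),
  value = c(2, 3, 4, NA, 6, 7,
            NA, 9, 5, NA, 3, 2,
            5, 7, 3, 8, 3, 5)
)
visit <- data.frame(
  id = c(1, 1, 2, 3, 3, 3),
  date = c("6/6/2017", "6/5/2017", "6/6/2017",
           "6/6/2017", "6/5/2017", "6/4/2017")
)

# window_mean() is a hypothetical helper, introduced here for illustration.
# It returns, for each visit row, the mean of `value` over the n_days
# calendar days before the visit date (the visit date itself is excluded).
window_mean <- function(visit, timeseries, n_days) {
  # Parse the date strings once, outside the per-visit loop.
  ts_date <- as.Date(timeseries$date, format = "%m/%d/%Y")
  visit_date <- as.Date(visit$date, format = "%m/%d/%Y")
  vapply(seq_len(nrow(visit)), function(i) {
    # Days between the visit and each timeseries row (positive = earlier).
    lag_days <- as.numeric(visit_date[i] - ts_date)
    in_window <- timeseries$id == visit$id[i] &
      lag_days > 0 & lag_days <= n_days
    mean(timeseries$value[in_window], na.rm = TRUE)
  }, numeric(1))
}

result <- visit
result$mean3 <- window_mean(visit, timeseries, n_days = 3)
result$mean2 <- window_mean(visit, timeseries, n_days = 2)
result
```

With n_days = 2 this reproduces the vector shown above. Note that a window containing only NA values (or no rows at all) yields NaN, because mean() with na.rm = TRUE has nothing left to average.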