How do I calculate moving averages with different start dates?

Time: 2017-07-05 16:42:13

Tags: r time-series moving-average

I want to calculate a moving average for each participant in my dataset.

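A participant can have multiple visit dates, and for each visit I want the average of the values over the 3 days before it and over the 2 days before it (excluding the visit date itself).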

For example, take id = 1 and date = 6/6/2017.

The average over the past 2 days should then be the average of the values on 6/5/2017 and 6/4/2017.
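As a quick check of that example, here is a minimal sketch using the two values for id = 1 taken from the sample data given further down (NA on 6/4/2017 and 6 on 6/5/2017). The vector name `prev2` is only illustrative. It shows that the result depends on whether missing values are dropped, which the question does not state explicitly.

```r
# Values for id = 1 on the two days before the 6/6/2017 visit
# (copied from the sample data; `prev2` is a hypothetical name)
prev2 <- c("6/4/2017" = NA, "6/5/2017" = 6)

mean_with_na    <- mean(prev2)                # missing value propagates
mean_without_na <- mean(prev2, na.rm = TRUE)  # missing day is dropped

mean_with_na     # NA
mean_without_na  # 6
```

With `na.rm = TRUE` the 2-day average is based on a single observed day, so it may be worth deciding whether such windows should count as valid.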

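A sample dataset is generated below. I am working with a larger dataset that has more participants, more visits, and more values, so I would like an efficient way to calculate these averages.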

timeseries <- data.frame(id=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3),
                         date=c("6/1/2017","6/2/2017","6/3/2017","6/4/2017","6/5/2017","6/6/2017",
                                "6/1/2017","6/2/2017","6/3/2017","6/4/2017","6/5/2017","6/6/2017",
                                "6/1/2017","6/2/2017","6/3/2017","6/4/2017","6/5/2017","6/6/2017"),
                         value=c(2,3,4,NA,6,7,
                                 NA,9,5,NA,3,2,
                                 5,7,3,8,3,5))
> timeseries
   id     date value
1   1 6/1/2017     2
2   1 6/2/2017     3
3   1 6/3/2017     4
4   1 6/4/2017    NA
5   1 6/5/2017     6
6   1 6/6/2017     7
7   2 6/1/2017    NA
8   2 6/2/2017     9
9   2 6/3/2017     5
10  2 6/4/2017    NA
...

visit <- data.frame(id=c(1,1,2,3,3,3),
                date=c("6/6/2017","6/5/2017",
                       "6/6/2017",
                       "6/6/2017","6/5/2017","6/4/2017"))

> visit
  id     date
1  1 6/6/2017
2  1 6/5/2017
3  2 6/6/2017
4  3 6/6/2017
5  3 6/5/2017
6  3 6/4/2017

The result table should look like this, where mean3 is the average over the past 3 days and mean2 is the average over the past 2 days:

> result
  id     date mean3 mean2
1  1 6/6/2017            
2  1 6/5/2017            
3  2 6/6/2017            
4  3 6/6/2017            
5  3 6/5/2017            
6  3 6/4/2017     

1 answer:

Answer 0: (score: 0)

For each row of visit, I subset the corresponding data from timeseries and then calculate the mean of value over the preceding n_days.

library(lubridate)
n_days = 2  # length of the look-back window in days (use 3 for mean3)
sapply(1:NROW(visit), function(i)
    with(subset(x = timeseries,
                subset = timeseries$id == visit$id[i]),
         mean(x = value[difftime(time1 = mdy(visit$date[i]),
                                 time2 = mdy(date),
                                 units = "days") <= n_days &
                            difftime(time1 = mdy(visit$date[i]),
                                time2 = mdy(date),
                                units = "days") > 0],
              na.rm = TRUE)))
#[1] 6.0 4.0 3.0 5.5 5.5 5.0
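Since the question asks for both windows and for something that scales, here is a hedged sketch of one possible alternative that avoids the per-visit loop. It joins every visit to all observations of the same participant with `merge()`, computes the day lag once, and then averages per visit with `tapply()`. It uses base R only, parsing dates with `as.Date()` instead of lubridate. The helper `window_mean()` and the columns `day`, `row`, and `lag` are names made up for this illustration, and dropping NA values is an assumption carried over from the answer above.

```r
# Sample data from the question (strings kept as character, not factor)
timeseries <- data.frame(
  id    = rep(c(1, 2, 3), each = 6),
  date  = rep(paste0("6/", 1:6, "/2017"), times = 3),
  value = c(2, 3, 4, NA, 6, 7,  NA, 9, 5, NA, 3, 2,  5, 7, 3, 8, 3, 5),
  stringsAsFactors = FALSE
)
visit <- data.frame(
  id   = c(1, 1, 2, 3, 3, 3),
  date = c("6/6/2017", "6/5/2017", "6/6/2017",
           "6/6/2017", "6/5/2017", "6/4/2017"),
  stringsAsFactors = FALSE
)

# Parse the month/day/year strings once, up front
timeseries$day <- as.Date(timeseries$date, format = "%m/%d/%Y")
visit$day      <- as.Date(visit$date, format = "%m/%d/%Y")
visit$row      <- seq_len(nrow(visit))  # remembers the original visit order

# Pair every visit with every observation of the same participant
pairs <- merge(visit[, c("row", "id", "day")],
               timeseries[, c("id", "day", "value")],
               by = "id", suffixes = c(".visit", ".obs"))
pairs$lag <- as.numeric(pairs$day.visit - pairs$day.obs)  # days before the visit

# Hypothetical helper: mean of `value` over the n_days days strictly
# before each visit, ignoring NA (assumption: missing days are dropped)
window_mean <- function(n_days) {
  in_window <- pairs$lag >= 1 & pairs$lag <= n_days
  m <- tapply(pairs$value[in_window],
              factor(pairs$row[in_window], levels = visit$row),
              mean, na.rm = TRUE)
  as.numeric(m)  # one entry per visit, in the original visit order
}

result <- data.frame(id = visit$id, date = visit$date,
                     mean3 = window_mean(3), mean2 = window_mean(2))
result
```

On the sample data, the mean2 column should reproduce the output shown above (6.0 4.0 3.0 5.5 5.5 5.0). The trade-off is memory: the intermediate `pairs` table holds one row per visit and observation of the same participant, so for very long series it may be worth restricting `timeseries` to the relevant date range before merging.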