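I want to build a moving average while aggregating a time series dataset over two categorical variables. I have looked at several other tutorials, but none of them seems to cover the specific task I'm trying to accomplish.
My raw dataset (df) contains, for each individual (Id), a series of rows with days ranging from 0 to 180 (Days). Each individual belongs to one of two subsets of the data (Group).
I then summarise this data frame to get the daily means for the two groups.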
library(plyr)
summary <- ddply(df, .(Group,Days), summarise,
DV = mean(variable), resp=length(unique(Id)))
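For readers without plyr, the same aggregation step can be sketched in base R. This is only an illustration: the toy data frame `raw` and its values are made up, and only the column names (Id, Group, Days, variable) come from the question.

```r
# Hypothetical toy data; only the column names follow the question
raw <- data.frame(
  Id       = c(1, 1, 2, 2, 3, 3),
  Group    = c("exceeded", "exceeded", "exceeded", "exceeded",
               "othergroup", "othergroup"),
  Days     = c(0, 1, 0, 1, 0, 1),
  variable = c(10, 20, 30, 40, 50, 60)
)

# Mean of `variable` for every Group/Days combination
daily <- aggregate(variable ~ Group + Days, data = raw, FUN = mean)
daily
```

Here `aggregate` only produces the daily mean; the count of unique Ids (`resp`) from the ddply call would need a second call or a merge.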
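The next step, however, is to build a moving average within each of the two groups. In the example data frame below, I have built a 5-day average from the five days up to and including the current day.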
Group       Days  DV    5DayMA
exceeded    0     2859
exceeded    1     2948
exceeded    2     4412
exceeded    3     5074
exceeded    4     5098  4078
exceeded    5     5147  4536
exceeded    6     4459  4838
exceeded    7     4730  4902
exceeded    8     4643  4815
exceeded    9     4698  4735
exceeded    10    4818  4670
exceeded    11    4521  4682
othergroup  0     2859
othergroup  1     2948
othergroup  2     4412
othergroup  3     5074
othergroup  4     5098  4078
othergroup  5     5147  4536
othergroup  6     4459  4838
othergroup  7     4730  4902
othergroup  8     4643  4815
othergroup  9     4698  4735
othergroup  10    4818  4670
othergroup  11    4521  4682
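To make the target concrete, here is the arithmetic behind the first two filled-in 5DayMA cells, as a small sketch. The DV values are copied from the table; the variable names are just for illustration.

```r
# DV for days 0-4 of one group, copied from the table above
dv_days_0_to_4 <- c(2859, 2948, 4412, 5074, 5098)
ma_day4 <- mean(dv_days_0_to_4)   # value placed on the Days == 4 row
ma_day4
# 4078.2 (shown rounded as 4078 in the table)

# The window then slides forward one day: days 1-5
dv_days_1_to_5 <- c(2948, 4412, 5074, 5098, 5147)
ma_day5 <- mean(dv_days_1_to_5)   # value placed on the Days == 5 row
ma_day5
# 4535.8 (shown rounded as 4536 in the table)
```

The first four rows of each group stay empty because fewer than five days are available there.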
Any ideas on how to do this?
Answer 0 (score: 9)
You can try zoo::rollmean:
df <- structure(list(Group = c("exceeded", "exceeded", "exceeded",
"exceeded", "exceeded", "exceeded", "exceeded", "exceeded", "exceeded",
"exceeded", "exceeded", "exceeded", "othergroup", "othergroup",
"othergroup", "othergroup", "othergroup", "othergroup", "othergroup",
"othergroup", "othergroup", "othergroup", "othergroup", "othergroup"
), Days = c(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), DV = c(2859L,
2948L, 4412L, 5074L, 5098L, 5147L, 4459L, 4730L, 4643L, 4698L,
4818L, 4521L, 2859L, 2948L, 4412L, 5074L, 5098L, 5147L, 4459L,
4730L, 4643L, 4698L, 4818L, 4521L), X5DayMA = c(NA, NA, NA, NA,
4078L, 4536L, 4838L, 4902L, 4815L, 4735L, 4670L, 4682L, NA, NA,
NA, NA, 4078L, 4536L, 4838L, 4902L, 4815L, 4735L, 4670L, 4682L
)), .Names = c("Group", "Days", "DV", "X5DayMA"), class = "data.frame", row.names = c(NA,
-24L))
head(df)
Group Days DV X5DayMA
1 exceeded 0 2859 NA
2 exceeded 1 2948 NA
3 exceeded 2 4412 NA
4 exceeded 3 5074 NA
5 exceeded 4 5098 4078
6 exceeded 5 5147 4536
library(plyr)
library(zoo)
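ddply(
  df, "Group",
  transform,
  # backticks are needed because the column name starts with a digit;
  # fill = NA replaces the deprecated na.pad = TRUE
  `5daymean` = rollmean(DV, 5, align = "right", fill = NA))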
Group Days DV X5DayMA 5daymean
1 exceeded 0 2859 NA NA
2 exceeded 1 2948 NA NA
3 exceeded 2 4412 NA NA
4 exceeded 3 5074 NA NA
5 exceeded 4 5098 4078 4078.2
6 exceeded 5 5147 4536 4535.8
7 exceeded 6 4459 4838 4838.0
8 exceeded 7 4730 4902 4901.6
9 exceeded 8 4643 4815 4815.4
10 exceeded 9 4698 4735 4735.4
11 exceeded 10 4818 4670 4669.6
12 exceeded 11 4521 4682 4682.0
13 othergroup 0 2859 NA NA
14 othergroup 1 2948 NA NA
15 othergroup 2 4412 NA NA
16 othergroup 3 5074 NA NA
17 othergroup 4 5098 4078 4078.2
18 othergroup 5 5147 4536 4535.8
19 othergroup 6 4459 4838 4838.0
20 othergroup 7 4730 4902 4901.6
21 othergroup 8 4643 4815 4815.4
22 othergroup 9 4698 4735 4735.4
23 othergroup 10 4818 4670 4669.6
24 othergroup 11 4521 4682 4682.0
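As a cross-check of what a right-aligned window of width 5 computes, the same numbers can be reproduced without zoo. This is a minimal base R sketch, not how rollmean is implemented; `trailing_mean` and `demo` are hypothetical names introduced here.

```r
# Trailing (right-aligned) mean of width k, with NA for the first k - 1 positions
trailing_mean <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n < k) return(out)
  cs <- cumsum(c(0, x))   # cs[i + 1] holds sum(x[1:i])
  # the window ending at position i covers x[(i - k + 1):i]
  out[k:n] <- (cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]) / k
  out
}

dv <- c(2859, 2948, 4412, 5074, 5098, 5147,
        4459, 4730, 4643, 4698, 4818, 4521)

# Same layout as the example data: two groups with identical DV values
demo <- data.frame(Group = rep(c("exceeded", "othergroup"), each = 12),
                   Days  = rep(0:11, 2),
                   DV    = rep(dv, 2))

# ave() applies the function separately within each Group
demo$ma5 <- ave(demo$DV, demo$Group, FUN = function(x) trailing_mean(x, 5))
demo
```

The `ma5` column should agree with the 5daymean column above, including the four leading NA values at the start of each group.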
It's even faster with dplyr:
library(dplyr)
library(zoo)
# %>% replaces the older %.% operator, which is no longer available in current dplyr
df %>%
  group_by(Group) %>%
  mutate(`5daymean` = rollmean(DV, 5, align = "right", fill = NA))
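Or use data.table, which is faster still:
library(data.table)
library(zoo)
dft <- as.data.table(df)
# `:=` adds the column by reference, computed separately within each Group
dft[, `5daymean` := rollmean(DV, 5, align = "right", fill = NA), by = Group]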
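Newer data.table releases also ship their own rolling mean, frollmean, so zoo may not be needed for this step. The sketch below assumes a data.table version that provides frollmean (1.12.0 or later); `demo_dt` and `ma5` are names made up for the example.

```r
library(data.table)

dv <- c(2859, 2948, 4412, 5074, 5098, 5147,
        4459, 4730, 4643, 4698, 4818, 4521)

# Same layout as the example data: two groups with identical DV values
demo_dt <- data.table(Group = rep(c("exceeded", "othergroup"), each = 12),
                      Days  = rep(0:11, 2),
                      DV    = rep(dv, 2))

# Right-aligned 5-row window per group; incomplete windows are filled with NA
demo_dt[, ma5 := frollmean(DV, n = 5, align = "right"), by = Group]
demo_dt
```

Like rollmean, this counts rows rather than calendar days, so it assumes each group is sorted by Days with no missing days.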
Answer 1 (score: 3)
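Or with base R, using ave and filter:
# stats::filter is spelled out so that dplyr::filter cannot mask it
with(df, ave(DV, Group, FUN = function(x) stats::filter(x, rep(1/5, 5), sides = 1)))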
# [1] NA NA NA NA 4078.2 4535.8 4838.0 4901.6 4815.4 4735.4
#[11] 4669.6 4682.0 NA NA NA NA 4078.2 4535.8 4838.0 4901.6
#[21] 4815.4 4735.4 4669.6 4682.0
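Why this works: with sides = 1, stats::filter forms a weighted sum of the current value and the preceding ones, and five equal weights of 1/5 turn that sum into a 5-point trailing mean. A small sketch on the first six DV values (the variable names are illustrative):

```r
x <- c(2859, 2948, 4412, 5074, 5098, 5147)
w <- rep(1/5, 5)                      # five equal weights that sum to 1

ma <- stats::filter(x, w, sides = 1)  # sides = 1: only current and past values

# The 5th element by hand: w[1] * x[5] + w[2] * x[4] + ... + w[5] * x[1]
manual5 <- sum(w * x[5:1])

as.numeric(ma)
# NA NA NA NA 4078.2 4535.8
```

Unequal weights in `w` would give a weighted moving average instead, with `w[1]` applied to the most recent value.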