I have a data set in the following format:
ID Minutes Value
xxxx 118 3
xxxx 121 4
xxxx 122 3
yyyy 122 6
xxxx 123 4
yyyy 123 8
... ... ....
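Each ID is a patient, and each Value is, for example, the blood pressure at that minute. I want to compute a rolling average over the 60 minutes before and the 60 minutes after each point. However, as you can see, some minutes are missing (so I can't simply use row numbers), and I want the average computed separately for each unique ID (so the average for ID xxxx must not include values belonging to ID yyyy). It sounds like rollapply or rollingstat might be an option, but I haven't had much success piecing it together...
Let me know if further clarification is needed.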
Answer 0 (score: 11)
You can easily fill in the missing minutes (Value will be set to NA for those rows) and then use rollapply:
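library(data.table)
library(zoo)
## Convert to data.table, keyed by ID and Minutes
DT <- data.table(DF, key = c("ID", "Minutes"))
## Missing Minutes will be added in. Value will be set to NA.
DT <- DT[CJ(unique(ID), seq(min(Minutes), max(Minutes)))]
## Centered window of 121 rows = 60 minutes before + the point + 60 minutes after.
## partial = TRUE keeps the shorter windows at the edges of each series.
DT[, RollMean := rollapply(Value, 121, mean, na.rm = TRUE, partial = TRUE), by = ID]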
You can also do it all in one step:
## Convert your DF to a data.table
DT <- data.table(DF, key = c("ID", "Minutes"))
## Compute rolling means, with on-the-fly padded minutes
DT[CJ(unique(ID), seq(min(Minutes), max(Minutes)))][,
  rollapply(Value, 121, mean, na.rm = TRUE, partial = TRUE), by = ID]
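As a cross-check on small data, the same ±60-minute average can be computed directly from the Minutes column without padding. This is only a minimal base-R sketch: the helper name window_mean, the RollMean column, and the extra row at minute 200 are illustrative assumptions, not part of the original question, and the quadratic loop is not meant for large data sets.

```r
# Toy data in the question's layout; the row at minute 200 is made up
# to show a point that lies outside every other point's window.
DF <- data.frame(
  ID      = c("xxxx", "xxxx", "xxxx", "yyyy", "xxxx", "yyyy", "xxxx"),
  Minutes = c(118, 121, 122, 122, 123, 123, 200),
  Value   = c(3, 4, 3, 6, 4, 8, 10)
)

# Hypothetical helper: for each time t, average the values whose
# minute lies in [t - half, t + half]. Works on one ID at a time.
window_mean <- function(minutes, value, half = 60) {
  vapply(minutes, function(t) {
    mean(value[abs(minutes - t) <= half], na.rm = TRUE)
  }, numeric(1))
}

# Apply the helper separately to each ID so patients never mix.
DF$RollMean <- NA_real_
for (id in unique(DF$ID)) {
  rows <- which(DF$ID == id)
  DF$RollMean[rows] <- window_mean(DF$Minutes[rows], DF$Value[rows])
}

print(DF)
```

Because the window is defined on Minutes rather than on row positions, gaps in the series need no special handling here.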
Answer 1 (score: 4)
An alternative approach using tidyr/dplyr instead of data.table, and RcppRoll instead of zoo:
library(dplyr)
library(tidyr)
library(RcppRoll)
d %>%
  group_by(ID) %>%
  # add rows for unobserved minutes
  complete(Minutes = full_seq(Minutes, 1)) %>%
  # RcppRoll::roll_mean() is written in C++ for speed;
  # 121 rows = 60 minutes before + the point itself + 60 minutes after
  mutate(moving_mean = roll_mean(Value, 121, fill = NA, na.rm = TRUE)) %>%
  # keep only the rows that were originally observed
  filter(!is.na(Value))
Data
d <- tibble(
  ID = rep(1:3, each = 5),
  Minutes = rep(c(1, 30, 60, 120, 200), 3),
  Value = rpois(15, lambda = 10)
)
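Both answers rely on first padding the data to one row per minute and per ID, so that a fixed number of rows corresponds to a fixed number of minutes. The sketch below shows that padding step in base R only, as an illustration of the idea rather than a replacement for CJ() or complete(). The fixed Value numbers and the names grid and padded are assumptions made for reproducibility.

```r
# Sample data with the same shape as above, but with fixed
# (illustrative) values instead of random draws.
d <- data.frame(
  ID      = rep(1:3, each = 5),
  Minutes = rep(c(1, 30, 60, 120, 200), 3),
  Value   = c(9, 11, 10, 8, 12, 7, 10, 13, 9, 11, 10, 12, 8, 9, 14)
)

# Every (ID, minute) combination between the first and last minute.
grid <- expand.grid(
  ID      = unique(d$ID),
  Minutes = seq(min(d$Minutes), max(d$Minutes))
)

# Left join: observed rows keep their Value, padded rows get NA.
padded <- merge(grid, d, by = c("ID", "Minutes"), all.x = TRUE)
padded <- padded[order(padded$ID, padded$Minutes), ]

# One row per minute means a centered window of 2 * 60 + 1 = 121 rows
# spans 60 minutes before and 60 minutes after each point.
window_width <- 2 * 60 + 1

print(nrow(padded))               # rows after padding
print(sum(!is.na(padded$Value)))  # rows that were actually observed
```

After padding, a row-based rolling function applied within each ID gives a time-based window, which is why the width passed to rollapply() or roll_mean() is counted in rows.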