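I have a set of dates and times for individuals (ID) that correspond to our primary outcome measure (Y) and a covariate (X1).
My goal is to replace the missing X1 value in each row where Y was measured, provided an X1 measurement was recorded within +/- 24 hours of the date/time at which Y was measured. To make this easier to visualize (and to load into R), here is how the data is currently arranged: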
structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L), TIME = structure(1:15, .Label = c("01/01/2013 12:01",
"01/03/2013 08:49", "01/03/2013 20:52", "02/01/2013 05:00", "02/03/2013 05:30",
"02/03/2013 21:14", "02/05/2013 05:15", "02/12/2013 05:03", "02/15/2013 04:16",
"02/16/2013 04:12", "02/16/2013 21:02", "03/01/2010 17:58", "03/02/2010 00:10",
"03/03/2010 10:45", "03/04/2010 09:00"), class = "factor"), Y = structure(c(1L,
5L, 7L, 1L, 1L, 2L, 1L, 1L, 1L, 4L, 3L, 1L, 8L, 1L, 6L), .Label = c(".",
"22", "35", "4", "5", "6", "8", "9"), class = "factor"), X1 = structure(c(2L,
1L, 1L, 7L, 7L, 1L, 4L, 4L, 3L, 1L, 1L, 6L, 1L, 5L, 1L), .Label = c(".",
"0.1", "0.2", "0.4", "0.6", "0.9", "1.0"), class = "factor")), .Names = c("ID",
"TIME", "Y", "X1"), class = "data.frame", row.names = c(NA, -15L))
To simplify the desired output, I would like to show only the rows with non-missing Y values, so that the final product looks like this:
ID TIME Y X1
1 1 01/03/2013 08:49 5 .
2 1 01/03/2013 20:52 8 .
3 2 02/03/2013 21:14 22 .
4 2 02/16/2013 04:12 4 0.2
5 2 02/16/2013 21:02 35 .
6 3 03/02/2010 00:10 9 0.9
7 3 03/04/2010 09:00 6 0.6
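Is it possible to (1) iterate over multiple rows and check whether the absolute time difference between the X1 and Y measurements is within 24 hours, and (2) replace missing X1 values with those that fall within the +/- 24-hour window?
Any thoughts on how to approach this would be greatly appreciated!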
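Whichever approach is used, the columns in the `dput` output above are stored as factors with "." marking missing values, so they would first need converting to timestamps and numbers. A minimal sketch of that step on a three-row excerpt of the data follows; the helper name `to_num` is hypothetical, and the month/day/year format and UTC time zone are assumptions.

```r
# Three-row excerpt of the question's data, in the same factor-with-"." encoding
df <- data.frame(
  ID   = c(1L, 1L, 1L),
  TIME = factor(c("01/01/2013 12:01", "01/03/2013 08:49", "01/03/2013 20:52")),
  Y    = factor(c(".", "5", "8")),
  X1   = factor(c("0.1", ".", "."))
)

# Parse the timestamps (month/day/year assumed; UTC assumed to sidestep DST shifts)
df$TIME <- as.POSIXct(as.character(df$TIME), format = "%m/%d/%Y %H:%M", tz = "UTC")

# Hypothetical helper: factor -> character -> numeric, mapping "." to NA first
to_num <- function(x) {
  x <- as.character(x)
  x[x == "."] <- NA
  as.numeric(x)
}
df$Y  <- to_num(df$Y)
df$X1 <- to_num(df$X1)

str(df)
```

Going through `as.character()` matters here: calling `as.numeric()` directly on a factor returns the internal level codes rather than the printed values.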
Answer 0 (score: 0)
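If you convert your data to xts, you can use xts's convenient time-based subsetting to get what you want.
PS: The following code will work if there is exactly one X1 value within 24 hours of a Y measurement.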
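require(xts)
# DF is assumed to hold the columns ID, Date, TIME, Y, X1 (date and time in separate columns), with "." read in as NA
xx <- xts(DF[, c(1, 4, 5)], as.POSIXct(paste0(DF$Date, " ", DF$TIME), format = "%m/%d/%Y %H:%M"))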
# For each timestamp that has a Y value, look up the X1 value recorded within +/- 24 hours
sapply(index(xx[!is.na(xx$Y)]), FUN = function(tt) {
startTime <- tt - 24 * 60 * 60
endTime <- tt + 24 * 60 * 60
y <- xx[paste(startTime, endTime, sep = "/")]
if (nrow(y[!is.na(y$X1), "X1"]) != 0) {
return(as.vector(y[!is.na(y$X1), "X1"]))
} else {
return(NA)
}
})
## [1] 0.9 0.6 NA NA 1.0 0.2 NA
# Same lookup as above, this time written back into the X1 column of the rows that have a Y value
xx[!is.na(xx$Y), "X1"] <- sapply(index(xx[!is.na(xx$Y)]), FUN = function(tt) {
startTime <- tt - 24 * 60 * 60
endTime <- tt + 24 * 60 * 60
y <- xx[paste(startTime, endTime, sep = "/")]
if (nrow(y[!is.na(y$X1), "X1"]) != 0) {
return(as.vector(y[!is.na(y$X1), "X1"]))
} else {
return(NA)
}
})
xx[!is.na(xx$Y), "X1"]
## X1
## 2010-03-02 00:10:00 0.9
## 2010-03-04 09:00:00 0.6
## 2013-01-03 08:49:00 NA
## 2013-01-03 20:52:00 NA
## 2013-02-03 21:14:00 1.0
## 2013-02-16 04:12:00 0.2
## 2013-02-16 21:02:00 NA
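Note that the xts window above is purely time-based, so it does not distinguish between IDs, and it relies on at most one X1 value falling inside each window. As a rough alternative, here is a base R sketch that restricts the search to the same ID and picks the X1 value closest in time when several qualify. It assumes the data has already been cleaned ("." converted to NA, TIME parsed as month/day/year in UTC); `dat` and `fill_x1` are hypothetical names, and taking the nearest value is a design choice the question does not specify.

```r
# The question's data, with "." already replaced by NA and TIME parsed
dat <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  TIME = as.POSIXct(c(
    "01/01/2013 12:01", "01/03/2013 08:49", "01/03/2013 20:52",
    "02/01/2013 05:00", "02/03/2013 05:30", "02/03/2013 21:14",
    "02/05/2013 05:15", "02/12/2013 05:03", "02/15/2013 04:16",
    "02/16/2013 04:12", "02/16/2013 21:02", "03/01/2010 17:58",
    "03/02/2010 00:10", "03/03/2010 10:45", "03/04/2010 09:00"
  ), format = "%m/%d/%Y %H:%M", tz = "UTC"),
  Y  = c(NA, 5, 8, NA, NA, 22, NA, NA, NA, 4, 35, NA, 9, NA, 6),
  X1 = c(0.1, NA, NA, 1.0, 1.0, NA, 0.4, 0.4, 0.2, NA, NA, 0.9, NA, 0.6, NA)
)

# Hypothetical helper: for row i, return the X1 value recorded closest in time
# for the same ID, provided it lies within `window_hours`; otherwise NA.
fill_x1 <- function(i, d, window_hours = 24) {
  if (!is.na(d$X1[i])) return(d$X1[i])            # keep an existing value
  cand <- which(d$ID == d$ID[i] & !is.na(d$X1))   # same ID, X1 present
  if (length(cand) == 0) return(NA_real_)
  gap <- abs(as.numeric(difftime(d$TIME[cand], d$TIME[i], units = "hours")))
  if (min(gap) > window_hours) return(NA_real_)   # nothing inside the window
  d$X1[cand[which.min(gap)]]
}

rows <- which(!is.na(dat$Y))                      # rows where Y was measured
out <- dat[rows, ]
out$X1 <- vapply(rows, fill_x1, numeric(1), d = dat)
out
```

On this data the filled X1 values agree with the xts output above, including 1.0 for the 02/03/2013 21:14 row, whose nearest X1 measurement (02/03/2013 05:30) is under 16 hours away.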