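R: How to preserve legitimate NAs in a merged zoo object

Time: 2015-01-24 03:45:51

Tags: r merge zoo

I have several time series objects sampled at five-minute intervals, but they can have different start and end times. They can also log at different offsets, not necessarily at minutes 5, 10, 15 and so on.

I want to merge these objects while keeping the legitimate NAs intact. For example, if an object starts recording later than the others, the NAs at its beginning are legitimate. If an object stops recording earlier, the NAs at its end are legitimate too.

However, na.locf has no option that keeps both the leading and the trailing NAs intact.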

Here is an example of my problem:

library(zoo)

lines1="Index,x1
2014-01-01 00:00:00,73.06
2014-01-01 00:05:00,73.11
2014-01-01 00:10:00,73.16
2014-01-01 00:15:00,73.22"

lines2="Index,x2
2014-01-01 00:11:00,71.11
2014-01-01 00:16:00,70.12
2014-01-01 00:21:00,70.16
2014-01-01 00:26:00,70.19
2014-01-01 00:31:00,69.16"

lines3="Index,x3
2014-01-01 00:23:00,0
2014-01-01 00:28:00,1
2014-01-01 00:33:00,1
2014-01-01 00:38:00,0
2014-01-01 00:43:00,0"

df1=read.table(text = lines1, header = TRUE, sep = ",")
df2=read.table(text = lines2, header = TRUE, sep = ",")
df3=read.table(text = lines3, header = TRUE, sep = ",")

z1 = zoo(df1$x1, as.POSIXct(df1$Index))
z2 = zoo(df2$x2, as.POSIXct(df2$Index))
z3 = zoo(df3$x3, as.POSIXct(df3$Index))

z = merge(z1,z2,z3)
z

z.na.locf = na.locf(z)
z.na.locf

timesteps = seq(as.POSIXct("2014-01-01 00:00:00"), 
                as.POSIXct("2014-01-01 01:00:00"),
                by = "5 min")

z.timesteps = na.locf(z, xout=timesteps)
z.timesteps

The merged object is:

> z
                       z1    z2 z3
2014-01-01 00:00:00 73.06    NA NA
2014-01-01 00:05:00 73.11    NA NA
2014-01-01 00:10:00 73.16    NA NA
2014-01-01 00:11:00    NA 71.11 NA
2014-01-01 00:15:00 73.22    NA NA
2014-01-01 00:16:00    NA 70.12 NA
2014-01-01 00:21:00    NA 70.16 NA
2014-01-01 00:23:00    NA    NA  0
2014-01-01 00:26:00    NA 70.19 NA
2014-01-01 00:28:00    NA    NA  1
2014-01-01 00:31:00    NA 69.16 NA
2014-01-01 00:33:00    NA    NA  1
2014-01-01 00:38:00    NA    NA  0
2014-01-01 00:43:00    NA    NA  0

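Note that the NAs at the end of z1 are legitimate, as are those at the beginning of z3 and at the beginning and end of z2. The NAs that need to be replaced are the ones in the middle of the data. The problem is that when I try to fill the missing values in the middle of the data, some of the legitimate NAs disappear as well: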

> z.na.locf
                       z1    z2 z3
2014-01-01 00:00:00 73.06    NA NA
2014-01-01 00:05:00 73.11    NA NA
2014-01-01 00:10:00 73.16    NA NA
2014-01-01 00:11:00 73.16 71.11 NA
2014-01-01 00:15:00 73.22 71.11 NA
2014-01-01 00:16:00 73.22 70.12 NA
2014-01-01 00:21:00 73.22 70.16 NA
2014-01-01 00:23:00 73.22 70.16  0
2014-01-01 00:26:00 73.22 70.19  0
2014-01-01 00:28:00 73.22 70.19  1
2014-01-01 00:31:00 73.22 69.16  1
2014-01-01 00:33:00 73.22 69.16  1
2014-01-01 00:38:00 73.22 69.16  0
2014-01-01 00:43:00 73.22 69.16  0

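Note that for z1 and z2 the legitimate trailing NAs have disappeared.

Moreover, if I resample the data onto a common regular time grid, the NAs at both the beginning and the end disappear: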

> z.timesteps
                       z1    z2 z3
2014-01-01 00:00:00 73.06 71.11  0
2014-01-01 00:05:00 73.11 71.11  0
2014-01-01 00:10:00 73.16 71.11  0
2014-01-01 00:15:00 73.22 71.11  0
2014-01-01 00:20:00 73.22 70.12  0
2014-01-01 00:25:00 73.22 70.16  0
2014-01-01 00:30:00 73.22 70.19  1
2014-01-01 00:35:00 73.22 69.16  1
2014-01-01 00:40:00 73.22 69.16  0
2014-01-01 00:45:00 73.22 69.16  0
2014-01-01 00:50:00 73.22 69.16  0
2014-01-01 00:55:00 73.22 69.16  0
2014-01-01 01:00:00 73.22 69.16  0

Is there a way to achieve what I need? Thank you for your help.

1 answer:

Answer 0 (score: 1)

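na.fill can help here. The following line keeps the runs of NAs at the beginning and end of each column, but fills the remaining NAs using na.locf. The na.fill term is NA only on the leading and trailing runs, so adding 0 times that term puts those NAs back after na.locf has filled everything:

zz <- na.locf(z, na.rm = FALSE) + 0 * na.fill(z, fill = c(NA, 0, NA))

which gives: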

> zz
                       z1    z2 z3
2014-01-01 00:00:00 73.06    NA NA
2014-01-01 00:05:00 73.11    NA NA
2014-01-01 00:10:00 73.16    NA NA
2014-01-01 00:11:00 73.16 71.11 NA
2014-01-01 00:15:00 73.22 71.11 NA
2014-01-01 00:16:00    NA 70.12 NA
2014-01-01 00:21:00    NA 70.16 NA
2014-01-01 00:23:00    NA 70.16  0
2014-01-01 00:26:00    NA 70.19  0
2014-01-01 00:28:00    NA 70.19  1
2014-01-01 00:31:00    NA 69.16  1
2014-01-01 00:33:00    NA    NA  1
2014-01-01 00:38:00    NA    NA  0
2014-01-01 00:43:00    NA    NA  0
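
To see why the one-liner behaves this way, it may help to split it into steps on a single toy series. This is only a sketch: the series x and the names filled, mask and result are made up for illustration and are not part of the original example.

```r
library(zoo)

# Toy series (illustrative only): leading NA, one interior gap, trailing NA
x <- zoo(c(NA, 1, NA, 2, NA), 1:5)

# Step 1: carry the last observation forward.
# na.rm = FALSE keeps the leading NA, but the trailing NA gets filled.
filled <- na.locf(x, na.rm = FALSE)

# Step 2: build a mask that is NA on leading/trailing NA runs and 0 elsewhere.
# fill = c(left, interior, right) sets the three kinds of NA separately.
mask <- 0 * na.fill(x, fill = c(NA, 0, NA))

# Step 3: adding the mask restores the leading and trailing NAs
result <- filled + mask
print(result)
```

Only the interior gap at position 3 should end up filled; positions 1 and 5 stay NA because NA plus anything is NA.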

Note 1: We can reduce the read.table / zoo lines to three lines of this form:

z1 <- read.zoo(text = lines1, header = TRUE, sep = ",", tz = "")
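
As a self-contained sketch of Note 1, the snippet below reads two shortened text blocks with read.zoo and merges them. The data are truncated copies of lines1 and lines2 from the question, cut down only to keep the example short.

```r
library(zoo)

# Shortened copies of the question's input (first two rows of each block)
lines1 <- "Index,x1
2014-01-01 00:00:00,73.06
2014-01-01 00:05:00,73.11"

lines2 <- "Index,x2
2014-01-01 00:11:00,71.11
2014-01-01 00:16:00,70.12"

# read.zoo parses the first column as the index;
# passing tz = "" explicitly makes that index POSIXct
z1 <- read.zoo(text = lines1, header = TRUE, sep = ",", tz = "")
z2 <- read.zoo(text = lines2, header = TRUE, sep = ",", tz = "")

# Merge on the union of the time stamps; unmatched cells become NA
z <- merge(z1, z2)
print(z)
```

This replaces the separate read.table and zoo(..., as.POSIXct(...)) calls with one call per series.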

Note 2: Perhaps what you want to do next is:

timesteps <- seq(start(zz), start(zz) + 3600, by = "5 min")
m <- merge(zz, zoo(, timesteps))
m.na <- na.locf(m, na.rm = FALSE) + 0 * na.fill(m, fill = c(NA, 0, NA))
window(m.na, timesteps)
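
Since the same fill expression appears twice above, one option is to wrap it in a small helper. The sketch below assumes a hypothetical function name, fill_interior, which is not part of zoo, and uses a tiny made-up two-column series rather than the question's data.

```r
library(zoo)

# Hypothetical helper (not a zoo function): fill interior NAs by LOCF,
# but keep the leading and trailing NA runs of each column
fill_interior <- function(x) {
  na.locf(x, na.rm = FALSE) + 0 * na.fill(x, fill = c(NA, 0, NA))
}

# Made-up irregular series: column a stops early, column b starts late
idx <- as.POSIXct("2014-01-01 00:00:00", tz = "UTC") + 60 * c(0, 5, 11, 16)
z <- zoo(cbind(a = c(1, 2, NA, NA), b = c(NA, NA, 3, 4)), idx)

# Regular 5-minute grid covering the data
grid <- seq(start(z), end(z) + 300, by = "5 min")

# Add the grid times to the series, fill interior gaps, keep only grid times
m <- merge(z, zoo(, grid))
res <- window(fill_interior(m), grid)
print(res)
```

With this approach a grid point that falls after a column's last observation stays NA, which matches the behavior of the code in Note 2.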