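I have several time series objects sampled at five-minute intervals, but they can have different start and end times. They can also record at different offsets, not necessarily on minutes 5, 10, 15, and so on.
I want to merge these objects while keeping the legitimate NAs intact. For example, if one object starts recording later, the NAs at its beginning are legitimate. If an object stops recording earlier, the NAs at its end are legitimate too.
But na.locf has no option that keeps both kinds of NA intact.
Here is an example of my problem: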
lines1="Index,x1
2014-01-01 00:00:00,73.06
2014-01-01 00:05:00,73.11
2014-01-01 00:10:00,73.16
2014-01-01 00:15:00,73.22"
lines2="Index,x2
2014-01-01 00:11:00,71.11
2014-01-01 00:16:00,70.12
2014-01-01 00:21:00,70.16
2014-01-01 00:26:00,70.19
2014-01-01 00:31:00,69.16"
lines3="Index,x3
2014-01-01 00:23:00,0
2014-01-01 00:28:00,1
2014-01-01 00:33:00,1
2014-01-01 00:38:00,0
2014-01-01 00:43:00,0"
library(zoo)
df1=read.table(text = lines1, header = TRUE, sep = ",")
df2=read.table(text = lines2, header = TRUE, sep = ",")
df3=read.table(text = lines3, header = TRUE, sep = ",")
z1 = zoo(df1$x1, as.POSIXct(df1$Index))
z2 = zoo(df2$x2, as.POSIXct(df2$Index))
z3 = zoo(df3$x3, as.POSIXct(df3$Index))
z = merge(z1,z2,z3)
z
z.na.locf = na.locf(z)
z.na.locf
timesteps = seq(as.POSIXct("2014-01-01 00:00:00"),
as.POSIXct("2014-01-01 01:00:00"),
by = "5 min")
z.timesteps = na.locf(z, xout=timesteps)
z.timesteps
The merged object is:
> z
z1 z2 z3
2014-01-01 00:00:00 73.06 NA NA
2014-01-01 00:05:00 73.11 NA NA
2014-01-01 00:10:00 73.16 NA NA
2014-01-01 00:11:00 NA 71.11 NA
2014-01-01 00:15:00 73.22 NA NA
2014-01-01 00:16:00 NA 70.12 NA
2014-01-01 00:21:00 NA 70.16 NA
2014-01-01 00:23:00 NA NA 0
2014-01-01 00:26:00 NA 70.19 NA
2014-01-01 00:28:00 NA NA 1
2014-01-01 00:31:00 NA 69.16 NA
2014-01-01 00:33:00 NA NA 1
2014-01-01 00:38:00 NA NA 0
2014-01-01 00:43:00 NA NA 0
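Note that the NAs at the end of z1 are legitimate, as are those at the beginning of z3 and at both the beginning and the end of z2. The NAs that need replacing are the ones in the middle of the data. The problem is that if I try to fill the missing values in the middle of the data, the legitimate NAs disappear as well: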
> z.na.locf
z1 z2 z3
2014-01-01 00:00:00 73.06 NA NA
2014-01-01 00:05:00 73.11 NA NA
2014-01-01 00:10:00 73.16 NA NA
2014-01-01 00:11:00 73.16 71.11 NA
2014-01-01 00:15:00 73.22 71.11 NA
2014-01-01 00:16:00 73.22 70.12 NA
2014-01-01 00:21:00 73.22 70.16 NA
2014-01-01 00:23:00 73.22 70.16 0
2014-01-01 00:26:00 73.22 70.19 0
2014-01-01 00:28:00 73.22 70.19 1
2014-01-01 00:31:00 73.22 69.16 1
2014-01-01 00:33:00 73.22 69.16 1
2014-01-01 00:38:00 73.22 69.16 0
2014-01-01 00:43:00 73.22 69.16 0
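Note that for z1 and z2 the legitimate trailing NAs have disappeared.
Moreover, if I resample the data onto the same regular timestamps, the NAs at both the beginning and the end disappear.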
> z.timesteps
z1 z2 z3
2014-01-01 00:00:00 73.06 71.11 0
2014-01-01 00:05:00 73.11 71.11 0
2014-01-01 00:10:00 73.16 71.11 0
2014-01-01 00:15:00 73.22 71.11 0
2014-01-01 00:20:00 73.22 70.12 0
2014-01-01 00:25:00 73.22 70.16 0
2014-01-01 00:30:00 73.22 70.19 1
2014-01-01 00:35:00 73.22 69.16 1
2014-01-01 00:40:00 73.22 69.16 0
2014-01-01 00:45:00 73.22 69.16 0
2014-01-01 00:50:00 73.22 69.16 0
2014-01-01 00:55:00 73.22 69.16 0
2014-01-01 01:00:00 73.22 69.16 0
Is there a way to achieve what I need? Thanks for your help.
Answer 0 (score: 1)
na.fill can help here. The following line of code keeps the runs of NAs at the beginning and end of each column, but fills the remaining NAs using na.locf:
zz <- na.locf(z, na.rm = FALSE) + 0 * na.fill(z, fill = c(NA, 0, NA))
giving:
> zz
z1 z2 z3
2014-01-01 00:00:00 73.06 NA NA
2014-01-01 00:05:00 73.11 NA NA
2014-01-01 00:10:00 73.16 NA NA
2014-01-01 00:11:00 73.16 71.11 NA
2014-01-01 00:15:00 73.22 71.11 NA
2014-01-01 00:16:00 NA 70.12 NA
2014-01-01 00:21:00 NA 70.16 NA
2014-01-01 00:23:00 NA 70.16 0
2014-01-01 00:26:00 NA 70.19 0
2014-01-01 00:28:00 NA 70.19 1
2014-01-01 00:31:00 NA 69.16 1
2014-01-01 00:33:00 NA NA 1
2014-01-01 00:38:00 NA NA 0
2014-01-01 00:43:00 NA NA 0
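To see why the one-liner above works, it may help to split it into steps on a toy series. This is only an illustrative sketch: the names x, carried, mask and filled are made up for the example and do not appear in the question.

```r
library(zoo)

# Toy series with a leading NA, an interior NA and a trailing NA.
x <- zoo(c(NA, 1, NA, 2, NA))

# Step 1: carry the last observation forward. The leading NA stays,
# but the trailing NA is overwritten with the last value.
carried <- na.locf(x, na.rm = FALSE)

# Step 2: build a mask that is 0 between the first and last non-NA
# value and NA in the leading and trailing runs.
mask <- 0 * na.fill(x, fill = c(NA, 0, NA))

# Step 3: adding the mask leaves values unchanged where it is 0 and
# forces NA wherever the mask is NA.
filled <- carried + mask
filled
```

Here coredata(filled) should be NA 1 1 2 NA: the interior gap is filled while the leading and trailing NAs survive.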
Note 1: We could reduce the read.table / zoo lines to three lines of the form:
z1 <- read.zoo(text = lines1, header = TRUE, sep = ",", tz = "")
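As a rough sketch of that shortened input step, here it is applied to two abbreviated inputs. The two-row strings below are trimmed copies of lines1 and lines2 so the block runs on its own; they are not the full data from the question.

```r
library(zoo)

# Abbreviated copies of the question's inputs (first two rows only).
lines1 <- "Index,x1
2014-01-01 00:00:00,73.06
2014-01-01 00:05:00,73.11"
lines2 <- "Index,x2
2014-01-01 00:11:00,71.11
2014-01-01 00:16:00,70.12"

# Passing tz makes read.zoo convert the first column to POSIXct,
# which replaces the separate read.table + zoo(..., as.POSIXct()) steps.
z1 <- read.zoo(text = lines1, header = TRUE, sep = ",", tz = "")
z2 <- read.zoo(text = lines2, header = TRUE, sep = ",", tz = "")

# Merge on the union of timestamps, as in the question.
z <- merge(z1, z2)
z
```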
Note 2: Perhaps what you want to do next is:
timesteps <- seq(start(zz), start(zz) + 3600, by = "5 min")
m <- merge(zz, zoo(, timesteps))
m.na <- na.locf(m, na.rm = FALSE) + 0 * na.fill(m, fill = c(NA, 0, NA))
window(m.na, timesteps)
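Since the same fill expression is used twice, one option is to wrap it in a small helper. The sketch below assumes a hypothetical function name, fill_interior, which is not part of zoo, and uses a tiny made-up pair of series (a and b) rather than the question's data.

```r
library(zoo)

# Hypothetical helper (not a zoo function): carry the last observation
# forward only between the first and last non-NA value of each column.
fill_interior <- function(x) {
  na.locf(x, na.rm = FALSE) + 0 * na.fill(x, fill = c(NA, 0, NA))
}

# Made-up series on irregular timestamps.
t0 <- as.POSIXct("2014-01-01 00:00:00", tz = "UTC")
a <- zoo(c(1, 2), t0 + c(0, 300))       # stops recording early
b <- zoo(c(10, 20), t0 + c(360, 660))   # starts recording late
z <- merge(a, b)

# Add a regular 5-minute grid, fill interior gaps, then keep only
# the grid timestamps.
grid <- seq(t0, t0 + 900, by = "5 min")
m <- merge(z, zoo(, grid))
res <- window(fill_interior(m), grid)
res
```

On this toy input, column a stays NA on the grid points after its last observation and column b stays NA on the grid points before its first and after its last observation, while the grid point that falls between b's two observations is filled.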