I have a list of many time series, each of which has some missing values. Here is a short example:
x <- list(structure(c(NA, NA, 30, 1260, 504, 24, 132, 60, 766.8, 643.68,
54.96, 0, 9.48, 186.36, NA, NA, NA, NA, 723.24, 426.36, 198.96,
528.72, 29.04, 132, 60, 348, 5.04, 12, 144, 0), index = structure(c(189385200,
189471600, 189558000, 189644400, 189730800, 189817200, 189903600,
189990000, 190076400, 190162800, 190249200, 190335600, 190422000,
190508400, 190594800, 190681200, 190767600, 190854000, 190940400,
191026800, 191113200, 191199600, 191286000, 191372400, 191458800,
191545200, 191631600, 191718000, 191804400, 191890800), class = c("POSIXct",
"POSIXt")), class = "zoo"), structure(c(NA, NA, 144.96, 33.96,
10.08, 20.64, 12, NA, NA, 13.1904, 21.8784, 19.836, 30.8208,
96.3312, 57.3288, 30.0672, 25.9872, NA, NA, NA, NA, 56.3472,
79.4064, 35.64, 25.92, 44.88, 4.872, 78), index = structure(c(189385200,
189471600, 189558000, 189644400, 189730800, 189817200, 189903600,
189990000, 190076400, 190162800, 190249200, 190335600, 190422000,
190508400, 190594800, 190681200, 190767600, 190854000, 190940400,
191026800, 191113200, 191199600, 191286000, 191372400, 191458800,
191545200, 191631600, 191718000), class = c("POSIXct", "POSIXt"
)), class = "zoo"), structure(c(25.8876260869565, 33.931, 12.50435,
19.721225, 17.5955, 10.296775, 6.862425, 5.321225, 10.0137, 14.7752,
11.35255, 7.0339, 5.2703, 4.672575, 3.777625, 3.26115, 2.97095,
NA, NA, NA, NA, NA, NA, 5.469975, 4.29925), index = structure(c(189385200,
189471600, 189558000, 189644400, 189730800, 189817200, 189903600,
189990000, 190076400, 190162800, 190249200, 190335600, 190422000,
190508400, 190594800, 190681200, 190767600, 190854000, 190940400,
191026800, 191113200, 191199600, 191286000, 191372400, 191458800
), class = c("POSIXct", "POSIXt")), class = "zoo"))
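I need to find the start and end of each period in which none of the time series contains a missing value. For the example above, I would expect something like: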
START END
1976-01-03 23:00:00 1976-01-07 23:00:00
1976-01-10 23:00:00 1976-01-14 23:00:00
1976-01-24 23:00:00 1976-01-25 23:00:00
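I could write a loop that, at each time step, looks for a non-NA value whose previous (next) value is NA and then writes the timestamp into the START (END) column of a data frame.
Is there an existing function that does this (ideally faster than a plain loop)?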
Answer 0 (score: 0)
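You can take a vectorized approach using the following logic:
- If a value is NA, it cannot be a start or end point.
- If a value is not NA, it is a start point if and only if the previous value is NA, or it is the first value.
- If a value is not NA, it is an end point if and only if the next value is NA, or it is the last value.
So the logic looks like: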
start_end_points <- function(x) {
  x_is_na <- is.na(x)
  n <- length(x)
  # Shift the NA flags by one position; the first (last) value is treated
  # as if it were preceded (followed) by an NA.
  prev_is_na_or_first <- c(TRUE, x_is_na[-n])
  next_is_na_or_last <- c(x_is_na[-1], TRUE)
  x_is_start_point <- !x_is_na & prev_is_na_or_first
  x_is_end_point <- !x_is_na & next_is_na_or_last
  # The timestamps of a zoo object are stored in its "index" attribute.
  data.frame(start_point = attributes(x)$index[x_is_start_point],
             end_point = attributes(x)$index[x_is_end_point])
}
lapply(x, start_end_points)
This returns:
[[1]]
start_point end_point
1 1976-01-03 18:00:00 1976-01-14 18:00:00
2 1976-01-19 18:00:00 1976-01-30 18:00:00
[[2]]
start_point end_point
1 1976-01-03 18:00:00 1976-01-07 18:00:00
2 1976-01-10 18:00:00 1976-01-17 18:00:00
3 1976-01-22 18:00:00 1976-01-28 18:00:00
[[3]]
start_point end_point
1 1976-01-01 18:00:00 1976-01-17 18:00:00
2 1976-01-24 18:00:00 1976-01-25 18:00:00
(The times are displayed differently, I believe because of my system's time zone setting.)
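Note that the function above reports the non-NA runs of each series separately, whereas the question asks for the periods in which no series has a missing value. A minimal base R sketch of that extra step follows. It assumes that the timestamps live in the "index" attribute (as in the dput output in the question) and that a timestamp absent from a series counts as missing. The names complete_periods and make_series are hypothetical helpers introduced only for this illustration.

```r
# Hypothetical helper: periods during which every series in `series_list`
# has a non-NA value. Timestamps absent from a series are treated as NA.
complete_periods <- function(series_list) {
  # Sorted union of all timestamps found in the "index" attributes.
  all_times <- sort(unique(do.call(c, lapply(series_list, attr, "index"))))

  # For each series, flag the timestamps at which it has a non-NA value.
  ok_per_series <- lapply(series_list, function(s) {
    values <- as.numeric(unclass(s))
    pos <- match(as.numeric(all_times), as.numeric(attr(s, "index")))
    !is.na(values[pos])  # values[NA] is NA, so absent timestamps are FALSE
  })

  # A time step is complete only if every series is non-NA there.
  all_ok <- Reduce(`&`, ok_per_series)
  n <- length(all_ok)

  # Same shift logic as in start_end_points, applied to the combined flags.
  is_start <- all_ok & c(TRUE, !all_ok[-n])
  is_end <- all_ok & c(!all_ok[-1], TRUE)
  data.frame(START = all_times[is_start], END = all_times[is_end])
}

# Toy data that mimics the structure shown in the question
# (an assumption made so that the sketch runs without loading zoo).
make_series <- function(values, times) {
  structure(values, index = times, class = "zoo")
}
times <- as.POSIXct("1976-01-01", tz = "UTC") + 86400 * (0:6)
toy <- list(
  make_series(c(NA, 1, 2, NA, 4, 5, 6), times),
  make_series(c(1, 2, 3, 4, 5, 6), times[1:6])
)
res <- complete_periods(toy)
```

Here the second toy series is one step shorter than the first, so the last time step is not counted as complete. Calling complete_periods(x) on the list from the question should work the same way, although I have only traced the logic on the toy data.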