Filling in missing dates and times in time series data in R using the zoo package

Time: 2018-03-24 03:35:39

Tags: r time-series zoo

I have frequency data recorded at quarter-hour (15-minute) intervals.

sasan<-read.csv("sasanhz.csv", header = TRUE)

head(sasan)
               Timestamp Avg.Hz
1 12/27/2017 12:15:00 AM  50.05
2 12/27/2017 12:30:00 AM  49.99
3 12/27/2017 12:45:00 AM  49.98
4 12/27/2017 01:00:00 AM  50.01
5 12/27/2017 01:15:00 AM  49.97
6 12/27/2017 01:30:00 AM  49.98

str(sasan)
'data.frame':   5501 obs. of  2 variables:
 $ Timestamp: Factor w/ 5501 levels "01/01/2018 00:00:00 AM",..: 5112 5114 5116 5023 5025 
                                 5027 5029 5031 5033 5035 ...
 $ Avg.Hz   : num  50 50 50 50 50 ...

 #change to posixct

sasan$Timestamp<-as.POSIXct(sasan$Timestamp, format="%m/%d/%Y %I:%M:%S %p")

In this time series, some date-time values in the "Timestamp" column are missing, and I want to impute them. I tried zoo:

    z<-zoo(sasan)
    > head(z[1489:1497])
     Timestamp           Avg.Hz
1489 2018-01-11 12:15:00 50.02 
1490 2018-01-11 12:30:00 49.99 
1491 2018-01-11 12:45:00 49.94 
1492 <NA>                49.98 
1493 <NA>                50.02 
1494 <NA>                49.95

When I used the na.locf function from the zoo package to impute the NA date-time values, I got the following error.

 sasan_mis<-seq(start(z), end(z), by = times("00:15:00"))
> na.locf(z, xout = sasan_mis)
Error in approx(x[!na], y[!na], xout, ...) : zero non-NA points
In addition: Warning message:
In xy.coords(x, y, setLab = FALSE) : NAs introduced by coercion

How can I overcome this error? How can I impute the missing date-times? Thanks for your suggestions.

dput(head(z))
structure(c("2017-12-27 00:15:00", "2017-12-27 00:30:00", "2017-12-27 00:45:00", 
"2017-12-27 01:00:00", "2017-12-27 01:15:00", "2017-12-27 01:30:00", 
"50.05", "49.99", "49.98", "50.01", "49.97", "49.98"), .Dim = c(6L, 
2L), .Dimnames = list(NULL, c("Timestamp", "Avg.Hz")), index = 1:6, class = "zoo")

The packages I am using are:

library(ggplot2)
library(forecast)
library(tseries)
library(xts)
library(zoo)
library(dplyr)

1 Answer:

Answer 0: (score: 1)

This assumes the OP has missing values in the Timestamp variable and is looking for a way to fill them in.

na.approx from the zoo package is very handy in this case.

# na.approx from zoo to populate missing values of Timestamp
sasan$Timestamp <- as.POSIXct(na.approx(sasan$Timestamp), origin = "1970-1-1")
sasan
# 1  2017-12-27 00:15:00  50.05
# 2  2017-12-27 00:30:00  49.99
# 3  2017-12-27 00:45:00  49.98
# 4  2017-12-27 01:00:00  50.01
# 5  2017-12-27 01:15:00  49.97
# 6  2017-12-27 01:30:00  49.98
# 7  2017-12-27 01:45:00  49.98
# 8  2017-12-27 02:00:00  50.02
# 9  2017-12-27 02:15:00  49.95
# 10 2017-12-27 02:30:00  49.98
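
A note on why this works: na.approx interpolates linearly over the numeric form of the timestamps (seconds since 1970-01-01), so the filled values land on the 15-minute grid only when the rows themselves are evenly spaced. The following is a minimal sketch of that check, not part of the original answer. The timestamps are made up for illustration, and the explicit UTC time zone is an assumption chosen to keep the result independent of the local time zone.

```r
library(zoo)

# Hypothetical example: five timestamps at 15-minute steps,
# with the three interior values set to NA
ts <- as.POSIXct("2017-12-27 01:30:00", tz = "UTC") + (0:4) * 15 * 60
ts[2:4] <- NA

# Interpolate over seconds since the epoch, then convert back to POSIXct
filled <- as.POSIXct(na.approx(as.numeric(ts)),
                     origin = "1970-01-01", tz = "UTC")

# Step size between consecutive timestamps, in minutes
steps <- diff(as.numeric(filled)) / 60
print(filled)
print(steps)
```

If the steps are not all 15, the rows are probably not evenly spaced (for example, whole rows are absent), and plain interpolation of the timestamps would give misleading values.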

Data

# OP's data has been slightly modified to include NAs
sasan <- read.table(text = 
"Timestamp           Avg.Hz
1 '12/27/2017 12:15:00 AM'  50.05
2 '12/27/2017 12:30:00 AM'  49.99
3 '12/27/2017 12:45:00 AM'  49.98
4 '12/27/2017 01:00:00 AM'  50.01
5 '12/27/2017 01:15:00 AM'  49.97
6 '12/27/2017 01:30:00 AM'  49.98
7 <NA>                      49.98 
8 <NA>                      50.02 
9 <NA>                      49.95
10 '12/27/2017 02:30:00 AM'  49.98", 
header = TRUE, stringsAsFactors = FALSE)

# convert to POSIXct 
sasan$Timestamp<-as.POSIXct(sasan$Timestamp, format="%m/%d/%Y %I:%M:%S %p")
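
On the error in the question: the dput output shows that zoo(sasan) produced a character matrix indexed by 1, 2, 3, ..., so start(z) and end(z) are row numbers rather than times, and na.locf has no numeric data to work with. A different situation from the one handled above is when the timestamps parse correctly but whole 15-minute rows are absent. A minimal sketch for that case, assuming made-up data and a UTC time zone (neither comes from the OP's file), indexes the values by time and merges them with a regular grid:

```r
library(zoo)

# Hypothetical data: 15-minute readings with the 00:45 and 01:00 rows absent
stamps <- as.POSIXct(c("2017-12-27 00:15:00", "2017-12-27 00:30:00",
                       "2017-12-27 01:15:00", "2017-12-27 01:30:00"),
                     tz = "UTC")
hz <- c(50.05, 49.99, 49.97, 49.98)

# Index the values by time instead of passing the whole data frame to zoo()
z <- zoo(hz, order.by = stamps)

# Regular 15-minute grid covering the observed range
grid <- seq(start(z), end(z), by = "15 min")

# Merging with an empty zoo object on the grid adds NA for the absent rows
z_full <- merge(z, zoo(, grid), all = TRUE)

# Carry the last observation forward into the gaps
z_locf <- na.locf(z_full)
print(z_locf)
```

Whether na.locf or na.approx is the better fill for the Avg.Hz values depends on the data; the sketch uses na.locf only because that is what the question attempted.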