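I have a dataset of time-stamped measurements, but for some timestamps no data is available (NA in the dataset). I would like to interpolate these values.
Dataset: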
TEST <- structure(list(timestamp = structure(1:7, .Label = c("21/01/2012 18:41",
"21/01/2012 18:46", "21/01/2012 18:51", "21/01/2012 18:56", "21/01/2012 19:01",
"21/01/2012 19:06", "21/01/2012 19:11"), class = "factor"), humid = c(47.7,
44.5, NA, 42.5, 42.5, NA, 41.6), temp = c(14.12, 15.37, NA, 16.17,
16.31, NA, 16.51)), .Names = c("timestamp", "humid", "temp"), class = "data.frame", row.names = c(NA,
-7L))
It looks like this:
         timestamp              humid               temp
1 21/01/2012 18:41 47.700000000000003 14.119999999999999
2 21/01/2012 18:46 44.500000000000000 15.369999999999999
3 21/01/2012 18:51                 NA                 NA
4 21/01/2012 18:56 42.500000000000000 16.170000000000002
5 21/01/2012 19:01 42.500000000000000 16.309999999999999
6 21/01/2012 19:06                 NA                 NA
7 21/01/2012 19:11 41.600000000000001 16.510000000000002
I have tried option A:
library(zoo)
Mz <- zoo(TEST)
index(Mz) <- Mz[,1]
Mz_approx <- na.approx(Mz, x=Mz$timestamp)
But this results in the following error:
Error in approx(x[!na], y[!na], xout, ...) :
need at least two non-NA values to interpolate
In addition: Warning messages:
1: In na.approx.default(object, x = x, xout = xout, na.rm = FALSE, :
NAs introduced by coercion
2: In na.approx.default(object, x = x, xout = xout, na.rm = FALSE, :
NAs introduced by coercion
3: In xy.coords(x, y) : NAs introduced by coercion
I have also tried option B:
library(zoo)
Mz <- zoo(TEST)
Mz_approx <- na.approx(Mz)
But this results in the following error:
Error in approx(x[!na], y[!na], xout, ...) :
need at least two non-NA values to interpolate
In addition: Warning message:
In xy.coords(x, y) : NAs introduced by coercion
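What is the best way to overcome these errors and use na.approx correctly?
Answer 0 (score: 1)
read.zoo will convert the data frame to a zoo object with a properly handled index, after which na.approx can be used. zoo ships with several vignettes (PDF documents), including an entire vignette devoted to read.zoo examples, and the zoo help files contain many examples you can draw on.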
library(zoo)
z <- read.zoo(TEST, tz = "", format = "%d/%m/%Y %H:%M")
na.approx(z)
which gives:
                    humid  temp
2012-01-21 18:41:00 47.70 14.12
2012-01-21 18:46:00 44.50 15.37
2012-01-21 18:51:00 43.50 15.77
2012-01-21 18:56:00 42.50 16.17
2012-01-21 19:01:00 42.50 16.31
2012-01-21 19:06:00 42.05 16.41
2012-01-21 19:11:00 41.60 16.51
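The two attempts in the question most likely fail because zoo(TEST) keeps the timestamp column as data, so the whole object is coerced to a character matrix and na.approx has nothing numeric to interpolate. As a rough sketch of an alternative to read.zoo, the index can be parsed by hand and only the numeric columns passed to zoo. The names idx and z_filled and the choice of tz = "UTC" are assumptions made for this illustration, not part of the original answer.

```r
library(zoo)

# Same data as in the question, rebuilt compactly for a self-contained example.
TEST <- data.frame(
  timestamp = c("21/01/2012 18:41", "21/01/2012 18:46", "21/01/2012 18:51",
                "21/01/2012 18:56", "21/01/2012 19:01", "21/01/2012 19:06",
                "21/01/2012 19:11"),
  humid = c(47.7, 44.5, NA, 42.5, 42.5, NA, 41.6),
  temp  = c(14.12, 15.37, NA, 16.17, 16.31, NA, 16.51)
)

# Mixing the timestamp column with the numeric columns yields a character
# matrix, which is why interpolation reports "NAs introduced by coercion".
is.character(as.matrix(TEST))

# Parse the timestamps into a real time index (tz = "UTC" is an assumption).
idx <- as.POSIXct(as.character(TEST$timestamp),
                  format = "%d/%m/%Y %H:%M", tz = "UTC")

# Build the zoo object from the numeric columns only, ordered by the time index.
z <- zoo(as.matrix(TEST[, c("humid", "temp")]), order.by = idx)

# Linear interpolation of the NA values along the time index.
z_filled <- na.approx(z)
z_filled
```

Because the readings are evenly spaced at five minutes, each filled value is the midpoint of its neighbours, which agrees with the output shown above.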