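I want to fill in some missing dates in large data.frames. I have seen various posts, but none of them worked. I used merge, which I thought would be easy, but the result is not what I expected.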
My data contain hourly records for a whole year, together with the corresponding values of a variable. Here is a sample:
# sample of data
dput(head(x1))
structure(list(date = structure(c(14617, 14617, 14617, 14617,
14617, 14617), class = "Date"), value = c(-9999, -9999, -9999,
-9999, -9999, -9999)), .Names = c("date", "value"), row.names =
c(2923L, 6545L, 10167L, 13789L, 17411L, 21033L), class = "data.frame")
Since I want to add the missing data, I created a data.frame holding the correct, complete time series:
# Create hourly data
times <- seq(as.POSIXct("2010-01-01 00:00:00"), as.POSIXct("2010-12-31 23:00:00"), by="hour")
# Split into days and hours
nt <- as.Date(strptime(times, "%Y-%m-%d"))
ndays <- data.frame("date"=nt,"hour"=format(as.POSIXct(strptime(times,"%Y-%m-%d %H:%M:%S",tz="")) ,format = "%H:%M:%S"))
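A minimal sketch of the same complete hourly grid, assuming UTC is acceptable. The code above works in the local time zone, where a daylight-saving switch can make the formatted clock hours skip or repeat. This version builds the date and hour columns directly with format() and checks that the grid has the 8760 rows expected for a non-leap year:

```r
# Complete hourly grid for 2010, built in UTC (assumption) so that no clock
# hour is skipped or repeated by a daylight-saving transition
times <- seq(as.POSIXct("2010-01-01 00:00:00", tz = "UTC"),
             as.POSIXct("2010-12-31 23:00:00", tz = "UTC"),
             by = "hour")

# Split each timestamp into a Date and an "HH:MM:SS" string
ndays <- data.frame(date = as.Date(format(times, "%Y-%m-%d")),
                    hour = format(times, "%H:%M:%S"))

nrow(ndays)  # 365 days * 24 hours = 8760
```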
I tried to merge ndays and x1 to get a new data.frame containing every date (and hour):
newdata <- merge(ndays,x1,by="date",all.x = T)
However, I don't get the values of x1! I get NA, so I tried several other merge options, but none of them gave the right result. If I use:
newdata <- merge(x1, ndays,by="date",all.x = T)
the result is:
head(newdata)
date value hour
1 2010-01-08 -9999 12:00:00
2 2010-01-08 -9999 01:00:00
3 2010-01-08 -9999 02:00:00
4 2010-01-08 -9999 03:00:00
5 2010-01-08 -9999 00:00:00
6 2010-01-08 -9999 05:00:00
.....
But what I want is:
head(newdata)
date value hour
2010-01-01 NA 00:00:00
........
2010-01-08 -9999 12:00:00
2010-01-08 -9999 01:00:00
2010-01-08 -9999 02:00:00
That is, I want every date, and the final data.frame must have 8760 rows (one per hourly time step in the year). If I do:
newdata <- merge(ndays,x1,by="date",all = T)
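I again get a new data.frame, this time with 193680 rows, because all of the data were merged. But I only want the values of x1 alongside every day and hour of the year.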
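The growth in rows is what merge() does with a non-unique key: it pairs every row of one table with every row of the other that shares the same key value. When merging on date alone, a day with 24 rows in ndays and k rows in x1 therefore yields 24 * k rows. A small sketch using the six-row sample from the dput() output above (all six rows are dated 2010-01-08), with the grid built in UTC as an assumption:

```r
# The six sample rows from the question, all on 2010-01-08
x1 <- data.frame(date = as.Date(rep("2010-01-08", 6)), value = rep(-9999, 6))

# Complete hourly grid (UTC assumed)
times <- seq(as.POSIXct("2010-01-01", tz = "UTC"),
             as.POSIXct("2010-12-31 23:00:00", tz = "UTC"), by = "hour")
ndays <- data.frame(date = as.Date(format(times, "%Y-%m-%d")),
                    hour = format(times, "%H:%M:%S"))

# Left join on date only: each of the 24 hours of 2010-01-08 matches all 6 x1 rows
newdata <- merge(ndays, x1, by = "date", all.x = TRUE)

nrow(newdata)                               # 8736 unmatched + 24 * 6 = 8880
sum(newdata$date == as.Date("2010-01-08"))  # 144 rows for that single day
```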
What am I missing in the merge? Should I write a different function?
Answer 0 (score: 0)
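If I understand correctly, this can be solved by updating in a join. This is a special kind of left join: it keeps all rows of ndays and copies value only into the rows where a matching date is found: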
library(data.table)
# Update join: copy x1's value into the ndays rows whose date matches (NA elsewhere)
setDT(ndays)[unique(setDT(x1)), on = "date", value := i.value]
Note that only the unique rows of x1 are used, on the assumption that there is only one distinct value per day.
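For readers not using data.table, the same lookup can be sketched in base R with match(). This swaps in a different technique and is not the answer's own code. match() returns the position of the first x1 row with the same date, or NA when there is none, so ndays keeps exactly its 8760 rows and only the matching day receives a value. Unlike unique() above, it simply takes the first row for each date. The sample data and the UTC grid are the same assumptions as before:

```r
# Sample data and complete hourly grid (UTC assumed)
x1 <- data.frame(date = as.Date(rep("2010-01-08", 6)), value = rep(-9999, 6))
times <- seq(as.POSIXct("2010-01-01", tz = "UTC"),
             as.POSIXct("2010-12-31 23:00:00", tz = "UTC"), by = "hour")
ndays <- data.frame(date = as.Date(format(times, "%Y-%m-%d")),
                    hour = format(times, "%H:%M:%S"))

# For each hour, take the value of the first x1 row on the same date (NA if none)
ndays$value <- x1$value[match(ndays$date, x1$date)]

nrow(ndays)               # still 8760
sum(!is.na(ndays$value))  # 24: the hours of 2010-01-08
```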
# show some relevant rows
ndays[date %in% (as.IDate("2010-01-08") + (-1:+1))]
          date     hour value
 1: 2010-01-07 00:00:00    NA
 2: 2010-01-07 01:00:00    NA
 3: 2010-01-07 02:00:00    NA
 4: 2010-01-07 03:00:00    NA
 5: 2010-01-07 04:00:00    NA
 6: 2010-01-07 05:00:00    NA
 7: 2010-01-07 06:00:00    NA
 8: 2010-01-07 07:00:00    NA
 9: 2010-01-07 08:00:00    NA
10: 2010-01-07 09:00:00    NA
11: 2010-01-07 10:00:00    NA
12: 2010-01-07 11:00:00    NA
13: 2010-01-07 12:00:00    NA
14: 2010-01-07 13:00:00    NA
15: 2010-01-07 14:00:00    NA
16: 2010-01-07 15:00:00    NA
17: 2010-01-07 16:00:00    NA
18: 2010-01-07 17:00:00    NA
19: 2010-01-07 18:00:00    NA
20: 2010-01-07 19:00:00    NA
21: 2010-01-07 20:00:00    NA
22: 2010-01-07 21:00:00    NA
23: 2010-01-07 22:00:00    NA
24: 2010-01-07 23:00:00    NA
25: 2010-01-08 00:00:00 -9999
26: 2010-01-08 01:00:00 -9999
27: 2010-01-08 02:00:00 -9999
28: 2010-01-08 03:00:00 -9999
29: 2010-01-08 04:00:00 -9999
30: 2010-01-08 05:00:00 -9999
31: 2010-01-08 06:00:00 -9999
32: 2010-01-08 07:00:00 -9999
33: 2010-01-08 08:00:00 -9999
34: 2010-01-08 09:00:00 -9999
35: 2010-01-08 10:00:00 -9999
36: 2010-01-08 11:00:00 -9999
37: 2010-01-08 12:00:00 -9999
38: 2010-01-08 13:00:00 -9999
39: 2010-01-08 14:00:00 -9999
40: 2010-01-08 15:00:00 -9999
41: 2010-01-08 16:00:00 -9999
42: 2010-01-08 17:00:00 -9999
43: 2010-01-08 18:00:00 -9999
44: 2010-01-08 19:00:00 -9999
45: 2010-01-08 20:00:00 -9999
46: 2010-01-08 21:00:00 -9999
47: 2010-01-08 22:00:00 -9999
48: 2010-01-08 23:00:00 -9999
49: 2010-01-09 00:00:00    NA
50: 2010-01-09 01:00:00    NA
51: 2010-01-09 02:00:00    NA
52: 2010-01-09 03:00:00    NA
53: 2010-01-09 04:00:00    NA
54: 2010-01-09 05:00:00    NA
55: 2010-01-09 06:00:00    NA
56: 2010-01-09 07:00:00    NA
57: 2010-01-09 08:00:00    NA
58: 2010-01-09 09:00:00    NA
59: 2010-01-09 10:00:00    NA
60: 2010-01-09 11:00:00    NA
61: 2010-01-09 12:00:00    NA
62: 2010-01-09 13:00:00    NA
63: 2010-01-09 14:00:00    NA
64: 2010-01-09 15:00:00    NA
65: 2010-01-09 16:00:00    NA
66: 2010-01-09 17:00:00    NA
67: 2010-01-09 18:00:00    NA
68: 2010-01-09 19:00:00    NA
69: 2010-01-09 20:00:00    NA
70: 2010-01-09 21:00:00    NA
71: 2010-01-09 22:00:00    NA
72: 2010-01-09 23:00:00    NA
          date     hour value
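If the real x1 also recorded the hour of each observation, merging on both keys would avoid the row multiplication entirely. The dput() sample above has only a date column, so the hour column and the table x1h below are hypothetical. Each (date, hour) pair then matches at most one row, and a left join returns exactly 8760 rows. A minimal sketch under that assumption, again with a UTC grid:

```r
# Complete hourly grid (UTC assumed)
times <- seq(as.POSIXct("2010-01-01", tz = "UTC"),
             as.POSIXct("2010-12-31 23:00:00", tz = "UTC"), by = "hour")
ndays <- data.frame(date = as.Date(format(times, "%Y-%m-%d")),
                    hour = format(times, "%H:%M:%S"))

# Hypothetical x1h: one row per observed hour on 2010-01-08 (assumed hour column)
x1h <- data.frame(date  = as.Date("2010-01-08"),
                  hour  = sprintf("%02d:00:00", 0:23),
                  value = -9999)

# Left join on the composite key: unmatched hours get NA and nothing is duplicated
newdata <- merge(ndays, x1h, by = c("date", "hour"), all.x = TRUE)

nrow(newdata)                 # 8760
sum(!is.na(newdata$value))    # 24
```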