Merging large data.frames to fill in missing hourly dates

Time: 2018-01-26 12:03:29

Tags: r join merge

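I want to fill in some missing dates in large data.frames. I have seen various posts on this, but nothing has worked. I am using merge, which I thought would be easy, but the result is not what I expect.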

My data contains hourly data for a whole year, with the corresponding values of a variable. I show only a sample:

# sample of data
dput(head(x1))
structure(list(date = structure(c(14617, 14617, 14617, 14617, 
  14617, 14617), class = "Date"), value = c(-9999, -9999, -9999, 
  -9999, -9999, -9999)), .Names = c("date", "value"), row.names = 
  c(2923L, 6545L, 10167L, 13789L, 17411L, 21033L), class = "data.frame")

Since I want to add the missing data, I created an array with the correct, complete time series:

# Create hourly data

times <- seq(as.POSIXct("2010-01-01 00:00:00"), as.POSIXct("2010-12-31 23:00:00"), by="hour")
# Split into days and hours
nt <- as.Date(strptime(times, "%Y-%m-%d"))
ndays <- data.frame("date"=nt,"hour"=format(as.POSIXct(strptime(times,"%Y-%m-%d %H:%M:%S",tz="")) ,format = "%H:%M:%S"))

I tried to merge ndays and x1 to get a new data.frame containing all the dates (and hours):

newdata <- merge(ndays,x1,by="date",all.x  = T)

However, I do not get the values of x1! I get NA instead, so I tried different merge options, but none of them works. If I use:

newdata <- merge(x1, ndays,by="date",all.x = T)

the result is:

head(newdata)
  date       value hour
1 2010-01-08 -9999 12:00:00
2 2010-01-08 -9999 01:00:00
3 2010-01-08 -9999 02:00:00
4 2010-01-08 -9999 03:00:00
5 2010-01-08 -9999 00:00:00
6 2010-01-08 -9999 05:00:00
.....

But what I want is:

head(newdata)
date       value   hour
2010-01-01 NA      00:00:00    
........
2010-01-08 -9999   12:00:00
2010-01-08 -9999   01:00:00
2010-01-08 -9999   02:00:00

That is, I want all the dates, and the final data.frame must have 8760 rows in every column (the number of hourly timesteps in a year). If I do:

newdata <- merge(ndays,x1,by="date",all = T)

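I again get a new data.frame, this time with 193680 rows, because all the data has been merged. But I only want the values of x1 together with the days and hours of the whole year.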

What am I missing in the merge? Should I write another function?

1 answer:

Answer 0: (score: 0)

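If I understand correctly, I believe this can be solved by an update in a join. This is a special kind of left join which keeps all rows of ndays and copies value only into those rows where a matching date is found: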

library(data.table)
setDT(ndays)[unique(setDT(x1)), on = "date", value := value]

Note that only the unique rows of x1 are used, assuming there is only one distinct value per day.
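For readers who prefer to stay in base R, the same idea can be sketched with merge() after deduplicating x1. This is only an illustration on toy data: the two-day ndays and the six-row x1 below are hypothetical stand-ins for the objects in the question, and it relies on the same assumption of one distinct value per day.

```r
# Hypothetical toy stand-ins for the question's objects (two days only)
times <- seq(as.POSIXct("2010-01-07 00:00:00", tz = "UTC"),
             as.POSIXct("2010-01-08 23:00:00", tz = "UTC"), by = "hour")
ndays <- data.frame(date = as.Date(times), hour = format(times, "%H:%M:%S"))

# x1 repeats the same (date, value) pair several times, as in the dput() sample
x1 <- data.frame(date = as.Date(rep("2010-01-08", 6)), value = rep(-9999, 6))

# Deduplicate x1 so each date matches at most one row, then left join on date
newdata <- merge(ndays, unique(x1), by = "date", all.x = TRUE)

# merge() does not promise the original hour order within a date, so sort explicitly
newdata <- newdata[order(newdata$date, newdata$hour), ]

nrow(newdata) == nrow(ndays)  # one output row per hour of the calendar
```

If x1 can hold more than one distinct value for the same date, unique() no longer reduces it to one row per date, and the join would need a finer key (for example an hour column in x1).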

# show some relevant rows
ndays[date %in% (as.IDate("2010-01-08") + (-1:+1))]
          date     hour value
 1: 2010-01-07 00:00:00    NA
 2: 2010-01-07 01:00:00    NA
 3: 2010-01-07 02:00:00    NA
 4: 2010-01-07 03:00:00    NA
 5: 2010-01-07 04:00:00    NA
 6: 2010-01-07 05:00:00    NA
 7: 2010-01-07 06:00:00    NA
 8: 2010-01-07 07:00:00    NA
 9: 2010-01-07 08:00:00    NA
10: 2010-01-07 09:00:00    NA
11: 2010-01-07 10:00:00    NA
12: 2010-01-07 11:00:00    NA
13: 2010-01-07 12:00:00    NA
14: 2010-01-07 13:00:00    NA
15: 2010-01-07 14:00:00    NA
16: 2010-01-07 15:00:00    NA
17: 2010-01-07 16:00:00    NA
18: 2010-01-07 17:00:00    NA
19: 2010-01-07 18:00:00    NA
20: 2010-01-07 19:00:00    NA
21: 2010-01-07 20:00:00    NA
22: 2010-01-07 21:00:00    NA
23: 2010-01-07 22:00:00    NA
24: 2010-01-07 23:00:00    NA
25: 2010-01-08 00:00:00 -9999
26: 2010-01-08 01:00:00 -9999
27: 2010-01-08 02:00:00 -9999
28: 2010-01-08 03:00:00 -9999
29: 2010-01-08 04:00:00 -9999
30: 2010-01-08 05:00:00 -9999
31: 2010-01-08 06:00:00 -9999
32: 2010-01-08 07:00:00 -9999
33: 2010-01-08 08:00:00 -9999
34: 2010-01-08 09:00:00 -9999
35: 2010-01-08 10:00:00 -9999
36: 2010-01-08 11:00:00 -9999
37: 2010-01-08 12:00:00 -9999
38: 2010-01-08 13:00:00 -9999
39: 2010-01-08 14:00:00 -9999
40: 2010-01-08 15:00:00 -9999
41: 2010-01-08 16:00:00 -9999
42: 2010-01-08 17:00:00 -9999
43: 2010-01-08 18:00:00 -9999
44: 2010-01-08 19:00:00 -9999
45: 2010-01-08 20:00:00 -9999
46: 2010-01-08 21:00:00 -9999
47: 2010-01-08 22:00:00 -9999
48: 2010-01-08 23:00:00 -9999
49: 2010-01-09 00:00:00    NA
50: 2010-01-09 01:00:00    NA
51: 2010-01-09 02:00:00    NA
52: 2010-01-09 03:00:00    NA
53: 2010-01-09 04:00:00    NA
54: 2010-01-09 05:00:00    NA
55: 2010-01-09 06:00:00    NA
56: 2010-01-09 07:00:00    NA
57: 2010-01-09 08:00:00    NA
58: 2010-01-09 09:00:00    NA
59: 2010-01-09 10:00:00    NA
60: 2010-01-09 11:00:00    NA
61: 2010-01-09 12:00:00    NA
62: 2010-01-09 13:00:00    NA
63: 2010-01-09 14:00:00    NA
64: 2010-01-09 15:00:00    NA
65: 2010-01-09 16:00:00    NA
66: 2010-01-09 17:00:00    NA
67: 2010-01-09 18:00:00    NA
68: 2010-01-09 19:00:00    NA
69: 2010-01-09 20:00:00    NA
70: 2010-01-09 21:00:00    NA
71: 2010-01-09 22:00:00    NA
72: 2010-01-09 23:00:00    NA
          date     hour value
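
As for why the plain merge() calls in the question return far more than 8760 rows: a likely explanation, assuming x1 holds several rows for the same date as the dput() sample suggests, is that merging on date alone pairs every hour of a day with every duplicate row of x1 for that day. The toy data below is hypothetical and only illustrates the row multiplication.

```r
# Hypothetical toy data: two days of hourly rows, and an x1 with 6 rows for one date
times <- seq(as.POSIXct("2010-01-07 00:00:00", tz = "UTC"),
             as.POSIXct("2010-01-08 23:00:00", tz = "UTC"), by = "hour")
ndays <- data.frame(date = as.Date(times), hour = format(times, "%H:%M:%S"))
x1 <- data.frame(date = as.Date(rep("2010-01-08", 6)), value = rep(-9999, 6))

# Many-to-many match on date: 24 hours x 6 duplicates for 2010-01-08,
# plus 24 unmatched rows for 2010-01-07
blown_up <- merge(ndays, x1, by = "date", all = TRUE)
nrow(blown_up)  # 168, not 48
```

Deduplicating x1 before joining, as done with unique() above, removes this multiplication.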