Non-equi join on multiple datetimes

Date: 2019-07-01 11:48:22

Tags: r data.table lubridate

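I have two datasets that I would like to match together based on the datetime columns in both. I have converted both datetimes to POSIXct.

The first dataset (df1) looks like this: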

shark depth temperature   datetime    date      location
A     49.5  26.2   20/03/2018 08:00 20/03/2018    SS04
A     49.5  25.3   20/03/2018 08:02 20/03/2018    SS04
A     53.0  NA     20/03/2018 08:04 20/03/2018    SS04
A     39.5  26.5   20/03/2018 08:50 20/03/2018    Absent
A     43.0  26.2   21/03/2018 09:10 21/03/2018    Absent
A     44.5  NA     21/03/2018 10:18 21/03/2018    SS04 

I have reduced the number of columns for simplicity, but my actual dataset has 15 variables.

The second dataset, tides, is a list of tide times:

date   time  t_depth t_state  t_datetime
18/03/2018 02:33  2.09  High    20/03/2018 02:33
18/03/2018 08:39  0.45   Low    20/03/2018 08:39
18/03/2018 14:47  2.14  High    20/03/2018 14:47
18/03/2018 20:54  0.41   Low    20/03/2018 20:54
19/03/2018 03:01  2.13  High    21/03/2019 03:01
19/03/2018 09:09  0.41   Low    21/03/2019 09:09

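I would like to add t_state to df1 according to which tidal period (defined by tides$t_datetime) each df1$datetime falls within. I would also like to add the t_depth that corresponds to that tidal state.

I am very new to data.table and confused by the syntax. I tried using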

df1[ copy(tides)t_state := i.t_state, 
     on = .( datetime >= t_datetime, datetime < end)]

This does not work, and I am not sure how to fix it.

Ideally, my output would be:

shark depth temperature   datetime    date    location t_state t_depth
A     49.5  26.2   20/03/2018 08:00 20/03/2018  SS04     High  2.09
A     49.5  25.3   20/03/2018 08:02 20/03/2018  SS04     High  2.09
A     53.0  NA     20/03/2018 08:04 20/03/2018  SS04     High  2.09
A     39.5  26.5   20/03/2018 08:50 20/03/2018  Absent   Low   0.45
A     43.0  26.2   20/03/2018 09:10 21/03/2018  Absent   Low   0.45
A     44.5  NA     20/03/2018 10:18 21/03/2018  SS04     Low   0.45

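If possible, I would also like to know how to handle the extra variables I left out for simplicity. Do I need to add anything to the code to account for them?

Thanks!

Data via dput (df1 first, then tides):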

structure(list(shark = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "A", class = "factor"), 
    depth = c(49.5, 49.5, 53, 39.5, 43, 44.5), temperature = c(26.2, 
    25.3, NA, 26.5, 26.2, NA), datetime = structure(1:6, .Label = c("20/03/2018 08:00", 
    "20/03/2018 08:02", "20/03/2018 08:04", "20/03/2018 08:50", 
    "21/03/2018 09:10", "21/03/2018 10:18"), class = "factor"), 
    date = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("20/03/2018", 
    "21/03/2018"), class = "factor"), location = structure(c(2L, 
    2L, 2L, 1L, 1L, 2L), .Label = c("Absent", "SS04"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

structure(list(date = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("18/03/2018", 
"19/03/2018"), class = "factor"), time = structure(c(1L, 3L, 
4L, 5L, 2L, 2L), .Label = c("02:33", "03:01", "08:39", "14:47", 
"20:54"), class = "factor"), t_depth = c(2.09, 0.45, 2.14, 0.41, 
2.13, 0.41), t_state = structure(c(1L, 2L, 1L, 2L, 1L, 2L), .Label = c("High", 
"Low"), class = "factor"), t_datetime = structure(c(2L, 3L, 1L, 
4L, 5L, 6L), .Label = c(" 20/03/2018 14:47", "20/03/2018 02:33", 
"20/03/2018 08:39", "20/03/2018 20:54", "21/03/2019 03:01", "21/03/2019 09:09"
), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

1 Answer:

Answer 0: (score: 2)

library( data.table )

#create posix-timestamp
setDT(df1)[, timestamp := as.POSIXct( datetime, format = "%d/%m/%Y %H:%M" )]
#create start and end of tidal period
setDT(tides)[, start := as.POSIXct( t_datetime, format = "%d/%m/%Y %H:%M" )]
tides[, end := shift( start, type = "lead" )]
#left update non-equi join
df1[tides, tide:=i.t_state, on=.(timestamp>=start,timestamp<end)][,timestamp:=NULL]

   shark depth temperature         datetime       date location tide
1:     A  49.5        26.2 20/03/2018 08:00 20/03/2018     SS04 High
2:     A  49.5        25.3 20/03/2018 08:02 20/03/2018     SS04 High
3:     A  53.0          NA 20/03/2018 08:04 20/03/2018     SS04 High
4:     A  39.5        26.5 20/03/2018 08:50 20/03/2018   Absent  Low
5:     A  43.0        26.2 21/03/2018 09:10 21/03/2018   Absent  Low
6:     A  44.5          NA 21/03/2018 10:18 21/03/2018     SS04  Low
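
For reference, the same kind of join can be tried on a small self-contained example. This is only a sketch: the tables `obs` and `periods` and all their values are made up for illustration (they are not the asker's data), and the last period is closed with an assumed end time rather than left as NA.

```r
library(data.table)

# Hypothetical observations, invented for illustration
obs <- data.table(
  id = 1:4,
  timestamp = as.POSIXct(
    c("2018-03-20 08:00", "2018-03-20 08:50",
      "2018-03-20 15:00", "2018-03-20 23:00"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  )
)

# Hypothetical tidal periods, identified by their start time
periods <- data.table(
  t_state = c("High", "Low", "High"),
  t_depth = c(2.09, 0.45, 2.14),
  start = as.POSIXct(
    c("2018-03-20 02:33", "2018-03-20 08:39", "2018-03-20 14:47"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  )
)

# Each period ends where the next one starts
periods[, end := shift(start, type = "lead")]
# Assumption: close the last period explicitly instead of leaving end as NA
periods[is.na(end),
        end := as.POSIXct("2018-03-20 20:54",
                          format = "%Y-%m-%d %H:%M", tz = "UTC")]

# Update join: for every row of obs whose timestamp lies in [start, end),
# copy t_state and t_depth from the matching row of periods
obs[periods, `:=`(t_state = i.t_state, t_depth = i.t_depth),
    on = .(timestamp >= start, timestamp < end)]

print(obs)
```

Rows of `obs` that fall outside every period keep NA in the new columns. All existing columns of `obs` are left untouched, so extra variables in the left-hand table should need no special handling.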

Update following a comment

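#run this in place of the join above (timestamp must still exist at this point)
#the tide depth goes into a new t_depth column so the shark's depth column is not overwritten
df1[tides, `:=`(tide = i.t_state, t_depth = i.t_depth), on = .(timestamp >= start, timestamp < end)][, timestamp := NULL][]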
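
If many columns need to come across from the tide table, listing each one inside `:=` gets tedious. One possible variant, sketched below, passes a character vector of column names and fetches the `i.`-prefixed values with `mget()`. The tables `obs` and `ranges`, the `hour` column and all values are hypothetical stand-ins, with plain numbers used in place of timestamps to keep the example short.

```r
library(data.table)

# Hypothetical toy tables, values invented for illustration
obs <- data.table(id = 1:3, hour = c(1, 5, 12))
ranges <- data.table(
  lo = c(0, 4),
  hi = c(4, 10),
  t_state = c("High", "Low"),
  t_depth = c(2.09, 0.45)
)

# Columns to copy from ranges into obs
cols <- c("t_state", "t_depth")

# Update join on lo <= hour < hi; mget() looks up i.t_state and i.t_depth,
# i.e. the values coming from ranges
obs[ranges, (cols) := mget(paste0("i.", cols)),
    on = .(hour >= lo, hour < hi)]

print(obs)
```

The third row (hour 12) matches no range, so its new columns stay NA.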