I have two datasets that I want to match on the datetime column in each. I have already converted both datetime columns to POSIXct.
The first dataset (df1) looks like this:
shark depth temperature datetime date location
A 49.5 26.2 20/03/2018 08:00 20/03/2018 SS04
A 49.5 25.3 20/03/2018 08:02 20/03/2018 SS04
A 53.0 NA 20/03/2018 08:04 20/03/2018 SS04
A 39.5 26.5 20/03/2018 08:50 20/03/2018 Absent
A 43.0 26.2 21/03/2018 09:10 21/03/2018 Absent
A 44.5 NA 21/03/2018 10:18 21/03/2018 SS04
For simplicity I have reduced the number of columns, but my actual dataset has 15 variables.
The second dataset, tides, is a list of tide times:
date time t_depth t_state t_datetime
18/03/2018 02:33 2.09 High 20/03/2018 02:33
18/03/2018 08:39 0.45 Low 20/03/2018 08:39
18/03/2018 14:47 2.14 High 20/03/2018 14:47
18/03/2018 20:54 0.41 Low 20/03/2018 20:54
19/03/2018 03:01 2.13 High 21/03/2019 03:01
19/03/2018 09:09 0.41 Low 21/03/2019 09:09
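I want to add t_state to df1, based on which tidal period in tides$t_datetime each df1$datetime falls within. I also want to add the t_depth that corresponds to that tidal state.
I am very new to data.table and confused by the syntax. I tried using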
df1[ copy(tides)t_state := i.t_state,
on = .( datetime >= t_datetime, datetime < end)]
This does not work, and I am not sure how to fix it.
Ideally, my output would be:
shark depth temperature datetime date location t_state t_depth
A 49.5 26.2 20/03/2018 08:00 20/03/2018 SS04 High 2.09
A 49.5 25.3 20/03/2018 08:02 20/03/2018 SS04 High 2.09
A 53.0 NA 20/03/2018 08:04 20/03/2018 SS04 High 2.09
A 39.5 26.5 20/03/2018 08:50 20/03/2018 Absent Low 0.45
A 43.0 26.2 20/03/2018 09:10 21/03/2018 Absent Low 0.45
A 44.5 NA 20/03/2018 10:18 21/03/2018 SS04 Low 0.45
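If possible, I would also like to know how to carry over the extra variables that I left out for simplicity. Do I need to add anything to handle them?
Thanks!
Data from dput: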
df1 <- structure(list(shark = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "A", class = "factor"),
depth = c(49.5, 49.5, 53, 39.5, 43, 44.5), temperature = c(26.2,
25.3, NA, 26.5, 26.2, NA), datetime = structure(1:6, .Label = c("20/03/2018 08:00",
"20/03/2018 08:02", "20/03/2018 08:04", "20/03/2018 08:50",
"21/03/2018 09:10", "21/03/2018 10:18"), class = "factor"),
date = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("20/03/2018",
"21/03/2018"), class = "factor"), location = structure(c(2L,
2L, 2L, 1L, 1L, 2L), .Label = c("Absent", "SS04"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
tides <- structure(list(date = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("18/03/2018",
"19/03/2018"), class = "factor"), time = structure(c(1L, 3L,
4L, 5L, 2L, 2L), .Label = c("02:33", "03:01", "08:39", "14:47",
"20:54"), class = "factor"), t_depth = c(2.09, 0.45, 2.14, 0.41,
2.13, 0.41), t_state = structure(c(1L, 2L, 1L, 2L, 1L, 2L), .Label = c("High",
"Low"), class = "factor"), t_datetime = structure(c(2L, 3L, 1L,
4L, 5L, 6L), .Label = c(" 20/03/2018 14:47", "20/03/2018 02:33",
"20/03/2018 08:39", "20/03/2018 20:54", "21/03/2019 03:01", "21/03/2019 09:09"
), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
Answer 0 (score: 2)
library( data.table )
#create posix-timestamp
setDT(df1)[, timestamp := as.POSIXct( datetime, format = "%d/%m/%Y %H:%M" )]
#create start and end of tidal period
setDT(tides)[, start := as.POSIXct( t_datetime, format = "%d/%m/%Y %H:%M" )]
tides[, end := shift( start, type = "lead" )]
#left update non-equi join
df1[tides, tide:=i.t_state, on=.(timestamp>=start,timestamp<end)][,timestamp:=NULL]
shark depth temperature datetime date location tide
1: A 49.5 26.2 20/03/2018 08:00 20/03/2018 SS04 High
2: A 49.5 25.3 20/03/2018 08:02 20/03/2018 SS04 High
3: A 53.0 NA 20/03/2018 08:04 20/03/2018 SS04 High
4: A 39.5 26.5 20/03/2018 08:50 20/03/2018 Absent Low
5: A 43.0 26.2 21/03/2018 09:10 21/03/2018 Absent Low
6: A 44.5 NA 21/03/2018 10:18 21/03/2018 SS04 Low
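#alternative to the join above: add both the tide state and the tide depth in one update join
#(run this instead of the previous join, because that line drops timestamp;
# the new column is named t_depth so that the shark depth column in df1 is not overwritten)
df1[tides, `:=`(tide = i.t_state, t_depth = i.t_depth), on = .(timestamp >= start, timestamp < end)][, timestamp := NULL][]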
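For reference, here is a minimal end-to-end sketch of the same non-equi update join on a cut-down, hypothetical copy of the example data. A few things here are assumptions rather than part of the original answer: the datetimes are plain character columns, `tz = "UTC"` is set only to keep the sketch reproducible, the third shark observation (15:30) is invented to show an unmatched row, and the last open-ended tidal period is filtered out explicitly before joining.

```r
library(data.table)

# Cut-down, hypothetical copies of the example data
df1 <- data.table(
  shark    = "A",
  depth    = c(49.5, 39.5, 43.0),
  datetime = c("20/03/2018 08:00", "20/03/2018 08:50", "20/03/2018 15:30")
)
tides <- data.table(
  t_depth    = c(2.09, 0.45, 2.14),
  t_state    = c("High", "Low", "High"),
  t_datetime = c("20/03/2018 02:33", "20/03/2018 08:39", "20/03/2018 14:47")
)

fmt <- "%d/%m/%Y %H:%M"
# tz = "UTC" is an assumption made for this sketch only
df1[, timestamp := as.POSIXct(datetime, format = fmt, tz = "UTC")]
tides[, start := as.POSIXct(t_datetime, format = fmt, tz = "UTC")]

# Each tidal period ends where the next one starts; the last row gets end = NA
tides[, end := shift(start, type = "lead")]

# Update join by reference: all existing columns of df1 are kept and
# two new columns are added. Only complete periods (non-NA end) are used.
df1[tides[!is.na(end)],
    `:=`(t_state = i.t_state, t_depth = i.t_depth),
    on = .(timestamp >= start, timestamp < end)]

# Observations outside every complete period keep NA in the new columns
print(df1[, .(datetime, depth, t_state, t_depth)])
print(df1[is.na(t_state)])
```

Because `:=` updates df1 in place, any other columns already in df1 (such as the 15 variables mentioned in the question) are carried through untouched; nothing extra needs to be listed for them. Checking `df1[is.na(t_state)]` afterwards is a quick way to spot observations that fall before the first tide time or after the last complete period.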