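I have two data frames with time data in POSIXct format, and a location that I need to match between them. The first dataset holds times in a series of 30-minute periods, along with location data.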
location datetimes date shark
SS04 2018-03-20 08:00:00 2018-03-20 A
Absent 2018-03-20 08:30:00 2018-03-20 A
Absent 2018-03-20 09:00:00 2018-03-20 A
Absent 2018-03-20 09:30:00 2018-03-20 A
SS04 2018-03-20 10:00:00 2018-03-20 A
Absent 2018-03-20 10:30:00 2018-03-20 A
The second dataset records time data every 2 minutes.
shark depth temperature datetime date
A 49.5 26.2 20/03/2018 08:00 20/03/2018
A 49.5 25.3 20/03/2018 08:02 20/03/2018
A 53.0 24.2 20/03/2018 08:04 20/03/2018
A 39.5 26.5 20/03/2018 08:28 20/03/2018
A 43.0 26.2 20/03/2018 09:10 20/03/2018
A 44.5 26.5 20/03/2018 10:34 20/03/2018
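I need to match the time periods (datetimes) in the first dataset against the times (datetime) in the second dataset, so that every row in the second dataset is assigned the location of the 30-minute period it falls within.
I think I could use data.table, but I am not confident about how to implement this.
Ideally, I want to create a dataset like the one below, where the location from the first dataset is added to the second dataset based on the corresponding time period.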
shark depth temperature datetime date location
A 49.5 26.2 20/03/2018 08:00 20/03/2018 SS04
A 49.5 25.3 20/03/2018 08:02 20/03/2018 SS04
A 53.0 24.2 20/03/2018 08:04 20/03/2018 SS04
A 39.5 26.5 20/03/2018 08:32 20/03/2018 Absent
A 43.0 26.2 20/03/2018 09:10 20/03/2018 Absent
A 44.5 26.5 20/03/2018 10:18 20/03/2018 SS04
Answer 0 (score: 1)
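library(sqldf)

# End of each 30-minute window; POSIXct arithmetic is in seconds
data30min$datetimesE <- data30min$datetimes + 30 * 60

# Half-open window [datetimes, datetimesE): BETWEEN is inclusive at both ends,
# so a reading exactly on a boundary (e.g. 08:30:00) would match two windows
sqldf('select d2.*, d30.location
from data2min d2
left join data30min d30
on d2.datetime >= d30.datetimes and d2.datetime < d30.datetimesE
')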
#> shark depth temperature datetime date location
#> 1 A 49.5 26.2 2018-03-20 08:00:00 20/03/2018 SS04
#> 2 A 49.5 25.3 2018-03-20 08:02:00 20/03/2018 SS04
#> 3 A 53.0 24.2 2018-03-20 08:04:00 20/03/2018 SS04
#> 4 A 39.5 26.5 2018-03-20 08:28:00 20/03/2018 SS04
#> 5 A 43.0 26.2 2018-03-20 09:10:00 20/03/2018 Absent
#> 6 A 44.5 26.5 2018-03-20 10:34:00 20/03/2018 Absent
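If sqldf is not available, the same lookup can be sketched in base R with findInterval(), which is a different technique from the SQL join above. This is only a minimal sketch under stated assumptions: a single shark, window start times sorted in increasing order, and each window lasting 30 minutes. The names windows and obs are hypothetical stand-ins for data30min and data2min.

```r
# Hypothetical stand-ins for data30min and data2min (single shark assumed)
windows <- data.frame(
  location  = c("SS04", "Absent", "Absent", "Absent", "SS04", "Absent"),
  datetimes = seq(as.POSIXct("2018-03-20 08:00:00", tz = "UTC"),
                  by = "30 min", length.out = 6),
  stringsAsFactors = FALSE
)
obs <- data.frame(
  depth    = c(49.5, 49.5, 53.0, 39.5, 43.0, 44.5),
  datetime = as.POSIXct(c("2018-03-20 08:00:00", "2018-03-20 08:02:00",
                          "2018-03-20 08:04:00", "2018-03-20 08:28:00",
                          "2018-03-20 09:10:00", "2018-03-20 10:34:00"),
                        tz = "UTC")
)

# Index of the last window start that is <= each reading (0 = before first window)
idx <- findInterval(as.numeric(obs$datetime), as.numeric(windows$datetimes))
idx[idx == 0] <- NA

# Drop matches that fall 30 minutes or more after the window start,
# which can only happen if the windows are not contiguous
gap <- as.numeric(obs$datetime) - as.numeric(windows$datetimes[idx])
idx[!is.na(idx) & gap >= 30 * 60] <- NA

obs$location <- windows$location[idx]
print(obs)
```

With the sample data this assigns SS04 to the first four readings and Absent to the last two, in line with the sqldf output. For several sharks the lookup would have to be run per shark, for example with split().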
Data:
data2min <- structure(list(shark = c("A", "A", "A", "A", "A", "A"), depth = c(49.5,
49.5, 53, 39.5, 43, 44.5), temperature = c(26.2, 25.3, 24.2,
26.5, 26.2, 26.5), datetime = structure(c(1521547200, 1521547320,
1521547440, 1521548880, 1521551400, 1521556440), class = c("POSIXct",
"POSIXt"), tzone = ""), date = c("20/03/2018", "20/03/2018",
"20/03/2018", "20/03/2018", "20/03/2018", "20/03/2018")), row.names = c(NA,
-6L), class = "data.frame")
data30min <- structure(list(location = c("SS04", "Absent", "Absent", "Absent",
"SS04", "Absent"), datetimes = structure(c(1521547200, 1521549000,
1521550800, 1521552600, 1521554400, 1521556200), class = c("POSIXct",
"POSIXt"), tzone = ""), date = c("2018-03-20", "2018-03-20",
"2018-03-20", "2018-03-20", "2018-03-20", "2018-03-20"), shark = c("A",
"A", "A", "A", "A", "A"), datetimesE = structure(c(1521549000,
1521550800, 1521552600, 1521554400, 1521556200, 1521558000), class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -6L), class = "data.frame")
Answer 1 (score: 0)
Using a data.table non-equi join
Sample data
library(data.table)
DT1 <- fread('
location datetimes date shark
SS04 "2018-03-20 08:00:00" 2018-03-20 A
Absent "2018-03-20 08:30:00" 2018-03-20 A
Absent "2018-03-20 09:00:00" 2018-03-20 A
Absent "2018-03-20 09:30:00" 2018-03-20 A
SS04 "2018-03-20 10:00:00" 2018-03-20 A
Absent "2018-03-20 10:30:00" 2018-03-20 A')
DT2 <- fread('
shark depth temperature datetime date
A 49.5 26.2 "20/03/2018 08:00" 20/03/2018
A 49.5 25.3 "20/03/2018 08:02" 20/03/2018
A 53.0 24.2 "20/03/2018 08:04" 20/03/2018
A 39.5 26.5 "20/03/2018 08:28" 20/03/2018
A 43.0 26.2 "20/03/2018 09:10" 20/03/2018
A 44.5 26.5 "20/03/2018 10:34" 20/03/2018
')
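# Parse both datetime columns to POSIXct; the two sources use different formats.
# tz = "UTC" is an assumption so that both columns share one time zone
DT1[, datetimes := as.POSIXct(datetimes, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")]
DT2[, datetime := as.POSIXct(datetime, format = "%d/%m/%Y %H:%M", tz = "UTC")]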
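Code
# Update join: add each window's end time on a copy of DT1, then assign location
# to every DT2 row whose datetime lies in the half-open window [datetimes, end)
DT2[copy(DT1)[, end := datetimes + lubridate::minutes(30)], location := i.location,
    on = .(datetime >= datetimes, datetime < end)][]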
Output
# shark depth temperature datetime date location
# 1: A 49.5 26.2 2018-03-20 08:00:00 20/03/2018 SS04
# 2: A 49.5 25.3 2018-03-20 08:02:00 20/03/2018 SS04
# 3: A 53.0 24.2 2018-03-20 08:04:00 20/03/2018 SS04
# 4: A 39.5 26.5 2018-03-20 08:28:00 20/03/2018 SS04
# 5: A 43.0 26.2 2018-03-20 09:10:00 20/03/2018 Absent
# 6: A 44.5 26.5 2018-03-20 10:34:00 20/03/2018 Absent
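Because the 30-minute windows here are contiguous, a rolling join is another way to get the same result without computing an end column. The sketch below is an alternative to the non-equi join above, not a restatement of it. It assumes every reading falls at or after the first window start for its shark, and it also joins on shark so that several animals could be handled at once. The names windows and obs are hypothetical.

```r
library(data.table)

# Hypothetical stand-ins for DT1 and DT2, with datetimes already parsed as POSIXct
windows <- data.table(
  shark     = "A",
  location  = c("SS04", "Absent", "Absent", "Absent", "SS04", "Absent"),
  datetimes = seq(as.POSIXct("2018-03-20 08:00:00", tz = "UTC"),
                  by = "30 min", length.out = 6)
)
obs <- data.table(
  shark    = "A",
  depth    = c(49.5, 49.5, 53.0, 39.5, 43.0, 44.5),
  datetime = as.POSIXct(c("2018-03-20 08:00:00", "2018-03-20 08:02:00",
                          "2018-03-20 08:04:00", "2018-03-20 08:28:00",
                          "2018-03-20 09:10:00", "2018-03-20 10:34:00"),
                        tz = "UTC")
)

# roll = TRUE carries the most recent window start forward to each reading,
# so every reading picks up the location of the window it falls in
res <- windows[obs, on = .(shark, datetimes = datetime), roll = TRUE]
print(res)
```

In the result, the datetimes column holds the reading times from obs, not the window starts. If the windows had gaps, the non-equi join with an explicit end is the safer choice, because a rolling join would keep carrying the last location across the gap.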