I have a dataset (df1) that holds a start date and an end date for each observation (the real data has about 50,000 observations).
dateStart <- c("2018-01-23","2017-11-01","2017-11-29")
dateEnd <- c("2018-01-25","2017-11-02","2017-11-30")
obs <- c(1,2,3)
dateStart <- as.Date(as.character(dateStart), format = "%Y-%m-%d")
dateEnd <- as.Date(as.character(dateEnd), format = "%Y-%m-%d")
df1 <- data.frame(obs,dateStart,dateEnd)
df1
  obs  dateStart    dateEnd
1   1 2018-01-23 2018-01-25
2   2 2017-11-01 2017-11-02
3   3 2017-11-29 2017-11-30
A second dataset (df2) holds the recorded values (the real data here has more than 150,000 rows):
datetime <- c("2018-01-23 14:30:00", "2018-01-23 15:30:00","2017-11-01 12:10:00","2017-11-01 22:59:00","2017-11-29 00:40:00", "2017-11-29 16:50:00")
value <- c(1.1,1.2,2.1,2.2,3.1,3.2)
date <- as.POSIXct(datetime, format = "%Y-%m-%d %H:%M:%S")
# Store the parsed POSIXct values (not the raw strings) in the datetime column
df2 <- data.frame(datetime = date, value)
df2
             datetime value
1 2018-01-23 14:30:00   1.1
2 2018-01-23 15:30:00   1.2
3 2017-11-01 12:10:00   2.1
4 2017-11-01 22:59:00   2.2
5 2017-11-29 00:40:00   3.1
6 2017-11-29 16:50:00   3.2
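How can I add to df1 the peak (maximum) value from df2 that occurs between each observation's start date and end date? The result should look like this:
  obs  dateStart    dateEnd value
1   1 2018-01-23 2018-01-25   1.2
2   2 2017-11-01 2017-11-02   2.2
3   3 2017-11-29 2017-11-30   3.2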
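I have used subset within a single data frame, but I don't know how to subset one data frame by the multiple row ranges defined in another.
Any help is much appreciated.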
Answer 0 (score: 2)
Here is a data.table solution (inspired by this answer),
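library(data.table)
# The join compares Date with Date, so reduce df2$datetime to its calendar date first
setDT(df2)[, datetime := as.Date(substr(as.character(datetime), 1, 10))]
# Non-equi join, then keep the row with the largest value for each observation
setDT(df1)[df2, on = .(dateStart <= datetime, dateEnd > datetime)][
  , .SD[which.max(value)], by = obs][]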
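which gives,
   obs  dateStart    dateEnd value
1:   1 2018-01-23 2018-01-23   1.2
2:   2 2017-11-01 2017-11-01   2.2
3:   3 2017-11-29 2017-11-29   3.2
Note that dateEnd here repeats the matched date rather than df1's original end date: after a non-equi join, data.table fills the join columns with the values from df2.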
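If the original dateStart and dateEnd need to survive in the result, one possible variant is to run the join per row of df1 and attach the maxima afterwards. The following is only a sketch under assumptions: the names dt1, dt2, day, peaks and res are illustrative and do not come from the answer, the toy data is rebuilt so the block runs on its own, and both range bounds are treated as inclusive.

```r
library(data.table)

# Toy data mirroring the question; dt1 and dt2 are illustrative names
dt1 <- data.table(
  obs       = 1:3,
  dateStart = as.Date(c("2018-01-23", "2017-11-01", "2017-11-29")),
  dateEnd   = as.Date(c("2018-01-25", "2017-11-02", "2017-11-30"))
)
dt2 <- data.table(
  datetime = c("2018-01-23 14:30:00", "2018-01-23 15:30:00",
               "2017-11-01 12:10:00", "2017-11-01 22:59:00",
               "2017-11-29 00:40:00", "2017-11-29 16:50:00"),
  value    = c(1.1, 1.2, 2.1, 2.2, 3.1, 3.2)
)

# Calendar date of each reading, taken from the text so no time zone is involved
dt2[, day := as.Date(substr(datetime, 1, 10))]

# For every row of dt1, find the dt2 rows whose day lies in [dateStart, dateEnd]
# and take the largest value; by = .EACHI evaluates j once per dt1 row
peaks <- dt2[dt1, on = .(day >= dateStart, day <= dateEnd),
             .(value = max(value)), by = .EACHI]

# The per-row results come back in dt1's order, so they can be attached directly
res <- copy(dt1)[, value := peaks$value]
res
```

Working on a copy of dt1 keeps the original range table unchanged; an observation with no reading in its range should end up with NA in value.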
Answer 1 (score: 2)
Using sqldf
# First convert dateStart and dateEnd to POSIXct and rebuild df1 with them
dateStart <- as.POSIXct(as.character(dateStart))
dateEnd <- as.POSIXct(as.character(dateEnd))
df1 <- data.frame(obs, dateStart, dateEnd)
# The query expects the parsed timestamps in a df2 column named date
df2 <- data.frame(date, value)
library(sqldf)
sqldf("SELECT obs, df1.dateStart, df1.dateEnd, df2.date, max(df2.value) AS value
FROM df2 LEFT JOIN df1
ON df2.date BETWEEN df1.dateStart AND df1.dateEnd
GROUP BY 1") # 1 = the 1st column in SELECT, i.e. obs
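  obs  dateStart    dateEnd                date value
1  NA       <NA>       <NA> 2017-11-28 21:40:00   3.1 # the 28th is outside the 29-30 interval
2   1 2018-01-23 2018-01-25 2018-01-23 12:30:00   1.2
3   2 2017-11-01 2017-11-02 2017-11-01 19:59:00   2.2
4   3 2017-11-29 2017-11-30 2017-11-29 13:50:00   3.2
The times in the date column print three hours earlier than they were entered in df2, which suggests a time zone shift somewhere along the way (the 2017-11-29 00:40 reading appears as 2017-11-28 21:40).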
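As a package-free cross-check of the expected result, the same lookup can be written in base R by scanning the readings once per observation. This is a minimal sketch, not part of either answer: the names ranges, readings and day are made up for illustration, the toy data is rebuilt so the block is self-contained, and the calendar date is cut from the timestamp text so that no time zone conversion takes place.

```r
# Toy data mirroring the question; ranges and readings are illustrative names
ranges <- data.frame(
  obs       = 1:3,
  dateStart = as.Date(c("2018-01-23", "2017-11-01", "2017-11-29")),
  dateEnd   = as.Date(c("2018-01-25", "2017-11-02", "2017-11-30"))
)
readings <- data.frame(
  datetime = c("2018-01-23 14:30:00", "2018-01-23 15:30:00",
               "2017-11-01 12:10:00", "2017-11-01 22:59:00",
               "2017-11-29 00:40:00", "2017-11-29 16:50:00"),
  value    = c(1.1, 1.2, 2.1, 2.2, 3.1, 3.2),
  stringsAsFactors = FALSE
)

# Calendar date of each reading, cut from the text (first 10 characters)
readings$day <- as.Date(substr(readings$datetime, 1, 10))

# For each observation, keep the largest value whose day lies in
# [dateStart, dateEnd]; NA when no reading falls inside the range
ranges$value <- vapply(seq_len(nrow(ranges)), function(i) {
  in_range <- readings$day >= ranges$dateStart[i] &
              readings$day <= ranges$dateEnd[i]
  if (any(in_range)) max(readings$value[in_range]) else NA_real_
}, numeric(1))

ranges
```

This compares every range against every reading, so it is mainly useful for validating a sample; with the row counts mentioned in the question, a join-based approach is likely to scale better.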