I have two data frames like these:
DF1
ID <- c("ID001","ID001","ID002","ID003")
Type <- c("A","A","B","A")
Measurement <- c("Length","Breadth","Length","Length")
When <- c("2016-09-09 06:00:13", "2016-09-19 09:13:10", "2016-10-13 11:45:14", "2016-10-29 11:56:00")
df1 <- data.frame(ID,Type,Measurement,When)
DF2
ID <- c("ID001","ID001","ID001","ID001","ID001",
"ID002","ID002","ID002","ID002","ID002")
Type <- c("A","A","A","A","A",
"B","B","B","B","B")
Measurement <- c("Length","Length","Length","Length","Length",
"Length","Length","Length","Length","Length")
Datetime <- c("2016-09-09 01:00:13", "2016-09-09 04:00:13", "2016-09-09 09:00:13", "2016-09-09 21:00:13","2016-09-09 23:00:13",
"2016-10-13 10:45:14", "2016-10-13 11:15:14", "2016-10-13 11:48:14", "2016-10-13 11:55:14","2016-10-13 21:45:14")
PassFail <- c("Pass","Fail","Pass","Fail","Pass",
"Fail","Fail","Pass","Pass","Pass")
df2 <- data.frame(ID,Type,Measurement,Datetime,PassFail)
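I am trying to merge these two data frames, matching rows on ID, Type and Measurement, to get the Pass count and the Fail count, counting only the df2 measurements whose Datetime is later than When.
My desired output is: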
ID Type Measurement When PassCount FailCount
ID001 A Length 2016-09-09 06:00:13 2 1
ID002 B Length 2016-10-13 11:45:14 3 0
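To pin down the intended logic, here is a minimal base-R sketch: an equi-merge on the three key columns, a filter on Datetime > When, then a per-row count. It assumes all three columns are join keys and rebuilds the sample data with stringsAsFactors = FALSE. Because merge() materializes every df1 x df2 pair within a key, it is only meant to show the semantics, not to be the fast solution.

```r
# Sample data from the question (plain character columns)
df1 <- data.frame(
  ID = c("ID001", "ID001", "ID002", "ID003"),
  Type = c("A", "A", "B", "A"),
  Measurement = c("Length", "Breadth", "Length", "Length"),
  When = c("2016-09-09 06:00:13", "2016-09-19 09:13:10",
           "2016-10-13 11:45:14", "2016-10-29 11:56:00"),
  stringsAsFactors = FALSE)
df2 <- data.frame(
  ID = rep(c("ID001", "ID002"), each = 5),
  Type = rep(c("A", "B"), each = 5),
  Measurement = "Length",
  Datetime = c("2016-09-09 01:00:13", "2016-09-09 04:00:13", "2016-09-09 09:00:13",
               "2016-09-09 21:00:13", "2016-09-09 23:00:13",
               "2016-10-13 10:45:14", "2016-10-13 11:15:14", "2016-10-13 11:48:14",
               "2016-10-13 11:55:14", "2016-10-13 21:45:14"),
  PassFail = c("Pass", "Fail", "Pass", "Fail", "Pass",
               "Fail", "Fail", "Pass", "Pass", "Pass"),
  stringsAsFactors = FALSE)

# 1. Equi-merge on the key columns (all df1 x df2 pairs within each key)
m <- merge(df1, df2, by = c("ID", "Type", "Measurement"))
# 2. Keep only the df2 measurements taken after When
m <- m[as.POSIXct(m$Datetime) > as.POSIXct(m$When), ]
# 3. Turn Pass/Fail into 0/1 indicators and sum them per df1 row
m$PassCount <- as.integer(m$PassFail == "Pass")
m$FailCount <- as.integer(m$PassFail == "Fail")
res <- aggregate(cbind(PassCount, FailCount) ~ ID + Type + Measurement + When,
                 data = m, FUN = sum)
res
```

df1 rows with no later df2 measurement (the Breadth row and ID003) drop out at the merge, which matches the desired output.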
I tried to get this with sqldf:
library(sqldf)
df3<-sqldf("SELECT L.*, r.Datetime, r.PASSFAIL
FROM df1 as L
LEFT JOIN df2 as r
ON L.ID=r.ID
AND L.Type=r.Type
AND L.Measurement=r.Measurement
WHERE r.Datetime > L.When
ORDER BY L.When")
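I haven't managed to get the output. Can anyone point me in the right direction? I would also like a fast merge solution, because I want to apply it to a larger dataset.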
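For reference, a sketch of how the sqldf attempt above might be made to work, under a few assumptions. The query has no GROUP BY, so it returns one row per match instead of counts; a LEFT JOIN filtered by WHERE on the right table acts like an inner join anyway; and the column name When collides with the SQL keyword WHEN, so it is likely to fail to parse unless quoted (brackets are used here). The time comparison is done on the raw strings, which only works because they are zero-padded "YYYY-MM-DD HH:MM:SS" text; the data is rebuilt with plain character columns.

```r
library(sqldf)

# Sample data from the question (plain character columns)
df1 <- data.frame(
  ID = c("ID001", "ID001", "ID002", "ID003"),
  Type = c("A", "A", "B", "A"),
  Measurement = c("Length", "Breadth", "Length", "Length"),
  When = c("2016-09-09 06:00:13", "2016-09-19 09:13:10",
           "2016-10-13 11:45:14", "2016-10-29 11:56:00"),
  stringsAsFactors = FALSE)
df2 <- data.frame(
  ID = rep(c("ID001", "ID002"), each = 5),
  Type = rep(c("A", "B"), each = 5),
  Measurement = "Length",
  Datetime = c("2016-09-09 01:00:13", "2016-09-09 04:00:13", "2016-09-09 09:00:13",
               "2016-09-09 21:00:13", "2016-09-09 23:00:13",
               "2016-10-13 10:45:14", "2016-10-13 11:15:14", "2016-10-13 11:48:14",
               "2016-10-13 11:55:14", "2016-10-13 21:45:14"),
  PassFail = c("Pass", "Fail", "Pass", "Fail", "Pass",
               "Fail", "Fail", "Pass", "Pass", "Pass"),
  stringsAsFactors = FALSE)

# [When] is quoted because WHEN is an SQL keyword.
# The time condition sits in the ON clause of an inner JOIN, and GROUP BY
# collapses the matches into one row per df1 row; in SQLite a comparison
# evaluates to 0/1, so SUM() counts the matching rows.
df3 <- sqldf("
  SELECT L.ID, L.Type, L.Measurement, L.[When],
         SUM(r.PassFail = 'Pass') AS PassCount,
         SUM(r.PassFail = 'Fail') AS FailCount
  FROM df1 AS L
  JOIN df2 AS r
    ON  L.ID = r.ID
    AND L.Type = r.Type
    AND L.Measurement = r.Measurement
    AND r.Datetime > L.[When]
  GROUP BY L.ID, L.Type, L.Measurement, L.[When]
  ORDER BY L.[When]")
df3
```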
Answer (score: 4)
With data.table, a non-equi join seems to work:
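library(data.table)
setDT(df1)[, When := as.POSIXct(When)]
# table() below needs PassFail to be a factor (data.frame() only did this by default before R 4.0)
setDT(df2)[, `:=`(Datetime = as.POSIXct(Datetime),
                  PassFail = factor(PassFail, levels = c("Fail", "Pass")))]
# for each row of df1, count the df2 rows with the same ID and Datetime > When
df2[df1, on=.(ID, Datetime > When), if (.N > 0L) as.list(table(PassFail)), by=.EACHI]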
# ID Datetime Fail Pass
# 1: ID001 2016-09-09 06:00:13 1 2
# 2: ID002 2016-10-13 11:45:14 0 3
If you want one row for every row of df1, remove the if (.N > 0L) clause.
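A self-contained sketch of that variant, assuming PassFail is a factor with levels Fail and Pass (needed so that every group reports both counts). Without the guard, df1 rows with no later df2 measurement still produce a row under the default nomatch = NA, and table() reports 0 for both levels.

```r
library(data.table)

df1 <- data.table(
  ID = c("ID001", "ID001", "ID002", "ID003"),
  Type = c("A", "A", "B", "A"),
  Measurement = c("Length", "Breadth", "Length", "Length"),
  When = as.POSIXct(c("2016-09-09 06:00:13", "2016-09-19 09:13:10",
                      "2016-10-13 11:45:14", "2016-10-29 11:56:00")))
df2 <- data.table(
  ID = rep(c("ID001", "ID002"), each = 5),
  Type = rep(c("A", "B"), each = 5),
  Measurement = "Length",
  Datetime = as.POSIXct(c("2016-09-09 01:00:13", "2016-09-09 04:00:13",
                          "2016-09-09 09:00:13", "2016-09-09 21:00:13",
                          "2016-09-09 23:00:13", "2016-10-13 10:45:14",
                          "2016-10-13 11:15:14", "2016-10-13 11:48:14",
                          "2016-10-13 11:55:14", "2016-10-13 21:45:14")),
  # a factor, so table() lists both levels even when one count is 0
  PassFail = factor(c("Pass", "Fail", "Pass", "Fail", "Pass",
                      "Fail", "Fail", "Pass", "Pass", "Pass"),
                    levels = c("Fail", "Pass")))

# No if (.N > 0L) guard: unmatched df1 rows get NA PassFail, which table() ignores
res <- df2[df1, on = .(ID, Datetime > When), as.list(table(PassFail)), by = .EACHI]
res
```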
To add the counts to df1 as columns:
# adds one column per level of PassFail (Fail, Pass); rows without later measurements get 0
df1[, levels(df2$PassFail) :=
  df2[df1, on=.(ID, Datetime > When), as.list(table(PassFail)), by=.EACHI][, !c("ID","Datetime")]
]
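The join above matches on ID only, while the question matches on ID, Type and Measurement. A sketch of the same non-equi join with all three as keys, assuming that is the intended matching and producing the column names from the desired output. Here nomatch = 0L drops df1 rows without later measurements, and counting with sum() means PassFail can stay a plain character column. In a non-equi join the result's Datetime column holds df1's When values, so it is renamed.

```r
library(data.table)

df1 <- data.table(
  ID = c("ID001", "ID001", "ID002", "ID003"),
  Type = c("A", "A", "B", "A"),
  Measurement = c("Length", "Breadth", "Length", "Length"),
  When = as.POSIXct(c("2016-09-09 06:00:13", "2016-09-19 09:13:10",
                      "2016-10-13 11:45:14", "2016-10-29 11:56:00")))
df2 <- data.table(
  ID = rep(c("ID001", "ID002"), each = 5),
  Type = rep(c("A", "B"), each = 5),
  Measurement = "Length",
  Datetime = as.POSIXct(c("2016-09-09 01:00:13", "2016-09-09 04:00:13",
                          "2016-09-09 09:00:13", "2016-09-09 21:00:13",
                          "2016-09-09 23:00:13", "2016-10-13 10:45:14",
                          "2016-10-13 11:15:14", "2016-10-13 11:48:14",
                          "2016-10-13 11:55:14", "2016-10-13 21:45:14")),
  PassFail = c("Pass", "Fail", "Pass", "Fail", "Pass",
               "Fail", "Fail", "Pass", "Pass", "Pass"))

res <- df2[df1,
           on = .(ID, Type, Measurement, Datetime > When),
           .(PassCount = sum(PassFail == "Pass"),
             FailCount = sum(PassFail == "Fail")),
           by = .EACHI, nomatch = 0L]
# the Datetime column of a non-equi join result holds the When values from df1
setnames(res, "Datetime", "When")
res
```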