Merging 2 data frames on a datetime condition and getting counts of passes/fails

Time: 2017-03-15 21:13:39

Tags: r dataframe dplyr

I have two data frames like these:

DF1

    ID <- c("ID001","ID001","ID002","ID003")
    Type <- c("A","A","B","A")
    Measurement <- c("Length","Breadth","Length","Length")
    When <- c("2016-09-09 06:00:13", "2016-09-19 09:13:10", "2016-10-13 11:45:14", "2016-10-29 11:56:00")

    df1 <- data.frame(ID,Type,Measurement,When)

DF2

    ID <- c("ID001","ID001","ID001","ID001","ID001",
            "ID002","ID002","ID002","ID002","ID002")
    Type <- c("A","A","A","A","A",
              "B","B","B","B","B")
    Measurement <- c("Length","Length","Length","Length","Length",
                     "Length","Length","Length","Length","Length")
    Datetime <- c("2016-09-09 01:00:13", "2016-09-09 04:00:13", "2016-09-09 09:00:13", "2016-09-09 21:00:13","2016-09-09 23:00:13",
                  "2016-10-13 10:45:14", "2016-10-13 11:15:14", "2016-10-13 11:48:14", "2016-10-13 11:55:14","2016-10-13 21:45:14")
    PassFail <- c("Pass","Fail","Pass","Fail","Pass",
                  "Fail","Fail","Pass","Pass","Pass")

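    # stringsAsFactors = TRUE reproduces the default from before R 4.0.0;
    # the answer's table()/levels() code relies on PassFail being a factor
    df2 <- data.frame(ID,Type,Measurement,Datetime,PassFail, stringsAsFactors = TRUE)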

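I am trying to merge these two data frames to get the pass and fail counts, counting only the df2 measurements (with the same ID, Type and Measurement) whose "Datetime" is later than the "When" in df1.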

My desired output is:

    ID Type Measurement                When PassCount FailCount
  ID001    A      Length 2016-09-09 06:00:13         2         1
  ID002    B      Length 2016-10-13 11:45:14         3         0
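For comparison, the same logic can be sketched in base R with no extra packages: an inner equi-join on the three key columns with `merge()`, a filter that keeps only df2 rows whose `Datetime` is later than df1's `When`, and a per-group sum of pass and fail flags. This is a minimal sketch; parsing the timestamps as UTC is an assumption, and because `merge()` builds every same-key pair before filtering, it may be slower than a non-equi join on a large dataset.

```r
# Sample data from the question (plain character columns)
df1 <- data.frame(
  ID          = c("ID001", "ID001", "ID002", "ID003"),
  Type        = c("A", "A", "B", "A"),
  Measurement = c("Length", "Breadth", "Length", "Length"),
  When        = c("2016-09-09 06:00:13", "2016-09-19 09:13:10",
                  "2016-10-13 11:45:14", "2016-10-29 11:56:00")
)
df2 <- data.frame(
  ID          = rep(c("ID001", "ID002"), each = 5),
  Type        = rep(c("A", "B"), each = 5),
  Measurement = "Length",
  Datetime    = c("2016-09-09 01:00:13", "2016-09-09 04:00:13", "2016-09-09 09:00:13",
                  "2016-09-09 21:00:13", "2016-09-09 23:00:13",
                  "2016-10-13 10:45:14", "2016-10-13 11:15:14", "2016-10-13 11:48:14",
                  "2016-10-13 11:55:14", "2016-10-13 21:45:14"),
  PassFail    = c("Pass", "Fail", "Pass", "Fail", "Pass",
                  "Fail", "Fail", "Pass", "Pass", "Pass")
)

# 1. Inner equi-join on the key columns (df1 rows without a partner are dropped)
m <- merge(df1, df2, by = c("ID", "Type", "Measurement"))

# 2. Keep only df2 measurements taken strictly after df1's When
#    (timestamps parsed as UTC, an assumption)
m <- m[as.POSIXct(m$Datetime, tz = "UTC") > as.POSIXct(m$When, tz = "UTC"), ]

# 3. Turn PassFail into 0/1 flags and sum them per df1 row
m$PassCount <- as.integer(m$PassFail == "Pass")
m$FailCount <- as.integer(m$PassFail == "Fail")
res <- aggregate(cbind(PassCount, FailCount) ~ ID + Type + Measurement + When,
                 data = m, FUN = sum)
res
```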

I tried using sqldf to get this:

    library(sqldf)
    df3<-sqldf("SELECT L.*, r.Datetime, r.PASSFAIL
                FROM df1 as L
                LEFT JOIN df2 as r
                ON L.ID=r.ID
                AND L.Type=r.Type
                AND L.Measurement=r.Measurement
                WHERE r.Datetime > L.When
                ORDER BY L.When")
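The query above returns one row per matching df2 measurement rather than counts, because nothing groups or aggregates the rows. Below is a hedged sketch of an aggregating version: it moves the time condition into the join, groups by the df1 columns, and sums boolean comparisons (in SQLite, sqldf's default backend, `PassFail = 'Pass'` evaluates to 1 or 0). `WHEN` is an SQL keyword, so the `When` column is quoted with backticks, which SQLite accepts. Comparing the raw strings works only because `YYYY-MM-DD HH:MM:SS` timestamps sort lexicographically in time order.

```r
library(sqldf)

# Sample data from the question (plain character columns)
df1 <- data.frame(
  ID          = c("ID001", "ID001", "ID002", "ID003"),
  Type        = c("A", "A", "B", "A"),
  Measurement = c("Length", "Breadth", "Length", "Length"),
  When        = c("2016-09-09 06:00:13", "2016-09-19 09:13:10",
                  "2016-10-13 11:45:14", "2016-10-29 11:56:00")
)
df2 <- data.frame(
  ID          = rep(c("ID001", "ID002"), each = 5),
  Type        = rep(c("A", "B"), each = 5),
  Measurement = "Length",
  Datetime    = c("2016-09-09 01:00:13", "2016-09-09 04:00:13", "2016-09-09 09:00:13",
                  "2016-09-09 21:00:13", "2016-09-09 23:00:13",
                  "2016-10-13 10:45:14", "2016-10-13 11:15:14", "2016-10-13 11:48:14",
                  "2016-10-13 11:55:14", "2016-10-13 21:45:14"),
  PassFail    = c("Pass", "Fail", "Pass", "Fail", "Pass",
                  "Fail", "Fail", "Pass", "Pass", "Pass")
)

# Join on the three keys plus the time condition, then count per df1 row.
# `When` is backtick-quoted because WHEN is an SQL keyword.
res <- sqldf("
  SELECT L.ID, L.Type, L.Measurement, L.`When`,
         SUM(r.PassFail = 'Pass') AS PassCount,
         SUM(r.PassFail = 'Fail') AS FailCount
  FROM df1 AS L
  JOIN df2 AS r
    ON  L.ID = r.ID
    AND L.Type = r.Type
    AND L.Measurement = r.Measurement
    AND r.Datetime > L.`When`
  GROUP BY L.ID, L.Type, L.Measurement, L.`When`
  ORDER BY L.`When`")
res
```

The inner `JOIN` drops df1 rows with no later measurement, which matches the desired output; a `LEFT JOIN` with the time condition kept in `ON` would instead keep them with zero counts.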

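I haven't managed to get that output. Could someone point me in the right direction? I'd also like a fast merge solution, since I want to apply it to a larger dataset.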

1 Answer

Answer 0 (score: 4)

With data.table, a non-equi join seems to work:

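    library(data.table)
    # parse the timestamp strings so they compare as date-times
    setDT(df1)[, When := as.POSIXct(When)]
    setDT(df2)[, Datetime := as.POSIXct(Datetime)]

    # non-equi join: for each row of df1, tabulate PassFail over the df2 rows with the
    # same ID and Datetime > When; if (.N > 0L) drops df1 rows with no such df2 rows
    df2[df1, on=.(ID, Datetime > When), if (.N > 0L) as.list(table(PassFail)), by=.EACHI]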

#       ID            Datetime Fail Pass
# 1: ID001 2016-09-09 06:00:13    1    2
# 2: ID002 2016-10-13 11:45:14    0    3

If you want one output row for every row of df1, remove the if clause.

To add the counts as columns to df1:

    # one new column per level of PassFail ("Fail", "Pass"); needs PassFail to be a factor
    df1[, levels(df2$PassFail) :=
      df2[df1, on=.(ID, Datetime > When), as.list(table(PassFail)), by=.EACHI][, !c("ID","Datetime")]
    ]
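The answer's `on=.(ID, Datetime > When)` matches on ID only, while the question's SQL also matches Type and Measurement; with this sample data the counts come out the same, but that need not hold in general. It also relies on `PassFail` being a factor, which `data.frame()` no longer produces by default since R 4.0.0. The sketch below is a variant under those observations, not the answerer's code: it adds Type and Measurement to the join and counts with `sum()` instead of `table()`, so no factor levels are needed. Parsing as UTC and the column names `PassCount`/`FailCount` (taken from the desired output) are assumptions.

```r
library(data.table)

# Sample data from the question, with timestamps parsed as UTC (an assumption)
df1 <- data.table(
  ID          = c("ID001", "ID001", "ID002", "ID003"),
  Type        = c("A", "A", "B", "A"),
  Measurement = c("Length", "Breadth", "Length", "Length"),
  When        = as.POSIXct(c("2016-09-09 06:00:13", "2016-09-19 09:13:10",
                             "2016-10-13 11:45:14", "2016-10-29 11:56:00"), tz = "UTC")
)
df2 <- data.table(
  ID          = rep(c("ID001", "ID002"), each = 5),
  Type        = rep(c("A", "B"), each = 5),
  Measurement = "Length",
  Datetime    = as.POSIXct(c("2016-09-09 01:00:13", "2016-09-09 04:00:13",
                             "2016-09-09 09:00:13", "2016-09-09 21:00:13",
                             "2016-09-09 23:00:13", "2016-10-13 10:45:14",
                             "2016-10-13 11:15:14", "2016-10-13 11:48:14",
                             "2016-10-13 11:55:14", "2016-10-13 21:45:14"), tz = "UTC"),
  PassFail    = c("Pass", "Fail", "Pass", "Fail", "Pass",   # plain character
                  "Fail", "Fail", "Pass", "Pass", "Pass")
)

# Non-equi join: for every df1 row, the df2 rows with equal keys and Datetime > When.
# With the default nomatch = NA an unmatched df1 row sees one all-NA row,
# so na.rm = TRUE turns its counts into 0.
counts <- df2[df1,
              on = .(ID, Type, Measurement, Datetime > When),
              .(PassCount = sum(PassFail == "Pass", na.rm = TRUE),
                FailCount = sum(PassFail == "Fail", na.rm = TRUE)),
              by = .EACHI]

# by = .EACHI keeps df1's row order, so the counts can be added as columns directly
df1[, c("PassCount", "FailCount") := counts[, .(PassCount, FailCount)]]

# Rows of df1 with at least one later measurement (the desired output)
df1[PassCount + FailCount > 0]
```

A non-equi join with `by = .EACHI` counts per df1 row without first materialising every same-key pair, which is the usual reason it is preferred over merge-then-filter on larger data.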