Filter rows in df1 using a condition on the PassFail column in df2

Time: 2017-04-12 15:32:55

Tags: r dataframe data.table dplyr

I have 2 data frames like these:

DF1

Measurement <- c("Length","Breadth","Height","Width")
When <- c("2017-04-07 15:19:02", "2017-02-10 09:13:10", "2017-01-13 11:45:14", "2016-11-13 21:35:24")
Fail <- c(2,3,2,3)
Pass <- c(2,2,4,2)
df1 <- data.frame(Measurement,When,Fail,Pass)
df1$When <- as.POSIXct(df1$When) 

DF2

Measurement <- c("Length","Length","Length","Length",
                 "Breadth","Breadth","Breadth","Breadth","Breadth",
                 "Height","Height","Height","Height","Height","Height",
                 "Width","Width","Width","Width","Width")
Datetime <- c("2017-04-08 15:19:02","2017-04-09 15:19:02","2017-04-09 16:19:02","2017-04-10 15:19:02",
              "2017-02-11 09:13:10","2017-02-12 09:13:10","2017-02-13 09:13:10","2017-02-14 09:13:10","2017-02-15 09:13:10",
              "2017-01-19 11:45:14","2017-01-20 11:45:14","2017-01-21 11:45:14","2017-01-23 11:45:14","2017-01-27 11:45:14","2017-01-13 11:45:14",
              "2016-11-12 21:35:24","2016-11-14 21:35:24","2016-11-17 21:35:24","2016-11-19 21:35:24","2016-11-19 23:35:24")
PassFail <- c("Fail","Fail","Pass","Pass",
              "Fail","Pass","Fail","Fail","Pass",
              "Fail","Fail","Pass","Pass","Pass","Pass",
              "Fail","Fail","Pass","Fail","Pass")
df2 <- data.frame(Measurement,Datetime,PassFail)
df2$Datetime <- as.POSIXct(df2$Datetime)

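df1 holds, for each measurement, the pass and fail counts reported from df2. I am trying to filter df1 using the following conditions.

  1. For each row in df1, I want to look in df2 and check whether the first 2 measurements (ordered by Datetime) are consecutive failures. If they are, I want to keep that measurement's row in df1.
  2. The check above should only consider df2 rows where Datetime in df2 > When in df1.
  3. The desired output would be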

    Measurement                When Fail Pass
           Length 2017-04-07 15:19:02    2    2
           Height 2017-01-13 11:45:14    2    4
    

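I obtained the counts in df1 this way, but I cannot work out how to filter it by the logic above so that only the rows of interest are kept.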

    setDT(df1)[, When := as.POSIXct(When)]
    setDT(df2)[, Datetime := as.POSIXct(Datetime)]
    df2[df1, on=.(Measurement, Datetime > When),
                  if (.N > 0L) as.list(table(PassFail)), by=.EACHI]
    

Could someone point me in the right direction? I would also like the filter to be fast, because I want to apply it to a much larger dataset.

1 answer:

Answer 0 (score: 1)

Just a small extension of the OP's code:

df1[ 
  df2[df1, on=.(Measurement, Datetime > When), 
    all(head(x.PassFail, 2) == "Fail")
  , by=.EACHI]$V1 
]
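
The one-liner above takes the first two matching rows in whatever order they come back from the join, so it depends on df2 already being in time order within each measurement (in the sample data, the last "Height" row is out of order). Below is a minimal, self-contained sketch that sorts the matched rows by time inside the join instead. The helper name `first_two_fail`, the `tz = "UTC"` setting, and the rule that a measurement with fewer than two later rows is dropped are assumptions made for illustration, not part of the original answer.

```r
library(data.table)

# Sample data from the question, built directly as data.tables.
# tz = "UTC" is an assumption so the example does not depend on the local time zone.
df1 <- data.table(
  Measurement = c("Length", "Breadth", "Height", "Width"),
  When = as.POSIXct(c("2017-04-07 15:19:02", "2017-02-10 09:13:10",
                      "2017-01-13 11:45:14", "2016-11-13 21:35:24"), tz = "UTC"),
  Fail = c(2, 3, 2, 3),
  Pass = c(2, 2, 4, 2)
)
df2 <- data.table(
  Measurement = rep(c("Length", "Breadth", "Height", "Width"), times = c(4, 5, 6, 5)),
  Datetime = as.POSIXct(c(
    "2017-04-08 15:19:02", "2017-04-09 15:19:02", "2017-04-09 16:19:02", "2017-04-10 15:19:02",
    "2017-02-11 09:13:10", "2017-02-12 09:13:10", "2017-02-13 09:13:10", "2017-02-14 09:13:10",
    "2017-02-15 09:13:10",
    "2017-01-19 11:45:14", "2017-01-20 11:45:14", "2017-01-21 11:45:14", "2017-01-23 11:45:14",
    "2017-01-27 11:45:14", "2017-01-13 11:45:14",
    "2016-11-12 21:35:24", "2016-11-14 21:35:24", "2016-11-17 21:35:24", "2016-11-19 21:35:24",
    "2016-11-19 23:35:24"), tz = "UTC"),
  PassFail = c("Fail", "Fail", "Pass", "Pass",
               "Fail", "Pass", "Fail", "Fail", "Pass",
               "Fail", "Fail", "Pass", "Pass", "Pass", "Pass",
               "Fail", "Fail", "Pass", "Fail", "Pass")
)

# Hypothetical helper: TRUE only if the two earliest statuses are both "Fail".
# Assumption: groups with fewer than two rows (or no match at all) are not kept.
first_two_fail <- function(status, time) {
  status <- status[order(time)]
  length(status) >= 2L && isTRUE(all(status[1:2] == "Fail"))
}

# Non-equi join: for each row of df1, look only at df2 rows of the same
# Measurement with Datetime strictly after When. The x. prefix refers to
# df2's own columns inside j.
keep <- df2[df1, on = .(Measurement, Datetime > When),
            .(keep = first_two_fail(x.PassFail, x.Datetime)),
            by = .EACHI]$keep

result <- df1[keep]
print(result)
```

On the sample data this keeps the Length and Height rows, which matches the desired output in the question. Sorting inside `j` costs a little per group; for a large df2 it may be cheaper to sort df2 once up front and keep the original `head(x.PassFail, 2)` form.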