Group a data frame by columns and subset with a condition

Asked: 2017-03-30 18:22:58

Tags: r dataframe

I have a data frame like this:

ID <- c("ID001","ID001","ID001","ID001","ID001","ID001","ID001",
        "ID002","ID002","ID002","ID002","ID002")
Type <- c("A","A","A","A","A","A","A",
          "B","B","B","B","B")
Measurement <- c("Length","Summary","Breadth","Length","Summary","Breadth","Summary",
                 "Length","Summary","Breadth","Breadth","Summary")
PassFail <- c("PASS","PASS","PASS","FAIL_PTS","FAIL","FAIL_AVG_HI","FAIL",
              "PASS","FAIL_PTS","FAIL","FAIL_AVG_LOW","FAIL")
ToolID <- c("SWP","SWP","SWP","ISP","ISP","IKS","IKS",
            "PSX","PSX","PSX","PZY","PZY")

df <- data.frame(ID,Type,Measurement,PassFail,ToolID)
df

      ID Type Measurement     PassFail ToolID
   ID001    A      Length         PASS    SWP
   ID001    A     Summary         PASS    SWP
   ID001    A     Breadth         PASS    SWP
   ID001    A      Length     FAIL_PTS    ISP
   ID001    A     Summary         FAIL    ISP
   ID001    A     Breadth  FAIL_AVG_HI    IKS
   ID001    A     Summary         FAIL    IKS
   ID002    B      Length         PASS    PSX
   ID002    B     Summary     FAIL_PTS    PSX
   ID002    B     Breadth         FAIL    PSX
   ID002    B     Breadth FAIL_AVG_LOW    PZY
   ID002    B     Summary         FAIL    PZY

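I am trying to subset this data frame with the following condition: when PassFail is 'FAIL_AVG_HI' or 'FAIL_AVG_LOW', I want to remove every row in that row's group, where a group is defined by (ID, Type, ToolID).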

My desired output looks like this:

     ID Type Measurement PassFail ToolID
  ID001    A      Length     PASS    SWP
  ID001    A     Summary     PASS    SWP
  ID001    A     Breadth     PASS    SWP
  ID001    A      Length FAIL_PTS    ISP
  ID001    A     Summary     FAIL    ISP
  ID002    B      Length     PASS    PSX
  ID002    B     Summary FAIL_PTS    PSX
  ID002    B     Breadth     FAIL    PSX

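I am struggling with the grouping needed to remove the rows. I can remove the individual rows that have the PassFail values above, but how do I group them and remove all rows that belong to the same group?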

This is what I do to remove just the single matching row:

# Use & rather than |: a row is kept only if it differs from both values.
df <- subset(df, PassFail != 'FAIL_AVG_HI' & PassFail != 'FAIL_AVG_LOW')

2 Answers:

Answer 0: (score: 2)

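You can use group_by %>% filter: group by ID, Type and ToolID, then keep only the groups in which no row has one of the two PassFail values.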

library(dplyr)
df %>% 
      group_by(ID, Type, ToolID) %>% 
      filter(!any(PassFail %in% c('FAIL_AVG_HI', 'FAIL_AVG_LOW')))

#Source: local data frame [8 x 5]
#Groups: ID, Type, ToolID [3]

#      ID   Type Measurement PassFail ToolID
#  <fctr> <fctr>      <fctr>   <fctr> <fctr>
#1  ID001      A      Length     PASS    SWP
#2  ID001      A     Summary     PASS    SWP
#3  ID001      A     Breadth     PASS    SWP
#4  ID001      A      Length FAIL_PTS    ISP
#5  ID001      A     Summary     FAIL    ISP
#6  ID002      B      Length     PASS    PSX
#7  ID002      B     Summary FAIL_PTS    PSX
#8  ID002      B     Breadth     FAIL    PSX
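
If dplyr is not available, the same group-wise filter can probably be sketched in base R with ave(), which applies a function per group and spreads the result back over the rows. This is only an illustration, not part of the answer above: the helper names bad, row_is_bad, group_has_bad and result are made up here, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the example data from the question.
df <- data.frame(
  ID = c(rep("ID001", 7), rep("ID002", 5)),
  Type = c(rep("A", 7), rep("B", 5)),
  Measurement = c("Length", "Summary", "Breadth", "Length", "Summary",
                  "Breadth", "Summary",
                  "Length", "Summary", "Breadth", "Breadth", "Summary"),
  PassFail = c("PASS", "PASS", "PASS", "FAIL_PTS", "FAIL", "FAIL_AVG_HI", "FAIL",
               "PASS", "FAIL_PTS", "FAIL", "FAIL_AVG_LOW", "FAIL"),
  ToolID = c("SWP", "SWP", "SWP", "ISP", "ISP", "IKS", "IKS",
             "PSX", "PSX", "PSX", "PZY", "PZY")
)

# Values that should knock out the whole group (hypothetical helper name).
bad <- c("FAIL_AVG_HI", "FAIL_AVG_LOW")

# Step 1: flag the individual rows that carry one of the values.
row_is_bad <- df$PassFail %in% bad

# Step 2: ave() runs any() inside each (ID, Type, ToolID) group and
# recycles the single TRUE/FALSE over every row of that group.
group_has_bad <- as.logical(
  ave(row_is_bad, df$ID, df$Type, df$ToolID, FUN = any)
)

# Step 3: keep only rows whose group contains no flagged row.
result <- df[!group_has_bad, ]
result
```

The point of the two-step flagging is the same as any() inside the grouped filter(): the condition is evaluated once per group, so a single matching row removes all of its group's rows.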

Answer 1: (score: 1)

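We can use data.table: for each (ID, Type, ToolID) group, return .SD only when no row in the group has one of the two PassFail values.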

library(data.table)
setDT(df)[, if(!any(PassFail %in% c('FAIL_AVG_HI', 'FAIL_AVG_LOW'))) 
                  .SD, .(ID, Type, ToolID)]
#       ID Type ToolID Measurement PassFail
#1: ID001    A    SWP      Length     PASS
#2: ID001    A    SWP     Summary     PASS
#3: ID001    A    SWP     Breadth     PASS
#4: ID001    A    ISP      Length FAIL_PTS
#5: ID001    A    ISP     Summary     FAIL  
#6: ID002    B    PSX      Length     PASS
#7: ID002    B    PSX     Summary FAIL_PTS
#8: ID002    B    PSX     Breadth     FAIL
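
Note that in the output above the grouping columns come first, so the column order differs from the original data frame. If the original order matters, one possible variant is to collect the row indices with .I and subset by them. This is a sketch under assumptions rather than the answer's own method: the names dt, bad, keep_idx and result are hypothetical, and the data is rebuilt from the question so the snippet runs on its own.

```r
library(data.table)

# Rebuild the example data from the question as a data.table.
dt <- data.table(
  ID = c(rep("ID001", 7), rep("ID002", 5)),
  Type = c(rep("A", 7), rep("B", 5)),
  Measurement = c("Length", "Summary", "Breadth", "Length", "Summary",
                  "Breadth", "Summary",
                  "Length", "Summary", "Breadth", "Breadth", "Summary"),
  PassFail = c("PASS", "PASS", "PASS", "FAIL_PTS", "FAIL", "FAIL_AVG_HI", "FAIL",
               "PASS", "FAIL_PTS", "FAIL", "FAIL_AVG_LOW", "FAIL"),
  ToolID = c("SWP", "SWP", "SWP", "ISP", "ISP", "IKS", "IKS",
             "PSX", "PSX", "PSX", "PZY", "PZY")
)

# Values that should knock out the whole group (hypothetical helper name).
bad <- c("FAIL_AVG_HI", "FAIL_AVG_LOW")

# .I holds the row numbers of the current group. Indexing it with a single
# TRUE keeps all of them; indexing with FALSE keeps none.
keep_idx <- dt[, .I[!any(PassFail %in% bad)], by = .(ID, Type, ToolID)]$V1

# Subsetting by row index leaves the columns in their original order.
result <- dt[keep_idx]
result
```

Separately, setDT(df) converts df to a data.table by reference, so df itself is a data.table afterwards; building a separate object, as here, leaves the original data frame untouched.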