Filter out multiple rows based on the content of one element in a single row

Time: 2019-06-19 16:52:59

Tags: r dataframe dplyr duplicates tidyverse

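I have a data frame with many rows that are duplicates of one another, except for the value in column dd.

If any row has the value "ACT" in that non-duplicated column, I need to remove every row that matches that "ACT" row on the other columns, as well as the "ACT" row itself. So, in the example code, I want to keep only the six rows whose aa column contains "c" or "e".

I have tried various nested if-else statements inside for loops, and tried to filter on the presence of "ACT" in dd, but I cannot see how to get beyond matching one row at a time.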

aa <- c("b","b","b","c","c","c","d","d","d","e","e","e")
bb <- c("t","t","t","w","w","w","r","r","r","s","s","s")
cc <- c(1,1,1,2,2,2,3,3,3,4,4,4)
dd <- c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")

Ideally I am looking for a tidyverse solution, but other approaches are of course acceptable.

2 answers:

Answer 0: (score: 3)

  • Using the dplyr package:
library(dplyr)
df1 <- tibble(
  aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
  bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
  cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
  dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)

anti_join(df1, df1[df1$dd=="ACT", ], by=c("aa","bb","cc"))
#> # A tibble: 6 x 4
#>   aa    bb       cc dd   
#>   <chr> <chr> <dbl> <chr>
#> 1 c     w         2 CVR  
#> 2 c     w         2 CVR  
#> 3 c     w         2 CVR  
#> 4 e     s         4 CVR  
#> 5 e     s         4 CVR  
#> 6 e     s         4 CVR
  • Using the data.table package:
library(data.table)
df2 <- data.table(
  aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
  bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
  cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
  dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)

df2[!df2[dd=="ACT",], on = c("aa","bb","cc")]
#>    aa bb cc  dd
#> 1:  c  w  2 CVR
#> 2:  c  w  2 CVR
#> 3:  c  w  2 CVR
#> 4:  e  s  4 CVR
#> 5:  e  s  4 CVR
#> 6:  e  s  4 CVR
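
The same rows can probably also be obtained without a join, by grouping on the duplicated columns and dropping every group that contains "ACT". This is a minimal sketch and not part of the answer above; the names df and kept are chosen here for illustration only.

```r
library(dplyr)

# Same example data as in the question (df is a hypothetical name).
df <- tibble(
  aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
  bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
  cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
  dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)

# Group by the columns that identify duplicates, then keep a group
# only if none of its dd values equals "ACT".
kept <- df %>%
  group_by(aa, bb, cc) %>%
  filter(!any(dd == "ACT")) %>%
  ungroup()

print(kept)
```

Compared with anti_join, this form states the rule directly ("keep groups with no ACT"), which may be easier to extend to other conditions.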

Created on 2019-06-19 by the reprex package (v0.3.0)

Answer 1: (score: 0)

You can put the vectors in a data.table and keep only the (aa, bb, cc) groups that contain no "ACT" in the dd column.
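
The code for this answer did not survive on this page, so the following is only a sketch of how the described idea might look, assuming the four vectors from the question. The names dt and kept are illustrative, and the grouped `if (...) .SD` idiom is one possible way to express it, not necessarily what the answer's author wrote.

```r
library(data.table)

# Build a data.table from the question's vectors (dt is a hypothetical name).
dt <- data.table(
  aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
  bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
  cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
  dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)

# For each (aa, bb, cc) group, return its rows (.SD) only when no dd value
# is "ACT"; groups that contain "ACT" return NULL and are dropped.
kept <- dt[, if (!any(dd == "ACT")) .SD, by = .(aa, bb, cc)]

print(kept)
```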
