Remove a group from a data.frame if at least one group member meets a condition

Asked: 2015-07-27 19:32:09

Tags: r subset plyr

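I have a data.frame from which I want to remove entire groups if any of their members meets a condition.

In this first example, where the values are numeric and the condition is NA, the code below works.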

df <- structure(list(world = c(1, 2, 3, 3, 2, NA, 1, 2, 3, 2), place = c(1, 
1, 2, 2, 3, 3, 1, 2, 3, 1), group = c(1, 1, 1, 2, 2, 2, 3, 
3, 3, 3)), .Names = c("world", "place", "group"), row.names = c(NA, 
-10L), class = "data.frame")

library(plyr)
ans <- ddply(df, .(group), summarize, code = mean(world))
ans$code[is.na(ans$code)] <- 0
ans2 <- merge(df,ans)
final.ans <- ans2[ans2$code !=0,]

However, this ddply maneuver with the NA values will not work if the condition is something other than "NA", or if the values are non-numeric.

For example, if I want to remove every group that has a member with a world value of AF (as in the data.frame below), the ddply trick does not work.

df2 <- structure(list(world = structure(c(1L, 2L, 3L, 3L, 3L, 5L, 1L, 4L, 2L, 4L),
.Label = c("AB", "AC", "AD", "AE", "AF"), class = "factor"),
place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1), group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)),
.Names = c("world", "place", "group"), row.names = c(NA, -10L), class = "data.frame")

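I can envision a for loop in which the value of each member is checked for every group; if the condition is met, a code column could be populated, and I could then take a subset based on that code.

But perhaps there is a vectorized way to do this?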

4 Answers:

Answer 0: (score: 12)

Try

library(dplyr)
df2 %>%
  group_by(group) %>%
  filter(!any(world == "AF"))

Or, as suggested by @akrun, with data.table:

library(data.table)
setDT(df2)[, if(!any(world == "AF")) .SD, group]

Or

setDT(df2)[, if(all(world != "AF")) .SD, group]

which gives:

#Source: local data frame [7 x 3]
#Groups: group
#
#  world place group
#1    AB     1     1
#2    AC     1     1
#3    AD     2     1
#4    AB     1     3
#5    AE     2     3
#6    AC     3     3
#7    AE     1     3
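
The same grouped filter can also cover the numeric NA case from the question, without the ddply and merge round trip. The following is a minimal sketch, assuming the question's first data frame rebuilt with data.frame() and a hypothetical result name res:

```r
library(dplyr)

# Numeric example data from the question (group 2 contains an NA)
df <- data.frame(
  world = c(1, 2, 3, 3, 2, NA, 1, 2, 3, 2),
  place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Keep a group only if none of its world values is NA
res <- df %>%
  group_by(group) %>%
  filter(!any(is.na(world))) %>%
  ungroup()
```

Only the predicate inside any() changes between the two cases, so the pattern should carry over to other per-row conditions as well.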

Answer 1: (score: 8)

An alternative data.table solution:

library(data.table)
setDT(df2)
df2[!(group %in% df2[world == "AF",group])]

which gives:

   world place group
1:    AB     1     1
2:    AC     1     1
3:    AD     2     1
4:    AB     1     3
5:    AE     2     3
6:    AC     3     3
7:    AE     1     3

Using a key, we can be a bit faster:

setkey(df2,group) 
df2[!J((df2[world == "AF",group]))]
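
The keyed not-join above can also be written without setting a key, assuming a data.table version that supports the on= argument. This is only a sketch: the names dt, bad and res are made up for illustration, and dt is a rebuilt copy of df2.

```r
library(data.table)

# Rebuilt copy of df2 from the question
dt <- data.table(
  world = factor(c("AB", "AC", "AD", "AD", "AD", "AF", "AB", "AE", "AC", "AE")),
  place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Groups that contain at least one "AF" row
bad <- unique(dt[world == "AF", .(group)])

# Anti-join: keep the rows whose group does not appear in `bad`
res <- dt[!bad, on = "group"]
```

Unlike setkey(), this leaves the row order of dt untouched, which may matter if the original order carries meaning.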

Answer 2: (score: 4)

Base R:

# %in% also handles the case where several groups contain 'AF'
df2[!df2$group %in% df2$group[df2$world == 'AF'], ]

Output:

   world place group
1     AB     1     1
2     AC     1     1
3     AD     2     1
7     AB     1     3
8     AE     2     3
9     AC     3     3
10    AE     1     3
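
In df2 only one group contains "AF". When several groups qualify, each row's group has to be tested against the whole set of offending groups, because != against a vector of several values compares element by element with recycling. A minimal sketch on hypothetical data (the data frame d and the name bad_groups are invented for illustration):

```r
# Hypothetical data: groups 2 and 3 both contain an "AF" row
d <- data.frame(
  world = c("AB", "AC", "AF", "AD", "AF", "AE"),
  group = c(1, 1, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# The set of groups with at least one "AF" member
bad_groups <- unique(d$group[d$world == "AF"])

# %in% tests each row's group against the whole set of bad groups
res <- d[!d$group %in% bad_groups, ]
```

Here only the two rows of group 1 should remain.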

Using sqldf:

library(sqldf)
sqldf("SELECT df2.world, df2.place, [group] FROM df2 
      LEFT JOIN
      (SELECT  * FROM df2 WHERE world LIKE 'AF') AS t
      USING([group])
      WHERE t.world IS NULL")

Output:

  world place group
1    AB     1     1
2    AC     1     1
3    AD     2     1
4    AB     1     3
5    AE     2     3
6    AC     3     3
7    AE     1     3

Answer 3: (score: 0)

A base R option using ave:

df2[with(df2, ave(world != "AF", group, FUN = all)),]

#   world place group
#1     AB     1     1
#2     AC     1     1
#3     AD     2     1
#7     AB     1     3
#8     AE     2     3
#9     AC     3     3
#10    AE     1     3

Or we can also use subset:

subset(df2, ave(world != "AF", group, FUN = all))

The above can also be written as:

df2[with(df2, !ave(world == "AF", group, FUN = any)),]

subset(df2, !ave(world == "AF", group, FUN = any))
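
The ave approach is not tied to the "AF" comparison. As a sketch, here it is applied to the numeric NA example from the question, with the data frame rebuilt via data.frame() and has_na and res as hypothetical names:

```r
# Numeric example data from the question (group 2 contains an NA)
df <- data.frame(
  world = c(1, 2, 3, 3, 2, NA, 1, 2, 3, 2),
  place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# TRUE for every row of a group that has at least one NA in world
has_na <- ave(is.na(df$world), df$group, FUN = any)

# Drop all rows belonging to such groups
res <- df[!has_na, ]
```

Because ave returns a vector as long as its input, the per-group result lines up row by row with df and can be used directly as a logical index.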