Filter a data frame on a vector and return the matching groups

Time: 2017-11-30 18:53:24

Tags: r filter

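There is probably a simple solution, but I can't find an elegant way to do it. In the df below, I want to look up values from a vector and return not only the matching rows but every row of the group each match belongs to.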

df <- data.frame(group= c("a","a","b","b","b","c","d","d"),
                  person = c("Tom","Jerry","Tom","Anna","Sam","Nic","Anna","Jerry"), stringsAsFactors = FALSE)

search_vector <- c("Tom","Nic")

Expected output:

df_result
  group person
1     a    Tom
2     a  Jerry
3     b    Tom
4     b   Anna
5     b    Sam
6     c    Nic

Of course this can be done in two steps, but there should be a better way:

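# step 1: rows whose person is in search_vector
df_sub <- subset(df, person %in% search_vector)
# step 2: all rows belonging to the groups found in step 1
df_result <- subset(df, group %in% df_sub$group)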
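The two-step idea can be wrapped in a small reusable function so the column names and search values are parameters. This is only a minimal sketch in base R; the helper name filter_matching_groups and its arguments are hypothetical and not part of any package.

```r
# Hypothetical helper (not from any package): keep every row whose group
# contains at least one value from `values` in column `key`.
filter_matching_groups <- function(data, group, key, values) {
  # Step 1: groups that contain at least one matching row
  hit_groups <- unique(data[[group]][data[[key]] %in% values])
  # Step 2: keep all rows that belong to those groups
  data[data[[group]] %in% hit_groups, , drop = FALSE]
}

df <- data.frame(group  = c("a","a","b","b","b","c","d","d"),
                 person = c("Tom","Jerry","Tom","Anna","Sam","Nic","Anna","Jerry"),
                 stringsAsFactors = FALSE)

res <- filter_matching_groups(df, "group", "person", c("Tom", "Nic"))
print(res)
```

Passing column names as strings keeps the helper free of non-standard evaluation, at the cost of slightly less convenient calls than subset().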

Edit 1

library(dplyr)          # needed for %>%, group_by() and filter()
library(microbenchmark)

microbenchmark(
  dplyr_test = df %>%
    group_by(group) %>%
    filter(any(person %in% search_vector)),
  base       = df[ave(df$person %in% search_vector, df$group, FUN = any), ],
  convoluted = df[df$group %in% df$group[df$person %in% search_vector], ],
  times = 100
)


Unit: microseconds
       expr      min        lq       mean    median        uq      max neval
 dplyr_test 3191.893 3433.7885 3736.42618 3649.4145 3991.2770 5017.041   100
       base  131.175  150.0395  193.04807  184.2435  224.6185  367.780   100
 convoluted   43.726   52.0120   68.80326   61.0035   86.0395  123.770   100

3 answers:

Answer 0 (score: 2)

We group by the 'group' variable and filter for the groups in which any 'person' is %in% 'search_vector':

library(dplyr)
df %>% 
   group_by(group) %>%
   filter(any(person %in% search_vector))

Answer 1 (score: 2)

Or use some convoluted (but effective) indexing:

df[df$group %in% df$group[df$person %in% search_vector],]
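The one-liner is easier to follow when its nested indexing is split into named intermediate steps. The sketch below computes exactly the same expression; the variable names is_match and matched_groups are illustrative only.

```r
df <- data.frame(group  = c("a","a","b","b","b","c","d","d"),
                 person = c("Tom","Jerry","Tom","Anna","Sam","Nic","Anna","Jerry"),
                 stringsAsFactors = FALSE)
search_vector <- c("Tom", "Nic")

# 1. Per-row logical: TRUE where the person is in search_vector
is_match <- df$person %in% search_vector
# is_match: TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE

# 2. Groups of the matching rows (duplicates would be harmless for %in%)
matched_groups <- df$group[is_match]
# matched_groups: "a" "b" "c"

# 3. Keep every row whose group is among matched_groups
df_result <- df[df$group %in% matched_groups, ]
print(df_result)
```

Only vectorised primitives (%in% and logical indexing) are used and no grouping machinery is needed, which is plausibly why this variant was the fastest in the benchmark above.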

Answer 2 (score: 2)

In base R, you can use ave together with any, then use the result for logical indexing:

df[ave(df$person %in% search_vector, df$group, FUN=any),]
  group person
1     a    Tom
2     a  Jerry
3     b    Tom
4     b   Anna
5     b    Sam
6     c    Nic
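To see why the ave call works, it helps to look at its intermediate result. ave() splits its first argument by the grouping variable, applies FUN within each group, and writes each group's result back to every row of that group, so the output has the same length and order as df. A small sketch (the names row_match and group_match are illustrative):

```r
df <- data.frame(group  = c("a","a","b","b","b","c","d","d"),
                 person = c("Tom","Jerry","Tom","Anna","Sam","Nic","Anna","Jerry"),
                 stringsAsFactors = FALSE)
search_vector <- c("Tom", "Nic")

# Per-row logical: does this row's person match?
row_match <- df$person %in% search_vector

# any() is evaluated once per group; its single TRUE/FALSE is then
# recycled to all rows of that group
group_match <- ave(row_match, df$group, FUN = any)
# group_match: TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE

print(df[group_match, ])
```

Because ave() returns a vector of the same type as its first argument, a logical input gives a logical result that can be used directly as a row index.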