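There is probably a simple solution, but I can't find an elegant one. In the df
below, I want to look up values from a vector and return not only the matching rows but the entire group each match belongs to.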
df <- data.frame(group= c("a","a","b","b","b","c","d","d"),
person = c("Tom","Jerry","Tom","Anna","Sam","Nic","Anna","Jerry"), stringsAsFactors = FALSE)
search_vector <- c("Tom","Nic")
Expected output:
df_result
  group person
1     a    Tom
2     a  Jerry
3     b    Tom
4     b   Anna
5     b    Sam
6     c    Nic
Of course this can be done in two steps, but there should be a better way:
df_sub <- subset(df, person %in% search_vector)
df_result <- subset(df, group %in% df_sub$group)
Edit 1
library(microbenchmark)
library(dplyr)  # provides %>%, group_by() and filter()
microbenchmark(
  dplyr_test = df %>%
    group_by(group) %>%
    filter(any(person %in% search_vector)),
  base = df[ave(df$person %in% search_vector, df$group, FUN = any), ],
  convoluted = df[df$group %in% df$group[df$person %in% search_vector], ],
  times = 100
)
Unit: microseconds
       expr      min        lq       mean    median        uq      max neval
 dplyr_test 3191.893 3433.7885 3736.42618 3649.4145 3991.2770 5017.041   100
       base  131.175  150.0395  193.04807  184.2435  224.6185  367.780   100
 convoluted   43.726   52.0120   68.80326   61.0035   86.0395  123.770   100
Answer 0 (score: 2)
We group by the 'group' variable and filter to keep the groups in which any 'person' is %in% 'search_vector'.
library(dplyr)
df %>%
group_by(group) %>%
filter(any(person %in% search_vector))
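
The pipeline above returns a grouped tibble rather than a plain data frame. If the result should look like the expected output in the question, a minimal sketch (assuming dplyr is installed, and rebuilding the question's data so the snippet stands alone) is to drop the grouping and convert back:

```r
library(dplyr)

# Data from the question
df <- data.frame(
  group  = c("a", "a", "b", "b", "b", "c", "d", "d"),
  person = c("Tom", "Jerry", "Tom", "Anna", "Sam", "Nic", "Anna", "Jerry"),
  stringsAsFactors = FALSE
)
search_vector <- c("Tom", "Nic")

df_result <- df %>%
  group_by(group) %>%
  # any() collapses each group to a single TRUE/FALSE,
  # so a group is kept or dropped as a whole
  filter(any(person %in% search_vector)) %>%
  ungroup() %>%        # remove the grouping attribute
  as.data.frame()      # back to a base data.frame

df_result
```

Whether the ungroup() and as.data.frame() steps are needed depends on what is done with the result afterwards; for further dplyr work the grouped tibble may be fine as it is.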
Answer 1 (score: 2)
Or use some convoluted (but effective) indexing:
df[df$group %in% df$group[df$person %in% search_vector],]
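
To make the nested indexing easier to read, the one-liner can be unpacked into three steps. This is only a sketch of the same logic; the intermediate names hit and hit_groups are made up for illustration, and the unique() call is optional tidying that does not change the result.

```r
# Data from the question
df <- data.frame(
  group  = c("a", "a", "b", "b", "b", "c", "d", "d"),
  person = c("Tom", "Jerry", "Tom", "Anna", "Sam", "Nic", "Anna", "Jerry"),
  stringsAsFactors = FALSE
)
search_vector <- c("Tom", "Nic")

# Step 1: logical vector marking rows whose person is searched for
hit <- df$person %in% search_vector

# Step 2: the groups those rows belong to
hit_groups <- unique(df$group[hit])

# Step 3: keep every row whose group is one of those groups
df_result <- df[df$group %in% hit_groups, ]

df_result
```

This is the same idea as the two-step subset() version in the question, written with plain indexing.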
Answer 2 (score: 2)
In base R, you can use ave together with any, then use the result for logical indexing.
df[ave(df$person %in% search_vector, df$group, FUN=any),]
  group person
1     a    Tom
2     a  Jerry
3     b    Tom
4     b   Anna
5     b    Sam
6     c    Nic
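
It may help to look at what ave returns before it is used as an index. In this sketch (the name keep is only for illustration), ave applies any within each group and writes the group-level answer back to every row of that group, so the result has one element per row of df:

```r
# Data from the question
df <- data.frame(
  group  = c("a", "a", "b", "b", "b", "c", "d", "d"),
  person = c("Tom", "Jerry", "Tom", "Anna", "Sam", "Nic", "Anna", "Jerry"),
  stringsAsFactors = FALSE
)
search_vector <- c("Tom", "Nic")

# One value per row: TRUE if the row's group contains any searched person
keep <- ave(df$person %in% search_vector, df$group, FUN = any)
keep
# TRUE for the six rows in groups a, b and c; FALSE for the two rows in group d

# Use it as a row index
df_result <- df[keep, ]
```

Because keep lines up row by row with df, it can be used directly inside the square brackets, which is what the one-liner does.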