I have a dataset that looks like this:
df <- data.frame("id" = c("Alpha", "Beta", "Gamma","Alpha","Beta","Gamma","Lambda","Tau"),
"group" = c("Alpha is good", "Alpha is good", "Alpha is good", "Beta is bad", "Beta is bad","Beta is bad","Beta is bad","Beta is bad"),
"Val" = c(2,2,2,5,5,5,5,5))
I want to keep only the observations where the group name matches the id name. In short, the final dataset should look like this:
final <- data.frame("id" = c("Alpha", "Beta"),
"group" = c("Alpha is good", "Beta is bad"),
"Val" = c(2,5))
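The idea is that the function should be able to detect whether the string in "id" also appears in "group".
I hope this is clear.
Thanks in advance for your help.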
Answer 0 (score: 1)
We can use str_detect, which is vectorized
(according to ?str_detect:
"Vectorised over string and pattern").
library(stringr)
library(dplyr)
df %>%
mutate_if(is.factor, as.character) %>%
filter(str_detect(group, id))
If there are overlapping elements within each group:
df %>%
mutate_if(is.factor, as.character) %>%
group_by(group1 = group) %>%
filter(str_detect(group, id))
Answer 1 (score: 0)
One base R possibility could be:
df[unlist(Map(grepl, df$id, df$group)), ]
     id         group Val
1 Alpha Alpha is good   2
5  Beta   Beta is bad   5
Or, more elegantly, using mapply()
(based on a comment by @r2evans):
df[mapply(grepl, df$id, df$group), ]
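One caveat with the calls above: grepl() treats each id as a regular expression, so an id containing characters such as "." or "(" could match in unexpected ways. Below is a minimal sketch that assumes literal substring matching is what is wanted. The helper name id_in_group is hypothetical and not part of the original answer; it simply passes fixed = TRUE to grepl() through mapply()'s MoreArgs.

```r
# Sample data from the question, with strings kept as character
df <- data.frame(
  id = c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Lambda", "Tau"),
  group = c(rep("Alpha is good", 3), rep("Beta is bad", 5)),
  Val = c(2, 2, 2, 5, 5, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where id[i] occurs literally inside group[i]
id_in_group <- function(id, group) {
  # as.character() guards against factor columns;
  # fixed = TRUE disables regex interpretation of the id
  unname(mapply(grepl, as.character(id), as.character(group),
                MoreArgs = list(fixed = TRUE)))
}

keep <- id_in_group(df$id, df$group)
final <- df[keep, ]
```

Here final should hold the same two rows as the desired output in the question; the only difference from the plain mapply() call is that ids are matched as literal text rather than as patterns.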
Sample data:
df <- data.frame("id" = c("Alpha", "Beta", "Gamma","Alpha","Beta","Gamma","Lambda","Tau"),
"group" = c("Alpha is good", "Alpha is good", "Alpha is good", "Beta is bad", "Beta is bad","Beta is bad","Beta is bad","Beta is bad"),
"Val" = c(2,2,2,5,5,5,5,5),
stringsAsFactors = FALSE)