Filter based on matching string patterns

Time: 2019-09-27 17:44:10

Tags: r filter

I have a dataset that looks like this:

df <- data.frame("id" = c("Alpha", "Beta", "Gamma","Alpha","Beta","Gamma","Lambda","Tau"), 
                 "group" = c("Alpha is good", "Alpha is good", "Alpha is good", "Beta is bad", "Beta is bad","Beta is bad","Beta is bad","Beta is bad"), 
                 "Val" = c(2,2,2,5,5,5,5,5))

I want to keep only the observations where the group name matches the ID name. In short, the final dataset should look like this:

final <- data.frame("id" = c("Alpha", "Beta"), 
                 "group" = c("Alpha is good", "Beta is bad"), 
                 "Val" = c(2,5))

The idea is that the function should be able to detect whether the string in "id" is also present in "group".

I hope this is clear.

Thanks in advance for your help.
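The per-row check described in the question can be sketched in base R as below. This is a minimal sketch, assuming a literal substring test is what is wanted (fixed = TRUE, so any regex metacharacters in id are matched as plain text). The helper name id_in_group is hypothetical and not part of the question or the answers:

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the
# columns as character vectors on R versions before 4.0
df <- data.frame("id" = c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Lambda", "Tau"),
                 "group" = c("Alpha is good", "Alpha is good", "Alpha is good", "Beta is bad",
                             "Beta is bad", "Beta is bad", "Beta is bad", "Beta is bad"),
                 "Val" = c(2, 2, 2, 5, 5, 5, 5, 5),
                 stringsAsFactors = FALSE)

# Hypothetical helper: for each row i, TRUE if id[i] occurs literally in group[i]
id_in_group <- function(id, group) {
  vapply(seq_along(id),
         function(i) grepl(id[i], group[i], fixed = TRUE),
         logical(1))
}

keep <- id_in_group(df$id, df$group)
df[keep, ]
```

The explicit loop over row indices makes the pairing of id[i] with group[i] visible; the answers below achieve the same pairing with vectorized helpers.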

2 Answers:

Answer 0 (score: 1)

We can use str_detect, which is vectorized; according to ?str_detect, it is "Vectorised over string and pattern."

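library(stringr)
library(dplyr)
df %>%
  # convert factor columns (the data.frame() default before R 4.0) to character
  mutate_if(is.factor, as.character) %>%
  # keep rows whose group string contains that row's id string
  filter(str_detect(group, id))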
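One caveat the question does not raise: a plain substring match also accepts rows where id is only part of a longer word, for example "Alpha" inside a made-up group value "Alphabet is good". If whole-word matches are wanted, wrapping each id in \b word boundaries is one option. The sketch below uses base R grepl so it runs without extra packages, and assumes id contains no regex metacharacters; the second group value is invented purely for illustration:

```r
ids    <- c("Alpha", "Alpha")
groups <- c("Alpha is good", "Alphabet is good")  # second value is made up

# Plain substring match: both rows pass
substring_hit <- mapply(grepl, ids, groups, USE.NAMES = FALSE)

# Whole-word match: build the pattern "\\bAlpha\\b" per row,
# so "Alpha" followed by more letters no longer counts
word_hit <- mapply(function(id, group) grepl(paste0("\\b", id, "\\b"), group),
                   ids, groups, USE.NAMES = FALSE)

substring_hit  # TRUE TRUE
word_hit       # TRUE FALSE
```

In the dplyr pipeline above, the analogous filter would be filter(str_detect(group, paste0("\\b", id, "\\b"))), since both paste0 and str_detect are vectorized.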

To check whether there are overlapping elements within each group:

df %>%
  mutate_if(is.factor, as.character) %>%
  # group_by(group1 = group) adds a copy of `group` named group1 and groups by it
  group_by(group1 = group) %>%
  filter(str_detect(group, id))
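As far as row selection goes, the grouped version should keep the same rows as the ungrouped one, because str_detect still compares each id with the group value on its own row; the grouped result additionally carries the group1 column and the grouping. A rough base R analogue of the grouped approach, assuming the sample data from the question, splits the rows by group, filters each piece, and binds the pieces back together:

```r
df <- data.frame("id" = c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Lambda", "Tau"),
                 "group" = c("Alpha is good", "Alpha is good", "Alpha is good", "Beta is bad",
                             "Beta is bad", "Beta is bad", "Beta is bad", "Beta is bad"),
                 "Val" = c(2, 2, 2, 5, 5, 5, 5, 5),
                 stringsAsFactors = FALSE)

# Split rows by group, keep the matching rows within each piece
pieces <- lapply(split(df, df$group),
                 function(d) d[mapply(grepl, d$id, d$group), ])

# Recombine the pieces and reset the row names that rbind builds
res <- do.call(rbind, pieces)
rownames(res) <- NULL
res
```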

Answer 1 (score: 0)

One base R option could be:

df[unlist(Map(grepl, df$id, df$group)), ]
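Map is needed here because grepl is vectorized over its x argument but expects a single pattern, so on its own it cannot compare id[i] with group[i] pairwise. Map calls grepl once per pair and returns a list, which unlist flattens into the logical row index. A small illustration with made-up vectors:

```r
groups <- c("Alpha is good", "Beta is bad")

# One pattern, many strings: grepl handles this directly
one_pattern <- grepl("Alpha", groups)

# One pattern per string: Map pairs the i-th pattern with the i-th string
pairs    <- Map(grepl, c("Alpha", "Beta"), groups)
pairwise <- unlist(pairs, use.names = FALSE)

one_pattern  # TRUE FALSE
pairwise     # TRUE TRUE
```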

     id         group Val
1 Alpha Alpha is good   2
5  Beta   Beta is bad   5

Or, more elegantly, using mapply() (based on @r2evans's comment):

df[mapply(grepl, df$id, df$group), ]
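mapply does the same pairing as Map but simplifies the result to a logical vector, so the unlist step is no longer needed. A quick check, assuming the sample data from the question; names are dropped before comparing because mapply names its result after the id values:

```r
df <- data.frame("id" = c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Lambda", "Tau"),
                 "group" = c("Alpha is good", "Alpha is good", "Alpha is good", "Beta is bad",
                             "Beta is bad", "Beta is bad", "Beta is bad", "Beta is bad"),
                 "Val" = c(2, 2, 2, 5, 5, 5, 5, 5),
                 stringsAsFactors = FALSE)

# Map returns a list that must be flattened
via_map    <- unlist(Map(grepl, df$id, df$group), use.names = FALSE)
# mapply simplifies to a (named) logical vector directly
via_mapply <- unname(mapply(grepl, df$id, df$group))

identical(via_map, via_mapply)  # TRUE
```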

Sample data:

df <- data.frame("id" = c("Alpha", "Beta", "Gamma","Alpha","Beta","Gamma","Lambda","Tau"), 
                 "group" = c("Alpha is good", "Alpha is good", "Alpha is good", "Beta is bad", "Beta is bad","Beta is bad","Beta is bad","Beta is bad"), 
                 "Val" = c(2,2,2,5,5,5,5,5),
                 stringsAsFactors = FALSE)
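To confirm that the mapply version reproduces the final dataset from the question, the sketch below compares the two directly. It assumes final is built with stringsAsFactors = FALSE as well, so both data frames hold character columns regardless of R version:

```r
df <- data.frame("id" = c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Lambda", "Tau"),
                 "group" = c("Alpha is good", "Alpha is good", "Alpha is good", "Beta is bad",
                             "Beta is bad", "Beta is bad", "Beta is bad", "Beta is bad"),
                 "Val" = c(2, 2, 2, 5, 5, 5, 5, 5),
                 stringsAsFactors = FALSE)

# Expected result from the question
final <- data.frame("id" = c("Alpha", "Beta"),
                    "group" = c("Alpha is good", "Beta is bad"),
                    "Val" = c(2, 5),
                    stringsAsFactors = FALSE)

result <- df[mapply(grepl, df$id, df$group), ]
rownames(result) <- NULL  # row names 1 and 5 are carried over from df; reset them

all.equal(result, final)  # TRUE
```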