Question

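I'm trying to create a new column that shows whether the strings in two columns of my data frame match. This question is almost what I'm asking, but instead of creating a filter condition, I want to create a new column that shows whether there is a match (TRUE or FALSE).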

Here is a sample data frame:

 transcript        target
 he saw the dog    saw
 she gave them it  gave
 watch out for     danger
 real bravery      brave

I'd like to create a new column that shows any match between the two:

 transcript        target    match
 he saw the dog    saw        T
 she gave them it  gave       T
 watch out for     danger     F
 real bravery      brave      T

I'd prefer to use dplyr, but I'm open to other suggestions!

Answer 1

Using stringr::str_detect, we can check whether transcript contains target:

library(stringr)
library(dplyr)
df %>% mutate_if(is.factor, as.character) %>%    # skip this step if transcript and target are already character columns
       mutate(match = str_detect(transcript, target))


         transcript target match
1   he saw the dog    saw  TRUE
2 she gave them it   gave  TRUE
3    watch out for danger FALSE
4     real bravery  brave  TRUE
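
One caveat worth illustrating: str_detect interprets target as a regular expression, so a target containing characters such as "." can match more than intended. The sketch below assumes you want literal substring matching and wraps the pattern in stringr's fixed(). The data frame df_meta and its rows are hypothetical, made up only to show the difference.

```r
library(dplyr)
library(stringr)

# Hypothetical rows (not from the question); the second target contains "."
df_meta <- data.frame(
  transcript = c("he saw the dog", "version 305 shipped"),
  target     = c("saw", "3.5"),
  stringsAsFactors = FALSE
)

res <- df_meta %>%
  mutate(
    # Regex matching: "." matches any single character, so "3.5" matches "305"
    match_regex   = str_detect(transcript, target),
    # Literal matching: "." only matches an actual dot
    match_literal = str_detect(transcript, fixed(target))
  )

res
```

For the sample data in the question both variants should give the same result, since none of the targets contain regex metacharacters.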

Answer 2

You asked for a dplyr approach, but here is a base R approach using grepl as well:

df1$match <- mapply(grepl, df1$target, df1$transcript)

df1
        transcript target match
1   he saw the dog    saw  TRUE
2 she gave them it   gave  TRUE
3    watch out for danger FALSE
4     real bravery  brave  TRUE

Using grepl inside a dplyr mutate statement:

df1 %>% 
  mutate(match = mapply(grepl, target, transcript))

        transcript target match
1   he saw the dog    saw  TRUE
2 she gave them it   gave  TRUE
3    watch out for danger FALSE
4     real bravery  brave  TRUE
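
A possible variation on the mapply call, in case it is useful: by default mapply names its result after the first character argument (here, target), and grepl treats each target as a regular expression. The sketch below assumes you want a plain, unnamed logical vector and literal matching; it rebuilds the question's sample data so that it runs on its own.

```r
# Sample data from the question, rebuilt here so the block is self-contained
df1 <- data.frame(
  transcript = c("he saw the dog", "she gave them it", "watch out for", "real bravery"),
  target     = c("saw", "gave", "danger", "brave"),
  stringsAsFactors = FALSE
)

# fixed = TRUE:      treat each target as a literal string rather than a regex
# USE.NAMES = FALSE: return an unnamed logical vector instead of one named by target
df1$match <- mapply(grepl, df1$target, df1$transcript,
                    MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

df1
```

The same call can be dropped into mutate() unchanged, using the bare column names target and transcript in place of df1$target and df1$transcript.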

Answer 3

One option is to use dplyr::rowwise() together with grepl to create the match column, as follows:

library(dplyr)

df %>% rowwise() %>%
  mutate(match  = grepl(target,transcript)) %>%
  as.data.frame()

#         transcript target match
# 1   he saw the dog    saw  TRUE
# 2 she gave them it   gave  TRUE
# 3    watch out for danger FALSE
# 4     real bravery  brave  TRUE

Data:

df <- read.table(text = 
"transcript        target
'he saw the dog'    saw
'she gave them it'  gave
'watch out for'     danger
'real bravery'      brave",
header = TRUE, stringsAsFactors = FALSE)
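
As a quick sanity check, the three approaches from the answers can be run side by side on this data to confirm they agree. This is only a sketch; the names via_stringr, via_mapply, and via_rowwise are introduced here for illustration and do not appear in the answers.

```r
library(dplyr)
library(stringr)

# Same sample data as above
df <- read.table(text =
"transcript        target
'he saw the dog'    saw
'she gave them it'  gave
'watch out for'     danger
'real bravery'      brave",
header = TRUE, stringsAsFactors = FALSE)

# Answer 1: vectorised stringr call
via_stringr <- str_detect(df$transcript, df$target)

# Answer 2: element-wise grepl through mapply (names dropped for comparison)
via_mapply <- mapply(grepl, df$target, df$transcript, USE.NAMES = FALSE)

# Answer 3: grepl evaluated one row at a time
via_rowwise <- df %>%
  rowwise() %>%
  mutate(match = grepl(target, transcript)) %>%
  pull(match)

# Stops with an error if any approach disagrees with the others
stopifnot(identical(via_stringr, via_mapply),
          identical(via_mapply, via_rowwise))
```

On larger data frames the vectorised str_detect call is likely to be the fastest of the three, since mapply and rowwise() both invoke grepl once per row.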

Create a new column showing partial string matches in dplyr
