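I'm trying to create a new column that shows whether the strings in two columns of my data frame match. This question is almost what I'm asking, but rather than building a filter condition, I want to create a new column that shows whether there is a match (TRUE or FALSE).
Here is an example data frame: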
transcript        target
he saw the dog    saw
she gave them it  gave
watch out for     danger
real bravery      brave
I want to create a new column that flags any match between the two:
transcript        target  match
he saw the dog    saw     T
she gave them it  gave    T
watch out for     danger  F
real bravery      brave   T
I'd prefer to use dplyr, but I'm open to other suggestions!
Answer 0 (score: 3)
Using stringr::str_detect, we can check whether transcript contains target:
library(stringr)
library(dplyr)
df %>%
  # Convert factor columns to character; skip this step if transcript
  # and target are already character columns in your df
  mutate_if(is.factor, as.character) %>%
  mutate(match = str_detect(transcript, target))
        transcript target match
1   he saw the dog    saw  TRUE
2 she gave them it   gave  TRUE
3    watch out for danger FALSE
4     real bravery  brave  TRUE
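One caveat worth sketching: str_detect treats its pattern argument as a regular expression, so a target containing characters such as "." or "(" may match more than intended. A minimal illustration, assuming literal substring matching is what is wanted, wraps the pattern in stringr::fixed(). The two vectors below are made-up values for illustration, not part of the question's data.

```r
library(stringr)

# Hypothetical example values (not from the question's data frame)
transcript <- c("price is 3.50", "price is 3x50")
target     <- c("3.50", "3.50")

# Regex matching: "." matches any character, so both rows match
regex_match <- str_detect(transcript, target)

# Literal matching: only the row that really contains "3.50" matches
literal_match <- str_detect(transcript, fixed(target))

print(regex_match)    # TRUE TRUE
print(literal_match)  # TRUE FALSE
```

In the mutate call this would read mutate(match = str_detect(transcript, fixed(target))).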
Answer 1 (score: 2)
You asked for a dplyr approach, but here is a base R approach using grepl:
df1$match <- mapply(grepl, df1$target, df1$transcript)
df1
        transcript target match
1   he saw the dog    saw  TRUE
2 she gave them it   gave  TRUE
3    watch out for danger FALSE
4     real bravery  brave  TRUE
Using grepl inside a dplyr mutate call:
df1 %>%
  mutate(match = mapply(grepl, target, transcript))
        transcript target match
1   he saw the dog    saw  TRUE
2 she gave them it   gave  TRUE
3    watch out for danger FALSE
4     real bravery  brave  TRUE
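For context on why mapply is used here: grepl is not vectorised over its pattern argument, so calling it directly on the two columns checks only the first target against every transcript (in the R versions I'm aware of this comes with a warning). The sketch below contrasts the two calls on plain vectors that mirror the example data; USE.NAMES = FALSE is an optional addition of mine to drop the names mapply would otherwise attach.

```r
transcript <- c("he saw the dog", "she gave them it", "watch out for", "real bravery")
target     <- c("saw", "gave", "danger", "brave")

# Direct call: only the first pattern ("saw") is used for all rows
naive <- suppressWarnings(grepl(target, transcript))

# Pairwise call: the i-th target is checked against the i-th transcript
pairwise <- mapply(grepl, target, transcript, USE.NAMES = FALSE)

print(naive)     # TRUE FALSE FALSE FALSE
print(pairwise)  # TRUE TRUE FALSE TRUE
```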
Answer 2 (score: 1)
One option is to use dplyr::rowwise() together with grepl to create the match column:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(match = grepl(target, transcript)) %>%
  as.data.frame()
#         transcript target match
# 1   he saw the dog    saw  TRUE
# 2 she gave them it   gave  TRUE
# 3    watch out for danger FALSE
# 4     real bravery  brave  TRUE
Data:
df <- read.table(text =
"transcript target
'he saw the dog' saw
'she gave them it' gave
'watch out for' danger
'real bravery' brave",
header = TRUE, stringsAsFactors = FALSE)
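As a quick sanity check, the three approaches from the answers can be run side by side on this data to confirm they give the same result. This is only a sketch: the data frame is rebuilt with data.frame() so the block is self-contained, and the variable names via_stringr, via_mapply and via_rowwise are mine.

```r
library(dplyr)
library(stringr)

# Same data as above, rebuilt so the block runs on its own
df <- data.frame(
  transcript = c("he saw the dog", "she gave them it", "watch out for", "real bravery"),
  target     = c("saw", "gave", "danger", "brave"),
  stringsAsFactors = FALSE
)

# stringr: vectorised over both string and pattern
via_stringr <- str_detect(df$transcript, df$target)

# base R: pair each target with its transcript; unname() drops mapply's names
via_mapply <- unname(mapply(grepl, df$target, df$transcript))

# dplyr: evaluate grepl one row at a time
via_rowwise <- df %>%
  rowwise() %>%
  mutate(match = grepl(target, transcript)) %>%
  pull(match)

# All three should agree
stopifnot(identical(via_stringr, via_mapply),
          identical(via_mapply, via_rowwise))
print(via_stringr)  # TRUE TRUE FALSE TRUE
```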