Question

I have the following data frame:

word   sentence  
cat    the cat was red
blue   the cat was red
dog    the dogs

I want to add a new column containing 0 or 1, depending on whether the word has an exact match in the sentence, i.e.

word   sentence          isInSentence
cat    the cat was red        1
blue   the cat was red        0
dog    the dogs               0

I found that the match function can do this for a word against a vector of strings. However, when I apply match directly

 ifelse(match(d$word, strsplit(d$sentence, ' '), nomatch=0) == 0, 0, 1)

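it does not work as expected. I think it is not performing the match row by row the way I intended. I have also looked into grep, but I have not figured out how to get either function to do what I want.

Any suggestions?

Thanks!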

Answer 1

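We can use str_detect from stringr to check whether the 'word' is in the 'sentence'. To prevent substring matches, we can paste word boundaries (\\b) onto the start and end of the 'word'.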

library(stringr)
d$isInSentence <-  as.integer(str_detect(d$sentence, paste0("\\b", d$word, "\\b")))
d$isInSentence
#[1] 1 0 0
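
If stringr is not available, the same word-boundary idea can be sketched in base R. This is a swapped-in alternative, not the code above: it pairs each pattern with its own sentence through mapply and grepl, and it rebuilds the question's data frame as d so the snippet runs on its own. It assumes the words contain no regex metacharacters.

```r
# Rebuild the example data from the question (assumed to be character columns)
d <- data.frame(
  word     = c("cat", "blue", "dog"),
  sentence = c("the cat was red", "the cat was red", "the dogs"),
  stringsAsFactors = FALSE
)

# One pattern per row: the word wrapped in word boundaries
patterns <- paste0("\\b", d$word, "\\b")

# grepl(pattern, x) is called once per row, pairing pattern i with sentence i
hits <- mapply(grepl, patterns, d$sentence, MoreArgs = list(perl = TRUE))

d$isInSentence <- as.integer(hits)
d$isInSentence
#[1] 1 0 0
```

"dog" does not match "the dogs" because there is no word boundary between "g" and "s".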

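In the OP's code, strsplit returns a list. So we need to loop over each 'word' together with its corresponding list element. For that, Map/mapply can be used. When there is no match, we get NA by default. So the result can be converted to a logical vector with is.na (negated), and then coerced to integer with as.integer.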

as.integer(!is.na(mapply(match, d$word, strsplit(d$sentence, ' '))))
#[1] 1 0 0
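
To see why the row-wise pairing matters, here is a minimal self-contained sketch. It rebuilds the question's data frame as d; the names tokens, whole_list and is_in are only illustrative. It uses %in% in place of match plus is.na, which is a small variation on the line above rather than the same code.

```r
# Rebuild the example data from the question (assumed to be character columns)
d <- data.frame(
  word     = c("cat", "blue", "dog"),
  sentence = c("the cat was red", "the cat was red", "the dogs"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector of tokens per sentence
tokens <- strsplit(d$sentence, " ")

# Passing the whole list as the lookup table: match() compares every word
# against the list elements themselves, not against the tokens of its own row
whole_list <- match(d$word, tokens, nomatch = 0)

# Row-wise version: word i is looked up only among the tokens of sentence i
is_in <- mapply(function(w, toks) w %in% toks, d$word, tokens)

d$isInSentence <- as.integer(is_in)
d$isInSentence
#[1] 1 0 0
```

Because the comparison is on whole tokens, "dog" is not found in c("the", "dogs"), so no word-boundary regex is needed here. Splitting on a single space does mean punctuation stays attached to tokens.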