R function to find whether at least 2 words match between two strings (applied to two vectors of strings)?

Posted: 2019-01-24 19:16:08

Tags: r stringr sapply

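I have 2 sets of strings, called Char and Char2 in this example. I'm trying to find out whether each string in Char contains at least 2 of the words in the corresponding string of Char2 (any two words can match). I haven't gotten to the "at least 2 words" part yet; first I need to figure out how to detect whether any word matches between each pair of strings. Any help would be greatly appreciated.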

I've tried several different approaches with the stringr package; see below. I tried a solution similar to the one Robert gave in this question: Detect multiple strings with dplyr and stringr

library(stringr)

shopping_list <- as.data.frame(c("good apples", "bag of apples", "bag of sugar", "milk x2"))
colnames(shopping_list) <- "Char"

shopping_list2 <- as.data.frame(c("good pears", "bag of sugar", "bag of flour", "sour milk x2"))
colnames(shopping_list2) <- "Char2"

shop = cbind(shopping_list , shopping_list2)
shop$Char = as.character(shop$Char)
shop$Char2 = as.character(shop$Char2)


# First attempt
sapply(shop$Char, function(x) any(sapply(shop$Char2, str_detect, string = x)))

# Second attempt
str_detect(shop$Char, paste(shop$Char2, collapse = '|'))

I get these results:

sapply(shop$Char, function(x) any(sapply(shop$Char2, str_detect, string = x)))
  good apples bag of apples  bag of sugar       milk x2 
        FALSE         FALSE          TRUE         FALSE 


str_detect(shop$Char, paste(shop$Char2, collapse = '|'))
FALSE FALSE  TRUE FALSE
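To see why both attempts return TRUE only for the third element, it may help to print the pattern the second attempt builds. This is a minimal check that uses base R `grepl()` in place of `str_detect()` (both treat the pattern as a regular expression); the names `char1` and `char2` are illustrative. Each alternative in the pattern is a whole phrase from Char2, so a match requires an entire Char2 string to appear inside a Char string, which happens only for "bag of sugar". The first attempt has the same limitation, because it also passes each full Char2 string as the pattern.

```r
# The two columns from the question, as plain character vectors
char1 <- c("good apples", "bag of apples", "bag of sugar", "milk x2")
char2 <- c("good pears", "bag of sugar", "bag of flour", "sour milk x2")

# The second attempt builds one regex whose alternatives are whole phrases
pattern <- paste(char2, collapse = "|")
print(pattern)
# [1] "good pears|bag of sugar|bag of flour|sour milk x2"

# Only "bag of sugar" contains a complete Char2 phrase
print(grepl(pattern, char1))
# [1] FALSE FALSE  TRUE FALSE
```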

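But I'm looking for this result:

FALSE TRUE TRUE TRUE

1) FALSE, because only 1 word matches
2) TRUE, because "bag of" is in both
3) TRUE, because "bag of" is in both
4) TRUE, because "milk x2" is in both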

1 answer:

Answer 0 (score: 0)

Here is a function that should help:

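match_test <- function (string1, string2) {
  # Split each string into words on single spaces
  words1 <- unlist(strsplit(string1, ' '))
  words2 <- unlist(strsplit(string2, ' '))
  # Distinct words that appear in both strings
  common_words <- intersect(words1, words2)
  # TRUE when at least 2 words are shared
  length(common_words) > 1
}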

Here is an example:

string1 <- c("good apples" , "bag of apples", "bag of sugar", "milk x2")
string2 <- c("good pears" , "bag of sugar", "bag of flour", "sour milk x2")
vapply(seq_along(string1), function (k) match_test(string1[k], string2[k]), logical(1))
# [1] FALSE  TRUE  TRUE  TRUE
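The same idea can be vectorized without an explicit index loop. This is a sketch under a few assumptions: the helper names `count_shared_words()` and `shares_words()` and the `min_shared` argument are hypothetical, words are split on runs of whitespace (`"\\s+"`) rather than on a single space, and strings are compared row by row as in the answer above. `Map()` pairs the i-th word vectors of the two inputs, and `lengths()` counts the distinct shared words in each pair.

```r
# Hypothetical helper: number of distinct words shared by string1[i] and string2[i]
count_shared_words <- function(string1, string2) {
  # strsplit() returns a list with one character vector of words per string;
  # "\\s+" treats runs of whitespace as a single separator
  words1 <- strsplit(string1, "\\s+")
  words2 <- strsplit(string2, "\\s+")
  # Map() applies intersect() to each pair; lengths() counts the common words
  lengths(Map(intersect, words1, words2))
}

# Hypothetical helper: TRUE where a pair shares at least `min_shared` words
shares_words <- function(string1, string2, min_shared = 2) {
  count_shared_words(string1, string2) >= min_shared
}

string1 <- c("good apples", "bag of apples", "bag of sugar", "milk x2")
string2 <- c("good pears", "bag of sugar", "bag of flour", "sour milk x2")

print(count_shared_words(string1, string2))
# [1] 1 2 2 2
print(shares_words(string1, string2))
# [1] FALSE  TRUE  TRUE  TRUE
```

With the question's data frame this would be `shares_words(shop$Char, shop$Char2)`. Passing `min_shared = 1` gives the "any word matches" check the question started with, which is TRUE for all four rows here.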