Question

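Find a match of word1 before word2, allowing at most 5 words of separation between word1 and word2. For example, if word1 is apple and word2 is mango, the pattern should match 'apple is a fruit like mango' but not 'mango is a fruit like apple' (word2 before word1) or 'apple and orange are fruits, like mango' (more than 5 words). An example regex in Python is {{1}}. What is a similar pattern and function to identify this pattern in R?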
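The word limit is easy to miscount, so here is a small sketch of how many words actually sit between apple and mango in the examples. The helper words_between is hypothetical (it is not from either answer). It treats any run of non-alphanumeric characters as a separator and compares words case-sensitively. By this count 'apple is a fruit like mango' has 4 words in between and 'apple and orange are fruits, like mango' has exactly 5. So for the second string to be rejected, the limit effectively has to be "at most 4 words in between" (at most 5 gaps between word1 and word2), which is also how both answers below behave.

```r
# Hypothetical helper: number of words strictly between the first word1
# and the first word2 that follows it (NA if there is no such pair).
words_between <- function(word1, word2, string) {
  # split on runs of non-alphanumeric characters (spaces, commas, ...)
  tokens <- strsplit(string, "[^[:alnum:]]+")[[1]]
  tokens <- tokens[tokens != ""]
  i1 <- match(word1, tokens)          # index of the first word1
  if (is.na(i1)) return(NA_integer_)
  later <- which(tokens == word2)
  later <- later[later > i1]          # only word2 occurrences after word1
  if (length(later) == 0) return(NA_integer_)
  as.integer(later[1] - i1 - 1)
}

words_between("apple", "mango", "apple is a fruit like mango")
# [1] 4
words_between("apple", "mango", "apple and orange are fruits, like mango")
# [1] 5
words_between("apple", "mango", "mango is a fruit like apple")
# [1] NA
```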

Answer 1

#DATA
word1 = "apple"
word2 = "mango"
p1 = "apple is a fruit like mango"
p2 = "apple and orange are fruits, like mango"
p3 = "mango is a fruit like apple"

#FUNCTION
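foo = function(word1, word2, string){
    # start position of the first occurrence of each word (-1 if absent)
    ind2 = unlist(gregexpr(word2, string))[1]
    ind1 = unlist(gregexpr(word1, string))[1]
    # no match if either word is missing or word2 does not follow word1
    if(ind1 == -1 || ind2 <= ind1){
        return(NA)
    }
    # spaces from the start of word1 to the start of word2
    # (one more than the number of words in between)
    nwords = sum(unlist(gregexpr(" ", substr(string, ind1, ind2))) > 0)
    if(nwords <= 5){
        # substr() end index is inclusive, hence the - 1
        substr(string, ind1, ind2 + nchar(word2) - 1)
    }else{
        NA
    }
}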

#USAGE
foo(word1, word2, p1)
#[1] "apple is a fruit like mango"

foo(word1, word2, p2)
#[1] NA

foo(word1, word2, p3)
#[1] NA
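foo() only looks at the first occurrence of each word and matches raw substrings, so "pineapple is like mango" is accepted (apple is found inside pineapple) and "mango? no, apple then mango" is rejected (the first mango comes before apple). A token-based sketch that avoids both issues is below. The names match_within and max_between are hypothetical; max_between = 4 is an assumption chosen so the question's examples come out as stated, and a "word" is assumed to be a run of letters and digits, compared case-sensitively.

```r
# Hypothetical token-based alternative to foo(): scans every occurrence
# of word1, not just the first, and compares whole words only.
match_within <- function(word1, word2, string, max_between = 4) {
  # every run of letters/digits counts as a word
  g <- gregexpr("[[:alnum:]]+", string)
  m <- g[[1]]
  if (m[1] == -1) return(NA_character_)
  words <- regmatches(string, g)[[1]]
  starts <- as.integer(m)
  ends <- starts + attr(m, "match.length") - 1L
  for (i in which(words == word1)) {
    # word2 may appear 1 to (max_between + 1) tokens after word1
    window <- seq.int(i + 1L, length.out = max_between + 1L)
    window <- window[window <= length(words)]
    hit <- window[words[window] == word2]
    if (length(hit) > 0) {
      # text from the start of word1 to the end of the nearest word2
      return(substr(string, starts[i], ends[hit[1]]))
    }
  }
  NA_character_
}

match_within("apple", "mango", "mango? no, apple then mango")
# [1] "apple then mango"
```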

Answer 2

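This works. Starting from apple as the first word, the regex allows one to four following words and matches only if mango appears within that word limit.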

> library(stringi)
> stri <- c('apple is a fruit like mango','apple and orange are fruits, like mango','apple is not a fruit like orange or mango')
> stri_extract_all(str = stri, regex = 'apple(\\s\\w+){1,4}?.mango')

[[1]]
[1] "apple is a fruit like mango"

[[2]]
[1] NA

[[3]]
[1] NA
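The same idea can be written in base R with the limit as a parameter. In this sketch the helpers within_pattern and extract_within are hypothetical; they build \bword1(?:\W+\w+){0,n}?\W+word2\b and use regexpr()/regmatches() with perl = TRUE. Compared with the regex above, the word boundaries stop "pineapple" from counting as apple, \W+ lets punctuation such as the comma in "fruits, like" act as a separator, and zero intervening words are allowed. It assumes word1 and word2 contain no regex metacharacters.

```r
# Hypothetical helpers generalising the regex approach.
# Assumes word1 and word2 contain no regex metacharacters.
within_pattern <- function(word1, word2, max_between = 4) {
  # \b = word boundary; each (?:\W+\w+) is one intervening word;
  # the lazy {0,n}? stops at the nearest word2 within the limit
  sprintf("\\b%s(?:\\W+\\w+){0,%d}?\\W+%s\\b",
          word1, as.integer(max_between), word2)
}

extract_within <- function(strings, word1, word2, max_between = 4) {
  m <- regexpr(within_pattern(word1, word2, max_between), strings, perl = TRUE)
  out <- rep(NA_character_, length(strings))
  # regmatches() returns only the matched elements, in order
  out[m != -1] <- regmatches(strings, m)
  out
}

x <- c("apple is a fruit like mango",
       "apple and orange are fruits, like mango",
       "apple is not a fruit like orange or mango",
       "pineapple is like mango")
extract_within(x, "apple", "mango")
```

This returns the first string unchanged and NA for the other three, mirroring the stringi output while also rejecting the pineapple case.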

Find word matches in R allowing a maximum separation of n words
