str_extract: matching words near each other

Time: 2018-10-28 15:26:21

Tags: r regex stringr lookbehind

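I want to extract a string that matches dog|cat, then (with 0-5 words, \r, \n, or spaces in between) 1., and then more text up to the point where 2. appears.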

myStrings <- c(
"the dog says: 1. hello cat 2. I do not care",
"the dog barks ba ba ba ba ba ba ba and says: 1. no 2. no",
"the doggie says: 1. hello 2. you",
"the cat is angry and asks: 1. hello dog 2. go away",
"the dog says: 2. nothing 3. nothing")

My approach is:

str_extract(string=myStrings,pattern=regex("(dog|cat(?:\\w+\\W+){1,5}?1.).*(?=2.)"))

I tried to implement this (https://www.regular-expressions.info/near.html), but my regex matches

> [1] "dog says: 1. hello cat " "dog barks ba ba ba ba ba
> ba ba: 1. no " "doggie says: 1. hello " "dog " "dog says: "  

What I need is

 > [1] "dog says: 1. hello cat " "NA" "NA" "the cat is angry and asks: 1. hello dog " "NA"

1 Answer:

Answer 0: (score: 0)

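Your lookbehind is unbounded, meaning it can match any number of tokens. The engine needs to be able to determine the length of a lookbehind statically.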
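A minimal sketch of that restriction, assuming stringr's default ICU engine (via stringi). The input string `x` and the variable names are illustrative and not taken from the question. A lookbehind with a capped repetition compiles, while one containing an open-ended quantifier such as `+` is expected to be rejected when the pattern is compiled.

```r
library(stringr)

# Illustrative input, not from the question.
x <- "the dog says: 1. hello"

# Bounded lookbehind: \W{1,3} has a fixed maximum length, so the engine accepts it.
bounded <- str_extract(x, "(?<=dog\\W{1,3})\\w+")

# Unbounded lookbehind: \W+ has no maximum length.
# tryCatch records whether compiling/running the pattern raised an error.
unbounded_failed <- tryCatch({
  str_extract(x, "(?<=dog\\W+)\\w+")
  FALSE
}, error = function(e) TRUE)

print(bounded)
print(unbounded_failed)
```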

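By the way, the parentheses in your regex appear to be unbalanced, so I can't tell which tokens were meant to go inside the lookbehind. If it contains something like \w+, it will be unbounded.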
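One possible way to avoid the lookbehind altogether is to consume the keyword and the intervening words in the main pattern and keep only the lookahead for 2. The pattern below is a sketch under one reading of the question (whole-word dog or cat, at most five words before 1., literal dots escaped), not a solution confirmed by the asker.

```r
library(stringr)

myStrings <- c(
  "the dog says: 1. hello cat 2. I do not care",
  "the dog barks ba ba ba ba ba ba ba and says: 1. no 2. no",
  "the doggie says: 1. hello 2. you",
  "the cat is angry and asks: 1. hello dog 2. go away",
  "the dog says: 2. nothing 3. nothing")

pattern <- regex(paste0(
  "\\b(?:dog|cat)\\b",     # whole-word keyword, so "doggie" does not count
  "\\W+(?:\\w+\\W+){0,5}?", # separator, then at most five intervening words
  "1\\.",                  # a literal "1." (the dot is escaped)
  ".*?(?=2\\.)"            # lazily take text up to, but not including, "2."
))

result <- str_extract(myStrings, pattern)
print(result)
```

Note that the match begins at the keyword itself, so for the fourth string it starts at cat and does not include the leading "the" shown in the desired output. Strings with no match come back as a real NA rather than the string "NA".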