Question

I am trying to extract the substring between two patterns, BB and </p>:

require("stringr")
str = "<notes>\n  <p>AA:</p>\n   <p>BB: word, otherword</p>\n    <p>Number:</p>\n    <p>Level: 1</p>\n"
str_extract(str, "BB.*?:</p>")

The extracted substring should be "word, otherword", but I am capturing too much:

  [1] "BB: word, otherword</p>\n    <p>Number:</p>"

Answer 1

Maybe something like this?

> gsub(".*BB: (.*?)</p>.*$", "\\1", str)
# [1] "word, otherword"
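
The same capture-group idea can also be expressed with stringr, which the question already loads. This is only a sketch, assuming a stringr version that provides str_match; the variable names m and captured are made up for illustration.

```r
library(stringr)

str <- "<notes>\n  <p>AA:</p>\n   <p>BB: word, otherword</p>\n    <p>Number:</p>\n    <p>Level: 1</p>\n"

# str_match returns a character matrix:
# column 1 holds the full match, column 2 the first capture group.
m <- str_match(str, "BB: (.*?)</p>")
captured <- m[, 2]
print(captured)
```

Unlike gsub, this returns NA when the pattern does not match, rather than handing back the whole input string unchanged.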

Answer 2

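This is a job for Perl regular expressions, namely lookahead and lookbehind assertions. In stringr, you can wrap the regular expression in the perl function, like this: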

str_extract(str, perl("(?<=BB: ).*?(?=</p>)"))
[1] "word, otherword"
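
As far as I know, more recent stringr releases no longer provide the perl() wrapper, because patterns are handled by the ICU engine, which supports lookarounds directly. A minimal sketch under that assumption (the variable name result is made up for illustration):

```r
library(stringr)

str <- "<notes>\n  <p>AA:</p>\n   <p>BB: word, otherword</p>\n    <p>Number:</p>\n    <p>Level: 1</p>\n"

# (?<=BB: ) is a lookbehind: the match must be preceded by "BB: ".
# (?=</p>)  is a lookahead: the match must be followed by "</p>".
# Neither assertion is included in the extracted text.
result <- str_extract(str, "(?<=BB: ).*?(?=</p>)")
print(result)
```

If you are on an older stringr where perl() still exists, the call above should be equivalent to the wrapped version.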

You can also do this with base R:

regmatches(str, regexpr("(?<=BB: ).*?(?=</p>)", str, perl=TRUE))
[1] "word, otherword"

R: capture everything from one pattern to another
