Extract the shortest matching string with a regex

Question

Minimal example

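Suppose I have the string as1das2das3D. I want to extract everything from the letter a to the letter D. Three substrings match this description; I want the shortest (rightmost) one, i.e. as3D.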

One solution I know of that achieves this is stringr::str_extract("as1das2das3D", "a[^a]+D")
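For readers wondering why the negated character class is needed here, the following is a minimal sketch comparing three patterns on the same string. It uses base R's regexpr() with perl = TRUE instead of stringr so that it runs without extra packages; that swap, and the helper name extract_first, are assumptions made for illustration only.

```r
x <- "as1das2das3D"

# Hypothetical helper: return the first match of `pattern` in `text`.
extract_first <- function(pattern, text) {
  m <- regexpr(pattern, text, perl = TRUE)
  regmatches(text, m)
}

# Greedy: starts at the first "a" and runs to the last "D".
greedy <- extract_first("a.+D", x)

# Lazy: still starts at the first "a", because the engine always
# prefers the leftmost starting position; laziness only shortens the end.
lazy <- extract_first("a.+?D", x)

# Negated class: the match may not cross another "a", so the first
# two starting positions fail and only the last "a" can reach "D".
negated <- extract_first("a[^a]+D", x)

print(c(greedy = greedy, lazy = lazy, negated = negated))
```

In this sketch the greedy and lazy patterns both return the whole string, while only the negated class returns as3D. A lazy quantifier alone is not enough because it does not change where the match starts.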

Real example

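Unfortunately, I can't get this to work on my real data. There I have strings that (may) contain two URLs, and I am trying to extract the one that is immediately followed by rel="next". So in the example string below, I want to extract the URL https://abc.myshopify.com/ZifQ.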

foo <- "<https://abc.myshopify.com/YifQ>; rel=\"previous\", <https://abc.myshopify.com/ZifQ>; rel=\"next\""

# what I've tried
stringr::str_extract(foo, '(?<=\\<)https://.*(?=\\>; rel\\="next)')          # wrong output
stringr::str_extract(foo, '(?<=\\<)https://(?!https)+(?=\\>; rel\\="next)')  # error

Answer 1

You can do this:

stringr::str_extract(foo,"https:[^;]+(?=>; rel=\"next)")
[1] "https://abc.myshopify.com/ZifQ"

or even

stringr::str_extract(foo,"https(?:(?!https).)+(?=>; rel=\"next)")
[1] "https://abc.myshopify.com/ZifQ"
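To illustrate how these two patterns differ from the first attempt in the question, here is a small sketch that runs all three against the same input. It uses base R's regexpr() with perl = TRUE rather than stringr (an assumption so the snippet has no package dependency), and extract_first is a hypothetical helper name.

```r
foo <- "<https://abc.myshopify.com/YifQ>; rel=\"previous\", <https://abc.myshopify.com/ZifQ>; rel=\"next\""

# Hypothetical helper: return the first match of `pattern` in `text`.
extract_first <- function(pattern, text) {
  m <- regexpr(pattern, text, perl = TRUE)
  regmatches(text, m)
}

# The question's first attempt: ".*" starts at the first URL and is
# allowed to run across the second one, so the match is too long.
greedy_url <- extract_first("(?<=<)https://.*(?=>; rel=\"next)", foo)

# Negated class: the match may not cross a ";", so a match starting at
# the first URL can never reach the 'rel="next' lookahead.
negated_url <- extract_first("https:[^;]+(?=>; rel=\"next)", foo)

# Guarded dot: "(?!https)." consumes one character only if a new
# "https" does not begin there, so the match cannot span two URLs.
tempered_url <- extract_first("https(?:(?!https).)+(?=>; rel=\"next)", foo)

print(greedy_url)
print(negated_url)
print(tempered_url)
```

The second attempt in the question put the + directly after the lookahead, (?!https)+, which repeats a zero-width assertion rather than a character. Wrapping the lookahead and the dot together in a non-capturing group, as in the answer, is what makes the repetition consume text.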

Answer 2

Would this be an option?

Split the string on ; or , and compare the pieces with the target string rel="next"; the URL is the piece at the preceding index.

urls <- strsplit(foo, ";\\s+|,\\s+")[[1]]
urls[which(urls == "rel=\"next\"") - 1]
#[1] "<https://abc.myshopify.com/ZifQ>"
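Note that the result above still carries the surrounding angle brackets. A possible way to wrap the same split-and-look-back idea in a function and strip the brackets is sketched below; the function name next_url and the choice to return an empty character vector when no rel="next" entry exists are assumptions, not part of the original answer.

```r
# Hypothetical helper built on the split approach above.
next_url <- function(header) {
  # Split into alternating "<url>" and "rel=..." pieces.
  parts <- strsplit(header, ";\\s+|,\\s+")[[1]]
  # Position of the piece just before rel="next" (empty if absent).
  idx <- which(parts == "rel=\"next\"") - 1
  # Drop the leading "<" and trailing ">" from the selected piece.
  gsub("^<|>$", "", parts[idx])
}

foo <- "<https://abc.myshopify.com/YifQ>; rel=\"previous\", <https://abc.myshopify.com/ZifQ>; rel=\"next\""
with_next <- next_url(foo)

# A header with only a "previous" link yields character(0).
without_next <- next_url("<https://abc.myshopify.com/YifQ>; rel=\"previous\"")

print(with_next)
print(length(without_next))
```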

Answer 3

This might be an option.

gsub(".+\\, <(.+)>; rel=\"next\"", "\\1", foo, perl = TRUE)
#[1] "https://abc.myshopify.com/ZifQ"
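One thing to keep in mind with a replacement-based approach is that gsub() returns its input unchanged when the pattern does not match, so a string without rel="next" would come back whole. As a hedged alternative, the sketch below pulls out the capture group with base R's regexec() and regmatches(), which gives NA when there is no match. The function name capture_next and the pattern with [^<>]+ are assumptions for illustration.

```r
# Hypothetical helper: capture the URL inside <...> that is
# directly followed by '; rel="next"'.
capture_next <- function(header) {
  m <- regexec("<([^<>]+)>; rel=\"next\"", header)
  # Element 1 is the full match, element 2 the first capture group;
  # indexing an empty result with [2] yields NA.
  regmatches(header, m)[[1]][2]
}

foo <- "<https://abc.myshopify.com/YifQ>; rel=\"previous\", <https://abc.myshopify.com/ZifQ>; rel=\"next\""
found <- capture_next(foo)
missing <- capture_next("<https://abc.myshopify.com/YifQ>; rel=\"previous\"")

print(found)
print(missing)
```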
