Question

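How can I use str_match to extract the rest of a string after the last occurrence of a substring?

For example, for the string "apples and oranges and bananas with cream", I want to extract the rest of the string after the last occurrence of " and ", which should return "bananas with cream".

I've tried many variations of this command, but it either keeps returning the rest of the string after the first " and ", or an empty string.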

library(stringr)

str_match("apples and oranges and bananas with cream", "(?<= and ).*(?! and )")

    #     [,1]                             
    #[1,] "oranges and bananas with cream"
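Why this attempt stops at the first " and ": the lookbehind (?<= and ) is already satisfied right after the first " and ", .* then runs to the end of the string, and the negative lookahead (?! and ) is only checked at that end position, where it trivially succeeds. One way to keep the lookaround idea (a sketch, not taken from the answers below) is to put the negative lookahead at the start of the match and let it scan the whole remainder, so a position is accepted only if no further " and " follows. It is shown with base R and perl = TRUE; the same pattern should also work in str_extract(), since ICU supports bounded lookbehind and lookahead, but treat that as an assumption.

```r
s <- "apples and oranges and bananas with cream"

# (?<= and )  : the match must start right after " and "
# (?!.* and ) : ...and no further " and " may occur anywhere to its right
# .*          : then take everything up to the end of the string
pattern <- "(?<= and )(?!.* and ).*"

last_part <- regmatches(s, regexpr(pattern, s, perl = TRUE))
last_part
#[1] "bananas with cream"
```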

I've searched StackOverflow for solutions and found some for JavaScript, Python and base R, but none for the stringr package.

Thanks.

Answer 1

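(I don't know str_match, but base R regular expressions should suffice.) Because regex quantifiers are "greedy", .+ first consumes as much of the string as possible and then backtracks only until "and " can match, which lands on the last occurrence. So it's simply: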

sub("^.+and ", "", "apples and oranges and bananas with cream")
#[1] "bananas with cream"
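To see the greediness at work, compare the greedy .+ with its lazy counterpart .+? (used here with perl = TRUE): the lazy version stops at the first "and " instead of the last one. A small base R sketch:

```r
s <- "apples and oranges and bananas with cream"

# Greedy: .+ takes as much as possible, then backtracks to the LAST "and "
greedy <- sub("^.+and ", "", s)
greedy
#[1] "bananas with cream"

# Lazy (PCRE via perl = TRUE): .+? takes as little as possible,
# so the match ends at the FIRST "and "
lazy <- sub("^.+?and ", "", s, perl = TRUE)
lazy
#[1] "oranges and bananas with cream"
```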

I was pretty sure str_replace lived in the "lubridate" corner of the hadleyverse.

But that fails:

> library(lubridate)

Attaching package: ‘lubridate’

The following object is masked from ‘package:plyr’:

    here

The following objects are masked from ‘package:data.table’:

    hour, isoweek, mday, minute, month, quarter, second, wday, week, yday, year

The following object is masked from ‘package:base’:

    date

> str_replace("apples and oranges and bananas with cream", "^.+and ", "")
Error in str_replace("apples and oranges and bananas with cream", "^.+and ",  : 
  could not find function "str_replace"

So it's not in pkg:lubridate but in stringr (which, as far as I know, is a very thin wrapper around the stringi package):

> library(stringr)
> str_replace("apples and oranges and bananas with cream", "^.+and ", "")
[1] "bananas with cream"
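Both sub() and str_replace() work element-wise over a character vector and return strings without a match unchanged, which matters if some inputs contain no "and " at all. A base R sketch with a made-up input vector:

```r
x <- c("apples and oranges and bananas with cream",
       "toast and jam",
       "plain yoghurt")  # contains no "and "

# sub() works element-wise; an element with no match comes back unchanged
result <- sub("^.+and ", "", x)
result
# elements: "bananas with cream", "jam", "plain yoghurt"
```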

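I wish people who ask questions about functions from non-base packages would include the library call, so that respondents get a clue about their working environment.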

Answer 2

Another simple approach is to use a capture group, as in a variant of the *SKIP what's to avoid scheme, i.e. What_I_want_to_avoid|(What_I_want_to_match):

library(stringr)
s  <- "apples and oranges and bananas with cream"
str_match(s, "^.+and (.*)")[,2]
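str_match() returns a character matrix: column 1 holds the overall match and column 2 holds capture group 1, which is why the code indexes [,2]. For comparison, a base R sketch of the same idea with regexec() and regmatches(), which return the full match followed by the groups for each input string (assuming an R version whose regexec() accepts perl = TRUE):

```r
s <- "apples and oranges and bananas with cream"

# regexec() locates the whole match and each capture group;
# regmatches() extracts them as a list with one character vector per input
m <- regmatches(s, regexec("^.+and (.*)", s, perl = TRUE))

full_match <- m[[1]][1]  # overall match (here the whole string): ignored
group_1 <- m[[1]][2]     # capture group 1: the part we want
group_1
#[1] "bananas with cream"
```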

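The key idea is to ignore the overall match returned by the regex engine entirely: that's the trash bin. Instead, we only inspect capture group 1, via [,2], which, when set, contains what we're looking for. See also: http://www.rexegg.com/regex-best-trick.html#pseudoregex

We can do something similar with base R's gsub function, e.g.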

gsub("^.+and (.*)", "\\1", s, perl = TRUE)
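The gsub() call builds the separator into a regex. If the separator can contain regex metacharacters (such as "." or "+"), a regex-free variant avoids escaping altogether. This is a different technique from the answer's: a minimal sketch using a hypothetical helper, after_last(), which finds the last literal occurrence with gregexpr(fixed = TRUE) and returns NA when the separator is absent (a design choice that mirrors str_match()).

```r
# Hypothetical helper (not from the answer): text after the LAST literal `sep`
after_last <- function(x, sep) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    # Start positions of all non-overlapping literal matches; -1 means none
    pos <- gregexpr(sep, s, fixed = TRUE)[[1]]
    if (pos[1] == -1) return(NA_character_)  # no separator: NA, like str_match()
    last <- pos[length(pos)]
    substring(s, last + nchar(sep))          # everything after the last separator
  }, character(1), USE.NAMES = FALSE)
}

after_last("apples and oranges and bananas with cream", " and ")
#[1] "bananas with cream"
after_last("1.5+2.5+3.5", "+")  # "+" would need escaping in a regex
#[1] "3.5"
```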

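PS: Unfortunately, we can't use the What_I_want_to_avoid(*SKIP)(*FAIL)|What_I_want_to_match pattern with the stringi/stringr functions, because the ICU regex library they are built on does not support the (*SKIP)(*FAIL) verbs (they are only available in PCRE).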
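Base R's regex functions use PCRE when perl = TRUE, so the (*SKIP)(*FAIL) form can be tried there. A sketch, assuming your R build's PCRE supports backtracking control verbs (standard builds do): the left alternative consumes everything up to the last "and ", (*SKIP)(*FAIL) discards that part and makes the search resume right after it, and the right alternative .+ then matches what remains.

```r
s <- "apples and oranges and bananas with cream"

# What_I_want_to_avoid : ^.*and   (everything up to the last "and ")
# (*SKIP)(*FAIL)       : discard that part and resume searching right after it
# What_I_want_to_match : .+       (whatever remains)
pattern <- "^.*and (*SKIP)(*FAIL)|.+"

rest <- regmatches(s, regexpr(pattern, s, perl = TRUE))
rest
#[1] "bananas with cream"
```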

Answer 3

If we need str_match:

library(stringr)
str_match("apples and oranges and bananas with cream",   ".*\\band\\s(.*)")[,2]
#[1] "bananas with cream"
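The \\b in this pattern matters when "and" can also occur inside a word. A base R sketch with a made-up sentence, comparing the plain pattern from Answer 1 with the word-boundary version (perl = TRUE is used so that \\b and \\s have their usual PCRE meaning):

```r
x <- "cheese and crackers on the grand piano"

# Without \\b: "grand " also ends in "and ", so the greedy match stops there
sub("^.+and ", "", x)
#[1] "piano"

# With \\b: "and" must start at a word boundary, so "grand" no longer counts
sub(".*\\band\\s(.*)", "\\1", x, perl = TRUE)
#[1] "crackers on the grand piano"
```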

Or with stri_match from stringi:

library(stringi)
stri_match("apples and oranges and bananas with cream", 
         regex = ".*\\band\\s(.*)")[,2]
#[1] "bananas with cream"
