R - Extract the substring between the first occurrence and the last occurrence

Date: 2020-07-09 05:58:00

Tags: r string

I am working with long strings in R, for example:

string <- "end of section. 3. LESSONS. The previous LESSONS are very important as seen in Figure 1. This text is also important. Figure 1: Blah blah blah"

I want to extract the substring between the first occurrence of "LESSONS" and the last occurrence of "Figure 1", like this:

"The previous LESSONS are very important as seen in Figure 1. This text is also important."

I tried the following, but it returns the substring after the last occurrence of "LESSONS" rather than the first:

gsub(".*LESSONS (.*) Figure 1.*", "\\1", string)
#[1] "are very important as seen in Figure 1. This text is also important."

I also tried the following, but it cuts the string at the first occurrence of "Figure 1" rather than the last:

library(qdapRegex)
ex_between(string, "LESSONS", "Figure 1")
#[[1]]
#[1] ". The previous LESSONS are very important as seen in"

Any help would be appreciated!

2 answers:

Answer 0: (score: 0)

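You were very close. Make the part of the regex before "LESSONS" non-greedy (.*? instead of .*) so that it stops at the first match. The pattern also needs to consume the period and whitespace after the first "LESSONS", hence \\.\\s*.

Also, since the pattern matches the whole string once, you can use sub here instead of gsub: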

sub(".*?LESSONS\\.\\s*(.*) Figure 1.*", "\\1", string)
#[1] "The previous LESSONS are very important as seen in Figure 1. This text is also important."
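To make the greedy versus non-greedy difference concrete, here is a minimal sketch on a toy input. The string x and the marker "A" are made up for illustration and are not from the question; perl = TRUE is used only to make the lazy-quantifier semantics explicit.

```r
# Toy input (hypothetical) containing the marker "A" twice.
x <- "A one A two"

# Greedy: ".*" consumes as much as possible, so the pattern anchors
# on the LAST "A " in the string.
greedy <- sub(".*A (.*)", "\\1", x, perl = TRUE)

# Lazy: ".*?" consumes as little as possible, so the pattern anchors
# on the FIRST "A " in the string.
lazy <- sub(".*?A (.*)", "\\1", x, perl = TRUE)

print(greedy)  # "two"
print(lazy)    # "one A two"
```

The capture group (.*) stays greedy in both calls, which is why in the answer above it still runs up to the last " Figure 1".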

Answer 1: (score: 0)

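You can use str_extract from the stringr package, with a positive lookbehind (?<=...) and a positive lookahead (?=...) to delimit the part of the string you want to extract: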

library(stringr)
str_extract(string, "(?<=LESSONS\\.\\s).*(?=\\sFigure 1)")
#[1] "The previous LESSONS are very important as seen in Figure 1. This text is also important."
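If the delimiters are plain strings, the same result can also be reached without regular expressions. The following is only a sketch in base R: between_first_last is a hypothetical helper name, not a function from any package, and it assumes both markers are literal text (hence fixed = TRUE).

```r
# Hypothetical helper: return the text between the FIRST occurrence of
# `start` and the LAST occurrence of `end`, both matched literally.
between_first_last <- function(x, start, end) {
  # Position of the first occurrence of `start` (-1 if absent)
  first <- as.integer(regexpr(start, x, fixed = TRUE))
  # Positions of all occurrences of `end`; keep the last one (-1 if absent)
  ends <- as.integer(gregexpr(end, x, fixed = TRUE)[[1]])
  last <- ends[length(ends)]
  if (first == -1 || last == -1) return(NA_character_)
  # First character after the `start` marker
  from <- first + nchar(start)
  # The last `end` must not begin before the text after `start`
  if (last < from) return(NA_character_)
  # Cut out the span and drop surrounding whitespace
  trimws(substr(x, from, last - 1))
}

string <- "end of section. 3. LESSONS. The previous LESSONS are very important as seen in Figure 1. This text is also important. Figure 1: Blah blah blah"
result <- between_first_last(string, "LESSONS.", "Figure 1")
print(result)
#[1] "The previous LESSONS are very important as seen in Figure 1. This text is also important."
```

Matching literally avoids having to escape characters such as the period, at the cost of losing regex flexibility (for example, variable whitespace inside a marker).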