Regex in R: get the last n sentences before a keyword

Time: 2017-02-28 09:14:27

Tags: r regex

I am trying to get a given number of sentences (3 in this case) that come before a specific keyword ("Keyword" in this case). If the text says:

Hi Stack Overflow. This is a random text. This is another text. Hello. Keyword Test test

it should return the last 3 sentences before "Keyword", that is:

This is a random text. This is another text. Hello.

This is my regex so far, but it only returns "Hello.":

.*([A-Z].*\.\s){3}(?=Keyword).*

Demo: https://regex101.com/r/JKfFLl/7

3 answers:

Answer 0 (score: 0)

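Your problem is that a quantifier applied to a capturing group keeps only the last repetition, so the group ends up holding just the final sentence.

Try this instead, which repeats a non-capturing group and captures the whole run: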

((?:[^.]+\.){3})(?=\s?Keyword).*

See: https://regex101.com/r/JKfFLl/8
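To make the point about repeated capturing groups concrete, here is a minimal R sketch. The first call uses the pattern from the question. The second is a variation on this answer's regex (the outer capture and trailing `.*` are dropped and the whole match is extracted instead); it is an illustration under those assumptions, not the answer's exact usage. The variable names are made up for the example.

```r
text <- "Hi Stack Overflow. This is a random text. This is another text. Hello. Keyword Test test"

# Pattern from the question: group 1 is overwritten on each of the three
# repetitions, so only the last sentence survives in "\\1".
last_only <- sub(".*([A-Z].*\\.\\s){3}(?=Keyword).*", "\\1", text, perl = TRUE)
last_only
# [1] "Hello. "

# Variation (assumption): repeat a non-capturing group and take the whole match.
# The lookahead keeps "Keyword" out of the match itself.
m <- regexpr("(?:[^.]+\\.){3}(?=\\s?Keyword)", text, perl = TRUE)
three <- trimws(regmatches(text, m))  # trimws() drops the leading space
three
# [1] "This is a random text. This is another text. Hello."
```

Lookaheads are not available in R's default regex engine, so `perl = TRUE` is needed in both calls.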

Answer 1 (score: 0)

Another approach, using sub:

text <- "Hi Stack Overflow. This is a random text. This is another text. Hello. Keyword Test test"

regex <- ".*?\\s*((?:[^.]+\\.){3})\\s*Keyword.*"

sub(regex, "\\1", text, perl = TRUE)
# [1] "This is a random text. This is another text. Hello."
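Since the question asks for the last n sentences, the same pattern can be built for any n. The sketch below wraps it in a helper; `last_n_sentences` is a hypothetical name, not part of the original answer. It assumes every sentence ends with "." and that the keyword contains no regex metacharacters.

```r
# Hypothetical helper (not from the original answer): return the last `n`
# sentences that precede `keyword`, or NA when the pattern does not match.
# Assumes sentences end with "." and `keyword` has no regex metacharacters.
last_n_sentences <- function(text, n, keyword) {
  pattern <- sprintf(".*?\\s*((?:[^.]+\\.){%d})\\s*%s.*", as.integer(n), keyword)
  ifelse(grepl(pattern, text, perl = TRUE),
         sub(pattern, "\\1", text, perl = TRUE),
         NA_character_)  # sub() alone would return the input unchanged
}

text <- "Hi Stack Overflow. This is a random text. This is another text. Hello. Keyword Test test"

last_n_sentences(text, 3, "Keyword")
# [1] "This is a random text. This is another text. Hello."
last_n_sentences(text, 1, "Keyword")
# [1] "Hello."
```

The `grepl` check is there because `sub` returns its input untouched when nothing matches, which is easy to mistake for a result. Abbreviations such as "e.g." would be counted as sentence ends under these assumptions.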

Answer 2 (score: 0)

Try the following regex:

(?m)^.*?\b((?:[^.]*\.){1,3})(?=\s*[\w\s]+Keyword\b).*?$

In R:

x <- c(
  'Hi Stack Overflow. This is a random text. This is another text. Hello. Keyword Test test',
  'This is a sentence. And another sentence. List of Keyword 1. some text 2. some text. Another sentence.'
)
gsub('(?m)^.*?\\b((?:[^.]*\\.){1,3})(?=\\s*[\\w\\s]+Keyword\\b).*?$', '\\1', x, perl = TRUE)
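The `{1,3}` quantifier in this pattern accepts one to three sentences, whereas the `{3}` pattern from the `sub` answer requires exactly three. The sketch below compares the two on an input with only two sentences before the keyword; `short_text` and the result names are made up for the illustration.

```r
# Illustrative input (an assumption): only two sentences precede "Keyword".
short_text <- "One. Two. Keyword here"

# {3} needs exactly three sentences; with no match, sub() returns its input.
fixed_three <- sub(".*?\\s*((?:[^.]+\\.){3})\\s*Keyword.*", "\\1",
                   short_text, perl = TRUE)
fixed_three
# [1] "One. Two. Keyword here"

# {1,3} takes as many sentences as are available, up to three.
up_to_three <- gsub("(?m)^.*?\\b((?:[^.]*\\.){1,3})(?=\\s*[\\w\\s]+Keyword\\b).*?$",
                    "\\1", short_text, perl = TRUE)
up_to_three
# [1] "One. Two."
```

Which behavior is preferable depends on whether a shorter result is acceptable when fewer than three sentences exist.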