How to get a substring between two repeated keywords in Python

Time: 2019-02-27 06:40:52

Tags: python regex

Say I have a string:

 string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

I want to extract the first three sentences,

This is the first sentence.\nIt is the second one.\nNow, this is the third one.

Obviously, the following regex does not work:

re.search('(?<=This)(.*?)(?=\n)', string)

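What is the correct expression to extract the text between This and the third \n?

Thanks.

4 Answers:

Answer 0: (score: 1)

You can use this regex to capture three sentences, starting at the text This,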
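
To see why that attempt falls short, here is a minimal sketch of what it actually returns. By default `.` does not match a newline, and the lazy `(.*?)` stops at the first position that is followed by `\n`, so only the rest of the first line is captured. The lookbehind also leaves `This` itself out of the match.

```python
import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# The pattern from the question, unchanged.
m = re.search('(?<=This)(.*?)(?=\n)', string)
print(repr(m.group()))  # ' is the first sentence.'
```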

This(?:[^\n]*\n){3}

Edit:

Python code,

import re

s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# Match 'This', then three runs of "non-newline characters followed by \n".
m = re.search(r'This(?:[^\n]*\n){3}', s)
if m:
    print(m.group())

Prints

This is the first sentence.
It is the second one.
Now, this is the third one.
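
Note that the match above ends with the third `\n`, so `print` emits one extra blank line. If the trailing newline is not wanted, one possible variant (a sketch, not part of the original answer) matches the first line and then exactly two more "newline plus line" chunks, so the match stops just before the third `\n`. The name `result` is only illustrative.

```python
import re

s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# 'This' and the rest of its line, then two more "\n + line" chunks.
# The match therefore ends before the third newline.
m = re.search(r'This[^\n]*(?:\n[^\n]*){2}', s)
result = m.group() if m else None
print(result)
```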

Answer 1: (score: 0)

Jerry is right: regex isn't the right tool here, and there are much easier and more efficient ways to solve the problem;

this = 'This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

print('\n'.join(this.split('\n', 3)[:-1]))

Output:

This is the first sentence.
It is the second one.
Now, this is the third one.

If you just want to practice regular expressions, working through a tutorial would be much easier.
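
The snippet above starts from a string that already begins with This. For the full string from the question, a minimal sketch of the same idea could first locate the keyword with `str.find` and then split from there. The names `start` and `result` are illustrative, and it assumes the first occurrence of This is the intended starting point.

```python
s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# Locate the first occurrence of the keyword; find() returns -1 if it is absent.
start = s.find('This')
if start != -1:
    # Split into at most four pieces and keep the first three lines.
    result = '\n'.join(s[start:].split('\n', 3)[:3])
else:
    result = None

print(result)
```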

Answer 2: (score: 0)

Try the following:

import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
extracted_text = re.search(r'This(.*?\n.*?\n.*?)\n', string).group(1)
print(extracted_text)

Gives you

 is the first sentence.
It is the second one.
Now, this is the third one.

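This assumes the \n before Now was missing in the question. If you want to keep This, you can move it inside the opening parenthesis (.

Answer 3: (score: 0)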
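
As a sketch of that last suggestion, moving This inside the capture group keeps it in the extracted text. Everything else is the same pattern as above.

```python
import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# 'This' now sits inside the group, so group(1) includes it.
# Each lazy .*? stays on one line because . does not match \n by default.
extracted_text = re.search(r'(This.*?\n.*?\n.*?)\n', string).group(1)
print(extracted_text)
```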

(?s)(This.*?)(?=\nThis)

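(?s) makes . match newlines as well; the pattern finds a sequence that starts with This and is followed by \nThis.

Don't forget that the __repr__ of the search result doesn't print the whole matched string, so you need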

print(re.search('(?s)(This.*?)(?=\nThis)', string)[0])
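
A short sketch of the difference, assuming Python 3.6 or later (where a match object can be indexed with `[0]`): the repr of the match object is a summary with a shortened preview of the text, while `m[0]`, equivalent to `m.group(0)`, returns the complete match.

```python
import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

m = re.search(r'(?s)(This.*?)(?=\nThis)', string)

# The repr is only a summary of the match object (span plus a text preview).
print(repr(m))

# m[0] is the full matched text, identical to m.group(0).
full_match = m[0]
print(full_match)
```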