Question

Given a string:

 string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

I want to extract the first three sentences, that is,

This is the first sentence.\nIt is the second one.\nNow, this is the third one.

Obviously, the following regular expression does not work:

re.search('(?<=This)(.*?)(?=\n)', string)
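For reference, here is a quick check of what that attempt actually returns (a small sketch; it uses a raw string for the pattern, which behaves the same here). The lookbehind starts the match right after This, and the lazy .*? stops as soon as the lookahead sees the first newline, so only the rest of the first sentence is captured.

```python
import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# The lookbehind (?<=This) starts the match right after "This";
# the lazy .*? then stops at the first position followed by "\n".
m = re.search(r'(?<=This)(.*?)(?=\n)', string)
print(repr(m.group()))  # ' is the first sentence.'
```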

What is the correct expression to extract the text between This and the third \n?

Thanks.

Answer 1

You can use this regex to capture the three sentences that start with the text This:

This(?:[^\n]*\n){3}


Edit:

Python code:

import re

s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

m = re.search(r'This(?:[^\n]*\n){3}',s)
if m:
    print(m.group())

Prints:

This is the first sentence.
It is the second one.
Now, this is the third one.
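Note that this match ends with the third \n, while the expected output in the question does not. A minimal variant, assuming you want the text without that trailing newline: repeat [^\n]*\n only n - 1 times and then take the last line with [^\n]*. The helper name first_n_lines and its parameters are hypothetical, just for illustration.

```python
import re

s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

def first_n_lines(text, marker='This', n=3):
    """Hypothetical helper: text from `marker` through the end of the n-th
    line, without the trailing newline. Returns None if nothing matches."""
    # re.escape keeps the marker literal; {n-1} complete lines follow,
    # then the last line up to (but not including) its newline.
    pattern = re.escape(marker) + r'(?:[^\n]*\n){' + str(n - 1) + r'}[^\n]*'
    m = re.search(pattern, text)
    return m.group() if m else None

print(first_n_lines(s))
```

With the defaults the pattern is This(?:[^\n]*\n){2}[^\n]*, which yields exactly the three sentences from the question.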

Answer 2

Jerry is right: a regex is not the right tool here, and there are much easier and more efficient ways to solve this:

this = 'This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

print('\n'.join(this.split('\n', 3)[:-1]))

Output:

This is the first sentence.
It is the second one.
Now, this is the third one.
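The snippet above starts from a string that no longer contains the "Other unwanted text here and start here:" prefix. A sketch that keeps the same split-based idea but first skips to the first occurrence of This with str.find; the helper name first_lines_from is hypothetical.

```python
s = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

def first_lines_from(text, marker='This', n=3):
    """Hypothetical helper: the first n lines starting at the first `marker`."""
    start = text.find(marker)  # -1 when the marker is absent
    if start == -1:
        return None
    # maxsplit=n leaves everything after the n-th newline in one last piece;
    # slicing [:n] keeps at most n lines, even if the text is shorter.
    return '\n'.join(text[start:].split('\n', n)[:n])

print(first_lines_from(s))
```

Slicing with [:n] rather than [:-1] avoids dropping a real sentence when the text has fewer than n newlines after the marker.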

If you just want to practice regular expressions, following a tutorial would be much easier.

Answer 3

Try the following:

import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'
extracted_text = re.search(r'This(.*?\n.*?\n.*?)\n', string).group(1)
print(extracted_text)

This gives you:

 is the first sentence.
It is the second one.
Now, this is the third one.

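This assumes the question was missing a \ before nNow (that is, it should read \nNow). If you want to keep This in the result, move it inside the ( so that it becomes part of the capturing group.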
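Moving This inside the capturing group, as described above, could look like this (a sketch that also guards against re.search returning None). Without re.DOTALL, . does not match newlines, so each .*? stays within a single line.

```python
import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# "This" is now inside group 1, so it is kept in the extracted text.
m = re.search(r'(This.*?\n.*?\n.*?)\n', string)
if m:  # re.search returns None when nothing matches
    print(m.group(1))
```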

Answer 4

(?s)(This.*?)(?=\nThis)

Use (?s) so that . also matches newlines, and look for the shortest sequence that starts with This and is followed by \nThis.

Don't forget that the __repr__ of the match object does not show the entire matched string, so you need:

print(re.search('(?s)(This.*?)(?=\nThis)', string)[0])
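The inline (?s) is the same as passing the re.DOTALL flag. A sketch using the flag form, with the match stored in a variable so the result can be checked. The match ends only where a line beginning with This follows, so this approach relies on the unwanted line starting with that word.

```python
import re

string = 'Other unwanted text here and start here: This is the first sentence.\nIt is the second one.\nNow, this is the third one.\nThis is not I want.\n'

# re.DOTALL is the flag form of (?s): '.' also matches '\n'.
m = re.search(r'(This.*?)(?=\nThis)', string, flags=re.DOTALL)
if m:
    # m[0] is the same as m.group(0), the full matched text.
    print(m[0])
```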

How to get a substring between two repeated keywords in Python
