我知道我可以使用类似的东西
theText='She loves music. Her favorit instrument is the piano.\n\n However, \n\n she does not play it.'
paragraphs = [p for p in theText.split('\n\n') if p]
for i,p in enumerate(paragraphs):
print(i,p)
将文本拆分为段落。
但是,我想添加一个附加条件,即下一个句子不能以小写字母开头。 实际代码提供了
0 She loves music. Her favorit instrument is the piano.
1 However,
2 she does not play it.
我想要
0 She loves music. Her favorit instrument is the piano.
1 However, she does not play it.
我相信我应该使用一些正则表达式,但是我找不到正确的结构。
答案 0 :(得分:1)
您可以使用以下正则表达式,以确保使用Lookahead \n\n
在?=
后跟一个大写字母(以及可选的空格)。另外,在您的枚举中,您将必须摆脱\n\n
(在这里使用re.sub
):
import re
paragraphs = re.split('\n\n\s?(?=[A-Z])',theText)
for i,p in enumerate(paragraphs):
print(i,re.sub('\n\n\s?','',p))
0 She loves music. Her favorit instrument is the piano.
1 However, she does not play it.