Question

Is there an easy way to split text into separate lines each time a particular pattern appears? For example, my text looks like this:

BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible.

I want to split the text into one line per speaker:

BILLY: The sky is blue.
SALLY: It really is blue.
SAM: I think it looks like this: terrible.

The speaker's name is always in uppercase and is followed by a colon.

Answer 1

import re
a = "BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible."
# Split on whitespace that is followed by an uppercase name and a colon
print(re.split(r"\s(?=[A-Z]+:)", a))

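You can use re.split. The lookahead (?=[A-Z]+:) matches the position before each speaker name without consuming it, so the name stays in the resulting string.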

Output: ['BILLY: The sky is blue.', 'SALLY: It really is blue.', 'SAM: I think it looks like this: terrible.']
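To get the one-line-per-speaker text from the question, the list can be joined with newlines. The following is only a sketch of that idea: the helper name split_by_speaker and the use of \s+ (rather than a single \s) are my own choices, not part of the answer above.

```python
import re

# Hypothetical helper, not from the original answer.
# \s+ (an assumption) tolerates more than one space before a speaker name.
SPEAKER_BOUNDARY = re.compile(r"\s+(?=[A-Z]+:)")

def split_by_speaker(text):
    """Split at whitespace that is immediately followed by an uppercase name and a colon."""
    return SPEAKER_BOUNDARY.split(text)

sample = "BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible."
lines = split_by_speaker(sample)
print("\n".join(lines))
```

The lowercase "this:" in SAM's line is not treated as a speaker because the pattern only accepts uppercase letters before the colon. An all-caps word followed by a colon inside a sentence (for example "NOTE:") would still be split, so this only works if the input follows the convention described in the question.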

Answer 2

If you just want to modify the text rather than get a list, you can do the following:

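import re

text = "BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible."
# Insert a newline before every uppercase name followed by a colon,
# then strip the newline added before the first speaker
print(re.sub(r'([A-Z]+:)', r'\n\1', text).lstrip())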

This prints:

BILLY: The sky is blue. 
SALLY: It really is blue. 
SAM: I think it looks like this: terrible.
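
Note that the first two output lines end with a space, because the substitution leaves the space that preceded each name in place. If that matters, one possible variation (my own tweak, not part of the answer above) is to let the pattern consume the whitespace before the name as well:

```python
import re

text = "BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible."

# \s* (an assumed tweak) swallows the whitespace before each speaker name,
# so no line is left with a trailing space.
result = re.sub(r"\s*([A-Z]+:)", r"\n\1", text).lstrip()
print(result)
```

As before, lstrip() removes the newline inserted before the first speaker.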
