Splitting text in Python

Time: 2015-10-21 13:03:32

Tags: python text split

Is there a simple way to split text into separate lines each time a particular kind of word appears? For example, my text looks like this:

BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible.

I want to split the text into one line per speaker:

BILLY: The sky is blue.
SALLY: It really is blue.
SAM: I think it looks like this: terrible.

The speaker's name is always in uppercase and is followed by a colon.

2 answers:

Answer 0: (score: 11)

import re
a = "BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible."
# Split at each whitespace character that precedes a speaker label
print(re.split(r"\s(?=[A-Z]+:)", a))

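You can use re.split. The lookahead (?=[A-Z]+:) only checks that an uppercase name and a colon come next without consuming them, so each name stays at the start of its piece.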

Output: ['BILLY: The sky is blue.', 'SALLY: It really is blue.', 'SAM: I think it looks like this: terrible.']
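
To get the one-line-per-speaker text the question asks for, the resulting list can be joined with newlines. The following is only a minimal sketch: the helper name split_by_speaker and the constant SPEAKER_BOUNDARY are made up for illustration, and it assumes every speaker label is a single run of A-Z letters followed by a colon, as stated in the question.

```python
import re

# Hypothetical names for illustration; the pattern is the one used in the answer.
SPEAKER_BOUNDARY = re.compile(r"\s(?=[A-Z]+:)")


def split_by_speaker(text):
    """Return one string per speaker turn, with the label kept at the front."""
    return SPEAKER_BOUNDARY.split(text)


text = ("BILLY: The sky is blue. SALLY: It really is blue. "
        "SAM: I think it looks like this: terrible.")

lines = split_by_speaker(text)
# Join the pieces so that each speaker starts on a new line.
print("\n".join(lines))
```

Note that the lowercase "this:" inside SAM's line does not trigger a split, because the pattern requires uppercase letters directly before the colon. An all-caps word followed by a colon inside someone's speech would still be treated as a new speaker.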

Answer 1: (score: 3)

If you just want to transform the text rather than get a list, you can do the following:

import re

text = "BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible."
# Insert a newline before every speaker label, then strip the leading one
print(re.sub(r'([A-Z]+:)', r'\n\1', text).lstrip())

This will print:

BILLY: The sky is blue. 
SALLY: It really is blue. 
SAM: I think it looks like this: terrible.
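
The output above keeps the space that preceded each speaker label, so the first two lines end with a trailing space. One possible variant, offered as a sketch rather than as part of the original answer, replaces that whitespace with the newline instead of inserting the newline next to it. It relies on the same assumption that labels are uppercase A-Z letters followed by a colon.

```python
import re

text = ("BILLY: The sky is blue. SALLY: It really is blue. "
        "SAM: I think it looks like this: terrible.")

# Replace the whitespace run before each speaker label with a single newline.
# The lookahead leaves the label itself untouched, and the first label has no
# preceding whitespace, so no lstrip() is needed.
result = re.sub(r"\s+(?=[A-Z]+:)", "\n", text)
print(result)
```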