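Is there a simple way to split text into separate lines each time text in a particular style (here, an uppercase name) appears? For example, my text looks like this: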
BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible.
I want to split the text into one line per speaker:
BILLY: The sky is blue.
SALLY: It really is blue.
SAM: I think it looks like this: terrible.
The speaker's name is always in uppercase and is followed by a colon.
Answer 0 (score: 11)
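import re
a = "BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible."
# Split at any whitespace character that is immediately followed by an uppercase name and a colon.
print(re.split(r"\s(?=[A-Z]+:)", a))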
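You can use re.split. The lookahead (?=[A-Z]+:) matches the position before each uppercase name without consuming it, so the name stays at the start of each piece.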
Output: ['BILLY: The sky is blue.', 'SALLY: It really is blue.', 'SAM: I think it looks like this: terrible.']
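To get the one-line-per-speaker text the question asks for, the resulting list can be joined with newlines. This is a minimal sketch for Python 3, not a definitive implementation: the helper name split_by_speaker is made up for illustration, and it assumes speaker names consist only of the letters A-Z with no spaces or digits.

```python
import re

def split_by_speaker(text):
    """Hypothetical helper: split text before each uppercase 'NAME:' label."""
    # \s+ consumes the whitespace before a label; the lookahead leaves the
    # label itself in place so it starts the next piece.
    return re.split(r"\s+(?=[A-Z]+:)", text)

a = "BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible."
lines = split_by_speaker(a)
print("\n".join(lines))
```

The lowercase "this:" in the last sentence is not split, because the lookahead only accepts uppercase letters before the colon.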
Answer 1 (score: 3)
If you only want to change the text rather than get a list, you can do the following:
import re
text = "BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible."
# Insert a newline before each uppercase name, then drop the leading newline.
print(re.sub(r'([A-Z]+\:)', r'\n\1', text).lstrip())
This prints:
BILLY: The sky is blue.
SALLY: It really is blue.
SAM: I think it looks like this: terrible.
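One detail of the re.sub approach: the space that preceded each name stays in the text, so every line except the last ends with a trailing space. A possible variant, sketched here for Python 3 under the same assumption that names are plain uppercase letters, replaces that whitespace with the newline instead. The function name speakers_to_lines is hypothetical.

```python
import re

def speakers_to_lines(text):
    """Hypothetical helper: put each 'NAME: ...' utterance on its own line."""
    # Replace the whitespace before each uppercase label with a newline.
    # The lookahead does not consume the label, so no backreference is needed.
    return re.sub(r"\s+(?=[A-Z]+:)", "\n", text)

text = "BILLY: The sky is blue. SALLY: It really is blue. SAM: I think it looks like this: terrible."
result = speakers_to_lines(text)
print(result)
```

Because the first name has no whitespace before it, nothing is inserted at the start of the string and no lstrip() call is required.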