Question

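I have a text file containing a conversation transcript. I need a way to split it so that I get a list of strings, one for each speaker's turn. So this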

mystr = '''Bob: Hello there, how are you? 

           Alice: I am fine how are you?'''

becomes this:

mylist = ['Bob: Hello there, how are you?', 'Alice: I am fine how are you?']

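I'm new to regular expressions, but I realize they may be the way to go. The problem is that I want to iterate over many transcripts in which the names differ (e.g. John, Paul, George, Ringo, etc.). What will always appear is a single word (the speaker's name), followed by a colon, followed by a space.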
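A minimal sketch of the pattern described above (one word, a colon, a space): instead of matching each turn, split on the whitespace that comes just before a speaker tag. This assumes speaker names are a single \w+ word; split_turns is a hypothetical helper name, not taken from either answer below.

```python
import re

def split_turns(transcript):
    """Split a transcript into one string per speaker turn.

    Assumption: every turn starts with a single word, a colon and
    whitespace (e.g. "Bob: "). Because the split happens on the whitespace
    run *before* such a tag, a turn may itself span several lines.
    """
    # (?=...) is a lookahead: it checks for "Name: " without consuming it,
    # so each resulting string still starts with the speaker's name.
    parts = re.split(r"\s+(?=\w+:\s)", transcript.strip())
    # Drop empty pieces (e.g. when the transcript is empty).
    return [p for p in parts if p]

mystr = "Bob: Hello there, how are you? \n\n           Alice: I am fine how are you?"
print(split_turns(mystr))
# ['Bob: Hello there, how are you?', 'Alice: I am fine how are you?']
```

Since the split keys only on the word-colon-space shape, the same function works for any set of speakers when looping over many transcripts. One caveat: a word followed by ": " inside someone's speech (for example "I said: yes") is also treated as the start of a new turn.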

Answer 1

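import re
# \S skips leading whitespace, [^:]+ runs up to the first colon, .* takes the rest of the line
re.findall(r"\S[^:]+.*", mystr)
#-> ['Bob: Hello there, how are you? ', 'Alice: I am fine how are you?']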

https://docs.python.org/3/library/re.html
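A variation on the findall approach, sketched under the assumption that each turn fits on a single line as in the example: with two capture groups, re.findall returns (speaker, utterance) tuples, which helps if the speaker name is needed separately. speaker_pairs is a hypothetical name.

```python
import re

# Assumption: one turn per line, starting with "Name:" (indentation allowed).
# Only spaces/tabs are matched around the tag, so the pattern never
# reaches across a line break; re.MULTILINE makes ^ and $ work per line.
TURN = re.compile(r"^[ \t]*(\w+):[ \t]*(.*?)[ \t]*$", re.MULTILINE)

def speaker_pairs(transcript):
    """Return a list of (speaker, utterance) tuples."""
    return TURN.findall(transcript)

mystr = "Bob: Hello there, how are you? \n\n           Alice: I am fine how are you?"
print(speaker_pairs(mystr))
# [('Bob', 'Hello there, how are you?'), ('Alice', 'I am fine how are you?')]
```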

Answer 2

import re
mystr = '''Bob: Hello there, how are you? 

           Alice: I am fine how are you?'''
# \w starts at the speaker's name, .* stops at the end of the line, strip() drops the trailing space
[m.group(0).strip() for m in re.finditer(r"\w[^:]+.*", mystr)]

#['Bob: Hello there, how are you?', 'Alice: I am fine how are you?']

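If a colon might be missing, prefer this regex over the previous one. It only matches a word immediately followed by a colon, so a line without one is skipped instead of being merged into the next speaker's turn.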

mystr = '''Bob Hello there, how are you? 

           Alice: I am fine how are you?'''
# The word must be followed directly by a colon, so the colon-less "Bob" line is skipped
[m.group(0).strip() for m in re.finditer(r"\w{1,}:+.*", mystr)]
#['Alice: I am fine how are you?']
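Since the regex above drops a malformed line silently, one alternative for messy transcripts is to check each line and keep the rejects for inspection. This is a sketch assuming one turn per line and the "word, colon, space" shape from the question; split_lines_checked is a hypothetical helper.

```python
import re

# Assumed shape of a turn: one word, a colon, then whitespace.
TURN_START = re.compile(r"\w+:\s")

def split_lines_checked(text):
    """Return (turns, rejected): lines that start a speaker turn, and
    non-blank lines that do not (e.g. a missing colon)."""
    turns, rejected = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between turns
        if TURN_START.match(line):
            turns.append(line)
        else:
            rejected.append(line)
    return turns, rejected

messy = "Bob Hello there, how are you? \n\n           Alice: I am fine how are you?"
turns, rejected = split_lines_checked(messy)
print(turns)     # ['Alice: I am fine how are you?']
print(rejected)  # ['Bob Hello there, how are you?']
```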

Split a string based on a semi-consistent feature
