I have a text file that represents a transcript. I need a way to split it so that I get a list of strings, one for each person's utterance. So this:
mystr = '''Bob: Hello there, how are you?
Alice: I am fine how are you?'''
becomes this:
mylist= ['Bob: Hello there, how are you?','Alice: I am fine how are you?']
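I am new to regular expressions, but I realize they are probably the way to go. The catch is that I want to iterate over many transcripts in which the names differ (e.g. John, Paul, George, Ringo, etc.). What always appears is a single word (the speaker), then a colon, then a space.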
Answer 0: (score: 0)
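import re
# \S starts at the first non-space character, [^:]+ runs up to the colon, .* takes the rest of the line
re.findall(r"\S[^:]+.*", mystr)
#-> ['Bob: Hello there, how are you?', 'Alice: I am fine how are you?']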
Answer 1: (score: 0)
import re
mystr = '''Bob: Hello there, how are you?
Alice: I am fine how are you?'''
[m.group(0).strip() for m in re.finditer(r"\w[^:]+.*", mystr)]
#['Bob: Hello there, how are you?', 'Alice: I am fine how are you?']
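If a line might be missing its colon, prefer the following regex over the previous one. The [^:]+ in the previous pattern also matches newlines, so a line without a colon would be merged into the next match.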
mystr = '''Bob Hello there, how are you?
Alice: I am fine how are you?'''
[m.group(0).strip() for m in re.finditer(r"\w{1,}:+.*", mystr)]
#['Alice: I am fine how are you?']
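The question describes each utterance as starting with a single word, then a colon, then a space. A minimal sketch that follows that rule more literally is to anchor the speaker label to the start of a line with re.MULTILINE and cut the text at each label. The name split_transcript is hypothetical, and treating unlabeled lines as a continuation of the previous speaker is an assumption, not something the question specifies.

```python
import re

# Speaker label as described in the question: one word, a colon, a space,
# anchored to the start of a line (re.MULTILINE makes ^ match after each newline).
SPEAKER = re.compile(r"^\w+: ", re.MULTILINE)

def split_transcript(text):
    """Hypothetical helper: return one string per utterance.

    Assumption: a line without a speaker label continues the previous
    utterance. Text before the first label is dropped.
    """
    starts = [m.start() for m in SPEAKER.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]

mystr = '''Bob: Hello there, how are you?
Alice: I am fine how are you?'''
print(split_transcript(mystr))
# ['Bob: Hello there, how are you?', 'Alice: I am fine how are you?']
```

Because the label is matched only at the start of a line, a colon inside the speech itself (such as a time like 10:30) does not start a new utterance. A continuation line that happens to begin with a word followed by a colon and a space would still be read as a new speaker.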