Splitting a string based on a semi-consistent feature

Asked: 2018-09-17 15:14:24

Tags: python regex

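I have a text file that contains a transcript. I need a way to split it so that I end up with a list of strings, one per person's turn of speech. So this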

mystr = '''Bob: Hello there, how are you? 

           Alice: I am fine how are you?'''

becomes this:

mylist = ['Bob: Hello there, how are you?', 'Alice: I am fine how are you?']

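I am new to regular expressions, but I realise they may be the way to go. The problem is that I want to iterate over many transcripts in which the names differ (e.g. John, Paul, George, Ringo, etc.). What stays consistent is a single word (the speaker), followed by a colon, followed by a space.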

2 answers:

Answer 0 (score: 0)

import re

re.findall(r"\S[^:]+.*", mystr)
#-> ['Bob: Hello there, how are you? ', 'Alice: I am fine how are you?']

https://docs.python.org/3/library/re.html
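
The format described in the question (one word for the speaker, then a colon, then a space) can also be matched explicitly, one line at a time. The following is only a sketch under that assumption; `TURN` and `split_turns` are hypothetical names chosen for illustration, and lines that do not fit the format are skipped.

```python
import re

# Assumed format, taken from the question: optional indentation, one word
# (the speaker), a colon, a space, then the speech up to the end of the line.
TURN = re.compile(r"^[ \t]*(\w+): (.*)$", re.MULTILINE)

def split_turns(text):
    """Return one 'Speaker: speech' string per matching line (hypothetical helper)."""
    return [f"{speaker}: {speech.strip()}" for speaker, speech in TURN.findall(text)]

mystr = '''Bob: Hello there, how are you? 

           Alice: I am fine how are you?'''
print(split_turns(mystr))
# ['Bob: Hello there, how are you?', 'Alice: I am fine how are you?']
```

With `re.MULTILINE`, `^` and `$` anchor at every line, so each speaker label has to start its own line; a colon that appears later inside a speech does not start a new item.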

Answer 1 (score: 0)

import re
mystr = '''Bob: Hello there, how are you? 

           Alice: I am fine how are you?'''
[m.group(0).strip() for m in re.finditer(r"\w[^:]+.*", mystr)]

#['Bob: Hello there, how are you?', 'Alice: I am fine how are you?']

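If there is any chance that a line is missing its colon, prefer the following regex over the previous one. It only matches from a word that is immediately followed by a colon, so lines without a speaker label are skipped.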

mystr = '''Bob Hello there, how are you? 

           Alice: I am fine how are you?'''
[m.group(0).strip() for m in re.finditer(r"\w+:+.*", mystr)]
#['Alice: I am fine how are you?']
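
The patterns above rely on `.*`, which stops at a newline, so a speech that wraps onto a second line is not captured as one item. A minimal sketch for that case, assuming every turn begins on a new line with a single word, a colon and a space, as the question describes: split on the line breaks that precede a speaker label. `TURN_BREAK` and `split_transcript` are hypothetical names, and the sample transcripts are made up for illustration.

```python
import re

# Split on the line break(s) that come right before "Word: ". The lookahead
# leaves the speaker label attached to the speech that follows it.
TURN_BREAK = re.compile(r"\s*\n\s*(?=\w+: )")

def split_transcript(text):
    """Return one 'Speaker: speech' string per turn (hypothetical helper)."""
    parts = TURN_BREAK.split(text.strip())
    # Collapse line breaks and indentation inside a turn to single spaces.
    return [" ".join(part.split()) for part in parts if part]

# Made-up sample transcripts with different speaker names.
transcripts = [
    '''John: Hello there,
           how are you?

           Paul: I am fine how are you?''',
    '''George: Ready?
           Ringo: Ready.''',
]
for text in transcripts:
    print(split_transcript(text))
# ['John: Hello there, how are you?', 'Paul: I am fine how are you?']
# ['George: Ready?', 'Ringo: Ready.']
```

One caveat of this approach: a continuation line that itself starts with a word, a colon and a space (for example "Note: ...") would be treated as a new speaker.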