Question

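My text contains many lines.

I want to split it on the lines that end with a specific character.

For example, my text contains the following data: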

Hi
I'm here:
London
UK
USA
Where are you:
here 
there
what will you do:
something
somethin2

I want to split this text into a list, using as delimiters the lines that end with a colon (:).

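In this case, the final result list would be [ Hi, London UK USA, here there, something somethin2 ]. How can I do this in Python?

I know we can split on a single character or on some other common delimiter string, but what should I do in this case?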

Answer 1

You can use itertools.groupby:

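import itertools
# content is assumed to hold the text from the question as a single string
data = [[a, list(b)] for a, b in itertools.groupby(content.split('\n'), key=lambda x: x.endswith(':'))]
# keep only the runs of lines that do not end with ':'; strip each line so stray spaces do not leak into the result
final_result = [' '.join(s.strip() for s in b) for a, b in data if not a]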

Output:

['Hi', 'London UK USA', 'here there', 'something somethin2']
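
To make the grouping step easier to follow, here is a self-contained sketch of the same idea. The helper name split_on_colon_lines and the way the sample text is built are assumptions for illustration, not part of the original answer. groupby emits runs of consecutive lines that share the same key, so every run of non-delimiter lines becomes one item in the result.

```python
import itertools

# Sample text from the question, built line by line so the trailing
# space after "here" is explicit.
lines = ["Hi", "I'm here:", "London", "UK", "USA", "Where are you:",
         "here ", "there", "what will you do:", "something", "somethin2"]
content = "\n".join(lines)


def split_on_colon_lines(text):
    """Hypothetical helper: join each run of lines between colon-terminated lines."""
    result = []
    # groupby yields (key, run) pairs for consecutive lines with the same key.
    for is_delimiter, run in itertools.groupby(
        text.splitlines(), key=lambda line: line.rstrip().endswith(":")
    ):
        if not is_delimiter:
            result.append(" ".join(line.strip() for line in run))
    return result


print(split_on_colon_lines(content))
```

Because consecutive colon-terminated lines fall into a single run, two delimiter lines in a row do not produce an empty item.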

Answer 2

Here is a small example of how this can be done.

Note: this is easier to understand than @Ajax1234's answer, but much less efficient.

text = '''Hi
I'm here:
London
UK
USA
Where are you:
here 
there
what will you do:
something
somethin2'''

# replace each line that ends with ':' by a ',' placeholder; otherwise keep the stripped line
output = [',' if line.rstrip().endswith(':') else line.strip() for line in text.split('\n')]

# join the list on space
output = ' '.join(output) 

# split back into list on ',' and trim the white spaces
output = [item.strip() for item in output.split(',')]

print(output)

Output:

['Hi', 'London UK USA', 'here there', 'something somethin2']
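
The comma placeholder used above only works if the text itself contains no commas. As an alternative sketch (not the answerer's code; the function name split_sections and its marker parameter are made up for illustration), a plain loop can collect lines into the current group and start a new group at each line that ends with a colon:

```python
def split_sections(text, marker=":"):
    """Hypothetical helper: start a new group at each line ending with `marker`."""
    groups = [[]]
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(marker):
            groups.append([])        # delimiter line: begin a new group
        elif line:
            groups[-1].append(line)  # ordinary non-empty line
    # Drop empty groups (for example, two delimiter lines in a row), then join.
    return [" ".join(group) for group in groups if group]


# This sample deliberately contains a comma to show it survives intact.
sample = "Hi\nI'm here:\nLondon, UK\nUSA\nWhere are you:\nhere \nthere"
print(split_sections(sample))
```

This makes a single pass over the lines and never needs to join and re-split the text.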

Answer 3

You can split with a regular expression:

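>>> import re
>>> [' '.join(s.split()) for s in re.split(r'^.*:$', txt, flags=re.M)]
['Hi', 'London UK USA', 'here there', 'something somethin2']

The regular expression ^.*:$ matches a complete line that ends with :. The re.M flag makes ^ and $ match at the start and end of every line rather than only at the ends of the whole string.

re.split then splits the string on that pattern, which removes the delimiter lines. Finally, ' '.join(s.split()) collapses the newlines (and any stray spaces) inside each chunk into single spaces, giving the desired output.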
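
To see what the two steps do, the sketch below prints the raw pieces returned by re.split before cleaning them up. The variable name txt follows the snippet above; the shortened sample text is an assumption for illustration.

```python
import re

txt = "Hi\nI'm here:\nLondon\nUK\nUSA\nWhere are you:\nhere\nthere"

# Step 1: split on whole lines that end with ':'.
# re.M makes ^ and $ match at every line boundary.
pieces = re.split(r"^.*:$", txt, flags=re.M)
print(pieces)
# The newlines around each removed line stay attached to the pieces:
# ['Hi\n', '\nLondon\nUK\nUSA\n', '\nhere\nthere']

# Step 2: collapse the newlines (and any stray spaces) into single spaces.
cleaned = [" ".join(p.split()) for p in pieces]
print(cleaned)
# ['Hi', 'London UK USA', 'here there']
```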

How do I split a list of strings on delimiter strings that end with a specific character in Python?

3 answers: