How do I split a string using multiple delimiters in Python?

Time: 2018-07-15 14:33:43

Tags: python string delimiter

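I want to split a string by removing everything except alphabetic characters.

By default, split only splits on the whitespace between words. But I want to split on every character except alphabetic ones. How can I give split multiple delimiters?

For example: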

word1 = input().lower().split()
# If you input " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."
# the result will be ['has', '15', 'science@and^engineering--departments,', 'affiliated', 'centers,', 'bandar', 'abbas&&and', 'mahshahr.']

But I am looking for this kind of result:

['has', '15', 'science', 'and', 'engineering', 'departments', 'affiliated', 'centers', 'bandar', 'abbas', 'and', 'mahshahr']

2 Answers:

Answer 0: (score: 4)

For performance, you should use a regular expression, as in the marked duplicate. See the benchmarks below.

groupby + str.isalnum

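You can use itertools.groupby together with str.isalnum to group the string into runs of alphanumeric characters.

With this solution you don't have to worry about listing the characters to split on explicitly.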

from itertools import groupby

x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."

res = [''.join(j) for i, j in groupby(x, key=str.isalnum) if i]

print(res)

['has', '15', 'science', 'and', 'engineering', 'departments',
 'affiliated', 'centers', 'Bandar', 'Abbas', 'and', 'Mahshahr']
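
To see why the `if i` filter matters, it may help to look at the runs groupby produces before filtering. The following is only a sketch: the helper name split_alnum and the lowercasing step are assumptions of mine, added so the output matches the lowercase result asked for in the question.

```python
from itertools import groupby


def split_alnum(text):
    """Return the lowercase alphanumeric runs of text (hypothetical helper)."""
    # groupby yields (key, group) pairs; the key is True for a run of
    # alphanumeric characters and False for a run of anything else.
    return [
        ''.join(group)
        for is_alnum, group in groupby(text.lower(), key=str.isalnum)
        if is_alnum
    ]


# Unfiltered view of the runs for a short input
runs = [(key, ''.join(group)) for key, group in groupby("a@b--c", key=str.isalnum)]
print(runs)
# [(True, 'a'), (False, '@'), (True, 'b'), (False, '--'), (True, 'c')]

print(split_alnum("Bandar Abbas&&and Mahshahr."))
# ['bandar', 'abbas', 'and', 'mahshahr']
```

Dropping the groups whose key is False is what discards the delimiter runs such as '@' and '--'.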

Benchmarking vs regex

Some performance benchmarks against the regex solutions (tested on Python 3.6.5):

from itertools import groupby
import re

x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."

z = x*10000
%timeit [''.join(j) for i, j in groupby(z, key=str.isalnum) if i]  # 184 ms
%timeit list(filter(None, re.sub(r'\W+', ',', z).split(',')))      # 82.1 ms
%timeit list(filter(None, re.split(r'\W+', z)))                    # 63.6 ms
%timeit [_ for _ in re.split(r'\W', z) if _]                       # 62.9 ms
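
The %timeit lines above are IPython magics and will not run in a plain Python script. A rough equivalent using the standard timeit module might look like the sketch below. The candidates dictionary, the smaller repeat factor of 1000, and the number/repeat settings are my own choices rather than part of the original benchmark, so the timings will not match the figures quoted above.

```python
import re
import timeit
from itertools import groupby

x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."
z = x * 1000  # assumption: fewer repeats than above, to keep the run short

# Hypothetical labels mapped to the expressions being compared
candidates = {
    "groupby": lambda: [''.join(j) for i, j in groupby(z, key=str.isalnum) if i],
    "re.sub + split": lambda: list(filter(None, re.sub(r'\W+', ',', z).split(','))),
    "re.split": lambda: list(filter(None, re.split(r'\W+', z))),
}

# Check that all candidates agree on this input before comparing timings
results = [func() for func in candidates.values()]
all_agree = all(result == results[0] for result in results)
print("all candidates agree:", all_agree)

# Best of 3 repeats, each repeat timing 5 calls
timings = {
    name: min(timeit.repeat(func, number=5, repeat=3))
    for name, func in candidates.items()
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 5 runs")
```

The candidates agree here because the sample text contains no underscores. str.isalnum treats an underscore as a separator, whereas \W does not match it, so on other inputs the results can differ.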

Answer 1: (score: 2)

You can replace every run of non-alphanumeric characters with a single character (I am using a comma):

import re

s = 'has15science@and^engineering--departments,affiliatedcenters,bandarabbas&&andmahshahr.'

alphanumeric = re.sub(r'\W+', ',', s)

Then split it on the commas:

splitted = alphanumeric.split(',')
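
One detail worth checking: because s ends with '.', the substituted string ends with a comma, so the split leaves an empty string at the end of the list. A minimal sketch of filtering it out follows; the variable name tokens is mine.

```python
import re

s = 'has15science@and^engineering--departments,affiliatedcenters,bandarabbas&&andmahshahr.'

alphanumeric = re.sub(r'\W+', ',', s)
splitted = alphanumeric.split(',')
print(splitted[-1] == '')  # True: the trailing '.' became a trailing comma

# Drop the empty strings produced by leading or trailing separators
tokens = [token for token in splitted if token]
print(tokens)
# ['has15science', 'and', 'engineering', 'departments',
#  'affiliatedcenters', 'bandarabbas', 'andmahshahr']
```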

Edit:

As @DeepSpace suggested, this can be done in a single statement:

splitted = re.split(r'\W+', s)
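
As a possible variation that neither answer uses, re.findall can collect the word runs directly, which avoids the empty strings re.split returns when the input starts or ends with a separator. The sample string and variable names below are my own. Note that \w also matches the underscore, so the last pattern is one way to treat underscores as separators too.

```python
import re

s = " has 15 science@and^engineering--departments, snake_case."

by_split = re.split(r'\W+', s)
print(by_split)
# ['', 'has', '15', 'science', 'and', 'engineering', 'departments', 'snake_case', '']

by_findall = re.findall(r'\w+', s)
print(by_findall)
# ['has', '15', 'science', 'and', 'engineering', 'departments', 'snake_case']

# [^\W_] matches a word character that is not an underscore
no_underscore = re.findall(r'[^\W_]+', s)
print(no_underscore)
# ['has', '15', 'science', 'and', 'engineering', 'departments', 'snake', 'case']
```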