Efficiently splitting a Python string on multiple string delimiters

Time: 2012-12-18 14:49:01

Tags: python string split delimiter

Say I have a string such as "Let's split this string into many small ones" and I want to split it on the words this, into and ones,

so that the output looks like this:

["Let's split", "this string", "into many small", "ones"]

What is the most efficient way to do this?

3 answers:

Answer 0: (score: 11)

With a lookahead.

>>> re.split(r'\s(?=(?:this|into|ones)\b)', "Let's split this string into many small ones")
["Let's split", 'this string', 'into many small', 'ones']
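If the delimiter words are not known in advance, the same lookahead pattern could be built from a list. This is only a sketch: split_before_words is a hypothetical helper name, not part of the re module, and it assumes the list is non-empty and every entry ends in a word character.

```python
import re

def split_before_words(text, words):
    # Hypothetical helper: escape each word so regex metacharacters are taken literally.
    alternation = "|".join(re.escape(w) for w in words)
    # Split on whitespace that is immediately followed by one of the whole words.
    pattern = r"\s(?=(?:%s)\b)" % alternation
    return re.split(pattern, text)

parts = split_before_words(
    "Let's split this string into many small ones",
    ["this", "into", "ones"],
)
print(parts)
```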

Answer 1: (score: 3)

Use re.split():

>>> re.split(r'(this|into|ones)', "Let's split this string into many small ones")
["Let's split ", 'this', ' string ', 'into', ' many small ', 'ones', '']

By placing the words to split on in a capturing group, the output includes the words we split on.
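To see the effect of the capturing group, here is a small side-by-side comparison with a non-capturing group (the variable names are just for illustration):

```python
import re

s = "Let's split this string into many small ones"

# Capturing group: the matched delimiter words are kept in the result.
with_delims = re.split(r"(this|into|ones)", s)

# Non-capturing group: the delimiter words are dropped.
without_delims = re.split(r"(?:this|into|ones)", s)

print(with_delims)
print(without_delims)
```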

If you need the surrounding whitespace removed, use map(str.strip, result) on the re.split() output:

>>> list(map(str.strip, re.split(r'(this|into|ones)', "Let's split this string into many small ones")))
["Let's split", 'this', 'string', 'into', 'many small', 'ones', '']

If needed, you can use filter(None, result) to remove any empty strings:

>>> list(filter(None, map(str.strip, re.split(r'(this|into|ones)', "Let's split this string into many small ones"))))
["Let's split", 'this', 'string', 'into', 'many small', 'ones']
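The strip-and-filter steps can also be written as a single list comprehension, which returns a list on both Python 2 and Python 3. A minimal sketch with illustrative variable names:

```python
import re

s = "Let's split this string into many small ones"
pieces = re.split(r"(this|into|ones)", s)

# Strip each piece and keep only the ones that are non-empty afterwards.
cleaned = [p.strip() for p in pieces if p.strip()]
print(cleaned)
```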

To split on the words but keep them attached to the group that follows, you need a lookahead assertion:

>>> re.split(r'\s(?=(?:this|into|ones)\b)', "Let's split this string into many small ones")
["Let's split", 'this string', 'into many small', 'ones']

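Now we are really splitting on whitespace, but only on whitespace that is followed by a whole word, and that word must be one of this, into and ones.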
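The "whole word" part comes from the \b at the end of the lookahead. A quick sketch of what changes without it, using a made-up test sentence that contains "thistle":

```python
import re

text = "a thistle grows into ones"  # made-up sentence for illustration

# With \b the lookahead only accepts complete words, so "thistle" is left alone.
whole_word = re.split(r"\s(?=(?:this|into|ones)\b)", text)

# Without \b the prefix "this" of "thistle" is enough to trigger a split.
prefix = re.split(r"\s(?=(?:this|into|ones))", text)

print(whole_word)
print(prefix)
```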

Answer 2: (score: 0)

Here is a fairly lazy way to do it:

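import re

def resplit(regex, s):
    # Yield the slices of s that lie between consecutive match starts.
    current = None
    for x in regex.finditer(s):
        start = x.start()
        yield s[current:start]
        current = start
    # s[None:] is the whole string, so this also works when nothing matched.
    yield s[current:]

s = "Let's split this string into many small ones"
regex = re.compile(r'(this|into|ones)')
print(list(resplit(regex, s)))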

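I'm not sure whether this is the most efficient, but it's pretty clean.

Basically, we just iterate through the matches one at a time. The pieces are determined by the index in the string (s) where the regex starts to match. We slice the string up to that point, and save that index as the starting point of the next slice.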
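To make the slicing concrete, this sketch lists the match start indices for the example string and cuts the string at those points. The bounds list is just an illustration of the same idea, not the generator itself:

```python
import re

s = "Let's split this string into many small ones"
regex = re.compile(r"(this|into|ones)")

# Index at which each delimiter word starts.
starts = [m.start() for m in regex.finditer(s)]
print(starts)

# Cut the string between consecutive start indices.
bounds = [0] + starts + [len(s)]
pieces = [s[a:b] for a, b in zip(bounds, bounds[1:])]
print(pieces)
```

Note that the pieces keep their trailing spaces, since the cuts fall right before each delimiter word rather than on the whitespace.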


As for performance, ignacio clearly wins this round:

9.1412050724  -- Me
3.09771895409  -- ignacio

Code:

import re
import timeit

def resplit(regex, s):
    # Yield the slices of s that lie between consecutive match starts.
    current = None
    for x in regex.finditer(s):
        start = x.start()
        yield s[current:start]
        current = start
    # s[None:] is the whole string, so this also works when nothing matched.
    yield s[current:]


def me(regex, s):
    return list(resplit(regex, s))

def ignacio(regex, s):
    # Split the string that was passed in rather than a hard-coded copy.
    return regex.split(s)

s = "Let's split this string into many small ones"
regex = re.compile(r'(this|into|ones)')
regex2 = re.compile(r'\s(?=(?:this|into|ones)\b)')

print(timeit.timeit("me(regex,s)", "from __main__ import me,regex,s"))
print(timeit.timeit("ignacio(regex2,s)", "from __main__ import ignacio,regex2,s"))
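For anyone re-running this on Python 3, here is a rough sketch of the same comparison that passes callables to timeit instead of statement strings. The function names and the run count of 10000 are my own choices, and the timings will differ from the numbers above depending on machine and Python version.

```python
import re
import timeit

s = "Let's split this string into many small ones"
lookahead = re.compile(r"\s(?=(?:this|into|ones)\b)")
words = re.compile(r"(this|into|ones)")

def split_with_lookahead():
    return lookahead.split(s)

def split_with_finditer():
    # Same slicing idea as resplit, collected into a list.
    pieces, current = [], None
    for m in words.finditer(s):
        pieces.append(s[current:m.start()])
        current = m.start()
    pieces.append(s[current:])
    return pieces

for fn in (split_with_lookahead, split_with_finditer):
    # number is kept small here; raise it for steadier measurements.
    elapsed = timeit.timeit(fn, number=10000)
    print("%-22s %.4f s" % (fn.__name__, elapsed))
```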