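Suppose I have a string, for example
"Let's split this string into many small ones"
and I want to split it at the words this, into and ones, so that the output looks like this:
["Let's split", "this string", "into many small", "ones"]
What is the most efficient way to do this?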
Answer 0: (score: 11)
With a lookahead.
>>> re.split(r'\s(?=(?:this|into|ones)\b)', "Let's split this string into many small ones")
["Let's split", 'this string', 'into many small', 'ones']
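The one-liner above hard-codes the three words. As a minimal sketch, assuming the words arrive as a non-empty list, the same lookahead pattern can be built on the fly. The helper name split_before is hypothetical and not part of the re module; re.escape is used so that words containing regex metacharacters are matched literally.

```python
import re

def split_before(words, text):
    """Split text at whitespace that directly precedes any of the given words."""
    # Escape each word so characters such as '.' or '+' are treated literally.
    alternation = "|".join(re.escape(w) for w in words)
    # \s is consumed by the split; the lookahead leaves the word itself in place.
    pattern = r"\s(?=(?:%s)\b)" % alternation
    return re.split(pattern, text)

parts = split_before(["this", "into", "ones"],
                     "Let's split this string into many small ones")
print(parts)  # ["Let's split", 'this string', 'into many small', 'ones']
```

If the same word list is reused many times, compiling the pattern once with re.compile would avoid rebuilding it on every call.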
Answer 1: (score: 3)
Use re.split():
>>> re.split(r'(this|into|ones)', "Let's split this string into many small ones")
["Let's split ", 'this', ' string ', 'into', ' many small ', 'ones', '']
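Because the words to split on are placed in a capturing group, the output includes those words.
If you need to strip the whitespace, apply map(str.strip, result) to the re.split() output (on Python 3, map returns an iterator, so wrap it in list()):
>>> list(map(str.strip, re.split(r'(this|into|ones)', "Let's split this string into many small ones")))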
["Let's split", 'this', 'string', 'into', 'many small', 'ones', '']
If needed, you can use filter(None, result) to drop any empty strings (wrapped in list() on Python 3, where filter is lazy):
>>> list(filter(None, map(str.strip, re.split(r'(this|into|ones)', "Let's split this string into many small ones"))))
["Let's split", 'this', 'string', 'into', 'many small', 'ones']
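For readers who prefer comprehensions, here is a sketch of the same strip-then-drop-empties step written as a single list comprehension. The variable names pieces and cleaned are illustrative only; the result should match the filter/map version above.

```python
import re

text = "Let's split this string into many small ones"

# The capturing group keeps the delimiter words in the result list.
pieces = re.split(r'(this|into|ones)', text)

# Strip each piece and keep only the ones that are non-empty afterwards.
cleaned = [p.strip() for p in pieces if p.strip()]
print(cleaned)  # ["Let's split", 'this', 'string', 'into', 'many small', 'ones']
```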
To split at the words but keep each one attached to the group that follows it, you need a lookahead assertion:
>>> re.split(r'\s(?=(?:this|into|ones)\b)', "Let's split this string into many small ones")
["Let's split", 'this string', 'into many small', 'ones']
Now we are really splitting on whitespace, but only on whitespace that is followed by a whole word that is one of this, into and ones.
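To see what the trailing \b contributes, here is a small sketch on a made-up sentence (the words thistle and onesies are my own test input, not from the question). Without \b the lookahead also fires before longer words that merely start with one of the targets.

```python
import re

words = r'(?:this|into|ones)'
text = "split this thistle into onesies ones"

# With \b, the lookahead only succeeds when the target is a complete word.
with_boundary = re.split(r'\s(?=' + words + r'\b)', text)
print(with_boundary)     # ['split', 'this thistle', 'into onesies', 'ones']

# Without \b, "thistle" and "onesies" also trigger a split.
without_boundary = re.split(r'\s(?=' + words + r')', text)
print(without_boundary)  # ['split', 'this', 'thistle', 'into', 'onesies', 'ones']
```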
Answer 2: (score: 0)
Here is a fairly lazy way to do it:
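import re

def resplit(regex, s):
    # Yield the pieces of s, cutting at the start of every match.
    current = None
    for x in regex.finditer(s):
        start = x.start()
        yield s[current:start]
        current = start
    # Slice from current, not start: start is undefined if nothing matched.
    yield s[current:]

s = "Let's split this string into many small ones"
regex = re.compile('(this|into|ones)')
print(list(resplit(regex, s)))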
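I'm not sure whether this is the most efficient approach, but it is quite clean.
Basically, we just iterate over the matches one at a time. The pieces are determined by the index in the string (s) at which each regex match starts. We slice the string up to that point and save that index as the starting point of the next slice.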
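To make the slicing concrete, here is a sketch that prints the match start indices for the example string and then cuts at them. It uses zip over consecutive boundaries instead of a generator; that is my own restatement for illustration, not the code from this answer. Note that the pieces keep their trailing spaces, because the cut happens at the start of each word rather than at the whitespace before it.

```python
import re

s = "Let's split this string into many small ones"
regex = re.compile(r'(this|into|ones)')

# Index in s where each match begins.
starts = [m.start() for m in regex.finditer(s)]
print(starts)  # [12, 24, 40]

# Cut between consecutive boundaries: 0, each match start, end of string.
bounds = [0] + starts + [len(s)]
pieces = [s[a:b] for a, b in zip(bounds, bounds[1:])]
print(pieces)  # ["Let's split ", 'this string ', 'into many small ', 'ones']
```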
As for performance, ignacio clearly wins this round (times are in seconds for timeit's default of 1,000,000 calls):
9.1412050724 -- Me
3.09771895409 -- ignacio
Code:
import re
import timeit

def resplit(regex, s):
    # Yield the pieces of s, cutting at the start of every match.
    current = None
    for x in regex.finditer(s):
        start = x.start()
        yield s[current:start]
        current = start
    # Slice from current, not start: start is undefined if nothing matched.
    yield s[current:]

def me(regex, s):
    return list(resplit(regex, s))

def ignacio(regex, s):
    # Split the argument rather than a hard-coded copy of the string.
    return regex.split(s)

s = "Let's split this string into many small ones"
regex = re.compile('(this|into|ones)')
regex2 = re.compile(r'\s(?=(?:this|into|ones)\b)')

print(timeit.timeit("me(regex,s)", "from __main__ import me,regex,s"))
print(timeit.timeit("ignacio(regex2,s)", "from __main__ import ignacio,regex2,s"))
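One thing worth checking alongside the timings is that the two functions do not return identical lists: the finditer version cuts at the start of each word and so keeps the trailing space on every piece. The sketch below restates resplit (ending on s[current:] so it also tolerates a string with no matches) and compares the two outputs; the names via_finditer and via_split are illustrative only.

```python
import re

def resplit(regex, s):
    # Cut s at the start of every match; pieces keep their trailing spaces.
    current = None
    for m in regex.finditer(s):
        start = m.start()
        yield s[current:start]
        current = start
    yield s[current:]

s = "Let's split this string into many small ones"

via_finditer = list(resplit(re.compile(r'(this|into|ones)'), s))
via_split = re.compile(r'\s(?=(?:this|into|ones)\b)').split(s)

print(via_finditer)  # ["Let's split ", 'this string ', 'into many small ', 'ones']
print(via_split)     # ["Let's split", 'this string', 'into many small', 'ones']

# After stripping the trailing whitespace, the two results agree.
print([p.rstrip() for p in via_finditer] == via_split)  # True
```

So a like-for-like comparison would add a strip step to the finditer version, which would presumably make it slower still.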