Question

I want to do something like this:

import re
s = 'This is a test'
re.split('(?<= )', s)

and get something like this:

['This ', 'is ', 'a ', 'test']

But this doesn't work.

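Can anyone suggest a simple way to split a string based on a regular expression (my actual code is more complicated and does need a regex) without discarding any of the content?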

Answer 1

re.split() is meant to split on a delimiter that you define. While the other answers can make your case work, I think you would be better off with re.findall():

re.findall(r'(\S+\s*)', s)

which gives you

['This ', 'is ', 'a ', 'test']

Answer 2

You can use the third-party regex module here.

import regex
s = 'This is a test'
print(regex.split('(?<= )', s, flags=regex.VERSION1))

Output:

['This ', 'is ', 'a ', 'test']

Or, with the standard re module:

import re
s = 'This is a test'
print([i for i in re.split(r'(\w+\s+)', s) if i])

Note: zero-width assertions are not supported by split in the re module (this is the case before Python 3.7).
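
For what it's worth, as far as I know re.split has accepted patterns that can match an empty string since Python 3.7, so on those versions the lookbehind from the question should work with the standard library alone. A minimal sketch with a fallback for older interpreters; the helper name split_after_space is made up for illustration:

```python
import re
import sys


def split_after_space(s):
    """Split s after every space, keeping the space on the preceding piece."""
    if sys.version_info >= (3, 7):
        # Python 3.7+ allows re.split to split on zero-width matches.
        return re.split(r'(?<= )', s)
    # Older versions: capture each run of non-spaces plus its trailing
    # spaces, then drop the empty strings re.split leaves between captures.
    return [piece for piece in re.split(r'([^ ]* +)', s) if piece]


print(split_after_space('This is a test'))
# ['This ', 'is ', 'a ', 'test']
```

Note that on 3.7+ a string ending in a space yields a trailing empty string, so filter that out if it matters for your data.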

Answer 3

Capture the delimiter, then join it back onto the preceding word:

>>> it = iter(re.split('( )', s)+[''])
>>> [word+delimiter for word, delimiter in zip(it, it)]
['This ', 'is ', 'a ', 'test']
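
If your real delimiter is more complicated than a single space, the same pairing trick can be wrapped in a small helper. This is only a sketch: split_keep_delimiters is a hypothetical name, and it assumes the pattern you pass in has no capturing groups of its own, since it gets wrapped in one here.

```python
import re


def split_keep_delimiters(pattern, s):
    """Split s on pattern, attaching each delimiter to the piece before it.

    Assumes pattern contains no capturing groups of its own.
    """
    parts = re.split('(' + pattern + ')', s)
    # parts alternates text, delimiter, text, ...; pad with an empty
    # delimiter so the final piece of text also gets a partner.
    it = iter(parts + [''])
    return [text + delimiter for text, delimiter in zip(it, it)]


print(split_keep_delimiters(r'\s+', 'This is  a test'))
# ['This ', 'is  ', 'a ', 'test']
```

Because nothing is filtered out, joining the result always reproduces the original string.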

Answer 4

Why not use re.findall?

re.findall(r"(\w+\s*)", s)
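
One caveat worth checking against your real data: findall only returns what the pattern matches, so with \w+ anything that is neither a word character nor trailing whitespace is dropped silently. A small sketch comparing it with a \S+ pattern; the sample string is made up:

```python
import re

s = 'Hello, world!'

word_pieces = re.findall(r'(\w+\s*)', s)
nonspace_pieces = re.findall(r'(\S+\s*)', s)

print(word_pieces)      # ['Hello', 'world']
print(nonspace_pieces)  # ['Hello, ', 'world!']

# Joining the pieces back together shows whether anything was lost.
print(''.join(word_pieces) == s)      # False
print(''.join(nonspace_pieces) == s)  # True
```

Even the \S+ version discards leading whitespace, so the join check is a cheap way to confirm a pattern is lossless for your input.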

Answer 5

This works at least for word characters separated by spaces:

[i for i in re.split(r'(\w+ +)', s) if i]  # ['This ', 'is ', 'a ', 'test']