Question

The following example is taken from the Python re documentation:

re.split(r'\b', 'Words, words, words.')
['', 'Words', ', ', 'words', ', ', 'words', '.']

'\b' matches the empty string at the beginning or end of a word. This means that if you run this code, it raises an error.

(Jupyter notebook, Python 3.6)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-128-f4d2d57a2022> in <module>
      1 reg = re.compile(r"\b")
----> 2 re.split(reg, "Words, word, word.")

/usr/lib/python3.6/re.py in split(pattern, string, maxsplit, flags)
    210     and the remainder of the string is returned as the final element
    211     of the list."""
--> 212     return _compile(pattern, flags).split(string, maxsplit)
    213 
    214 def findall(pattern, string, flags=0):

ValueError: split() requires a non-empty pattern match.

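Because \b only ever matches an empty string, split() never gets the "non-empty" pattern match it requires. I have seen various questions about split() and empty strings, and I can see how some people work around this in practice, for example in the question here. The answers range from "it just can't be done" to (in older ones) "this is a bug".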
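To make the point concrete, here is a minimal sketch that lists where \b matches in the documentation's example string. It relies only on re.finditer; the variable names are my own.

```python
import re

text = 'Words, words, words.'

# Collect the (start, end) span of every match of \b in the string.
spans = [m.span() for m in re.finditer(r'\b', text)]

# Every span has start == end, so each match is an empty string.
print(spans)
# [(0, 0), (5, 5), (7, 7), (12, 12), (14, 14), (19, 19)]
```

Each match is a position between characters rather than a run of characters, which is why a split() that insists on a non-empty match has nothing to work with.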

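My questions are:

Since this is still an example on the Python documentation page, is it supposed to work? Does it work in a bleeding-edge release?
The question linked above, which involves re.split(r'(?<!foo)(?=bar)', 'foobarbarbazbar'), was asked in 2015, and at that time re.split() alone could not do what was asked. Is that still the case?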

Answer 1

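In Python 3.7, re can split on zero-length matches:

Changed in version 3.7: Added support of splitting on a pattern that could match an empty string.

Also note:

Empty matches for the pattern split the string only when not adjacent to a previous empty match.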

>>> re.split(r'\b', 'Words, words, words.')
['', 'Words', ', ', 'words', ', ', 'words', '.']
>>> re.split(r'\W*', '...words...')
['', '', 'w', 'o', 'r', 'd', 's', '', '']

>>> re.split(r'(\W*)', '...words...')
['', '...', '', '', 'w', '', 'o', '', 'r', '', 'd', '', 's', '...', '', '', '']

Also, with

re.split(r'(?<!foo)(?=bar)', 'foobarbarbazbar')

I get ['foobar', 'barbaz', 'bar'] in Python 3.7.
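If you are stuck on an older interpreter, one possible workaround is to find the zero-width match positions with re.finditer and slice the string yourself. This is only a sketch: split_at_zero_width is a hypothetical helper of my own, not part of re, and it deliberately ignores any non-empty matches.

```python
import re

def split_at_zero_width(pattern, string):
    """Hypothetical helper: cut `string` at every zero-width match of `pattern`."""
    pieces = []
    last = 0
    for m in re.finditer(pattern, string):
        if m.start() != m.end():
            # Assumption: this helper is meant for zero-width patterns only,
            # so non-empty matches are skipped rather than removed.
            continue
        pieces.append(string[last:m.start()])
        last = m.start()
    # Whatever follows the final cut point is the last piece.
    pieces.append(string[last:])
    return pieces

print(split_at_zero_width(r'\b', 'Words, words, words.'))
# ['', 'Words', ', ', 'words', ', ', 'words', '.']
print(split_at_zero_width(r'(?<!foo)(?=bar)', 'foobarbarbazbar'))
# ['foobar', 'barbaz', 'bar']
```

For these two inputs it gives the same lists as re.split() on 3.7. I have not tried to reproduce the rule about adjacent empty matches or the handling of capturing groups, so treat it as a stopgap rather than a drop-in replacement.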
