Question

I have a regular expression that works perfectly in Python 2:

parts = re.split(r'\s*', re.sub(r'^\s+|\s*$', '', expression)) # split expression into 5 parts

For example, it splits an expression into 5 parts:

'a * b   =     c' will be split into ['a', '*', 'b', '=', 'c'],
'11 + 12 = 23' will be split into ['11', '+', '12', '=', '23'],
'ab   - c = d' will be split into ['ab', '-', 'c', '=', 'd'],

and so on.

In Python 3, however, the same regular expression behaves very differently:

'a * b   =     c' will be split into ['', 'a','', '*', '', 'b','', '=', '',  'c', ''],
'11 + 12 = 23' will be split into ['', '1', '1', '', '+', '', '1', '2', '', '=', '', '2', '3', ''],
'ab   - c = d' will be split into ['', 'a', 'b', '', '-', '', 'c', '', '=', '', 'd', ''],

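In general, in Python 3 every character of a part is split off into its own part, and each removed run of whitespace (including the non-existent leading and trailing whitespace) becomes an empty part ('') that is added to the list of parts.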

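It seems that Python 3's regex behavior differs considerably from Python 2's. Can anyone tell me why Python 3 changed in this way, and what the correct regex is to split an expression into 5 parts the way Python 2 does?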

Answer 1

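The ability to split on zero-length matches was added to re.split() in Python 3.7. Because \s* can match the empty string, re.split() in 3.7+ also cuts at every position between characters. If you change the split pattern to \s+ instead of \s*, the behavior is as expected in 3.7+ (and unchanged in Python < 3.7):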

import re

def parts(string):
    return re.split(r'\s+', re.sub(r'^\s+|\s*$', '', string))

Test:

>>> print(parts('a * b   =     c'))
['a', '*', 'b', '=', 'c']
>>> print(parts('ab   - c = d'))
['ab', '-', 'c', '=', 'd']
>>> print(parts('11 + 12 = 23'))
['11', '+', '12', '=', '23']
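
To see why \s* misbehaves, it can help to list where each pattern matches an empty string. The following is a minimal sketch, not part of the original answer: the helper names zero_length_matches and split_on_whitespace are hypothetical, and the second helper assumes that plain whitespace splitting is all that is needed, in which case str.split() with no arguments avoids the regex entirely.

```python
import re

def zero_length_matches(pattern, text):
    """Return the offsets at which `pattern` matches an empty string in `text`."""
    return [m.start() for m in re.finditer(pattern, text) if m.start() == m.end()]

def split_on_whitespace(expression):
    """Split on runs of whitespace (hypothetical helper).

    str.split() with no arguments also ignores leading and trailing
    whitespace, so the re.sub() trimming step is not needed.
    """
    return expression.split()

# \s* matches the empty string at every position of a string without
# whitespace; these are the places where re.split() in 3.7+ cuts.
print(zero_length_matches(r'\s*', '11+12'))       # → [0, 1, 2, 3, 4, 5]
# \s+ needs at least one whitespace character, so it never matches empty.
print(zero_length_matches(r'\s+', '11 + 12'))     # → []
print(split_on_whitespace('  a * b   =     c '))  # → ['a', '*', 'b', '=', 'c']
```

If the parts could ever be separated by something other than whitespace, the \s+ version of parts() above is probably the safer starting point.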

The regex module, a drop-in replacement for re, has a "V1" mode that makes existing patterns behave as they did before Python 3.7 (see this answer).

Python 2 vs 3 regex difference
