Question

import re
string = "some text \n\n\nError on the field: more\n text and lines\n\n\nError on the field: some more\n lines \n\n\nError on the field: final lines"
pieces = re.split(r'(Error on the field:)', string, re.IGNORECASE)
pieces
['some text \n\n\n', 'Error on the field:', ' more\n text and lines\n\n\n', 'Error on the field:', ' some more\n lines \n\n\nError on the field: final lines']
pieces2 = re.split(r'(Error on the field:)', pieces[4], re.IGNORECASE)
pieces2
[' some more\n lines \n\n\n', 'Error on the field:', ' final lines']

Why is the third occurrence of 'Error on the field:' not split out in the initial split that produces pieces, yet it is split out when pieces[4] is split on its own?

Answer 1

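The positional parameters of re.split are, in order:

the regular expression
the string
maxsplit (default: 0, meaning no limit)
flags (default: 0, meaning no flags)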

split(pattern, string, maxsplit=0, flags=0)

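You are passing re.IGNORECASE (a flag whose integer value is 2) positionally, so it lands in the maxsplit parameter, which explains the odd effect. The split works up to a point and then, as instructed, stops after 2 splits. That is why the third 'Error on the field:' stays embedded in pieces[4].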
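
A minimal sketch of what happens, using the string from the question. To avoid relying on the positional call itself, it passes maxsplit=int(re.IGNORECASE) explicitly, which is what the positional call amounts to; the variable names limited and full are just illustrative.

```python
import re

string = ("some text \n\n\nError on the field: more\n text and lines\n\n\n"
          "Error on the field: some more\n lines \n\n\n"
          "Error on the field: final lines")
pattern = r'(Error on the field:)'

# re.IGNORECASE is an integer-like flag; its numeric value is 2.
flag_value = int(re.IGNORECASE)

# Equivalent to the positional call in the question: the 2 becomes maxsplit,
# so splitting stops after two matches and no flag is applied.
limited = re.split(pattern, string, maxsplit=flag_value)

# Passing the flag by keyword leaves maxsplit at 0 (no limit).
full = re.split(pattern, string, flags=re.IGNORECASE)

print(flag_value)    # 2
print(len(limited))  # 5 pieces: the last one still contains the third marker
print(len(full))     # 7 pieces: every marker is split out
```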

Just change it to flags=re.IGNORECASE (keyword, not positional) and it works.

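With re.compile you can safely pass the flag positionally, because the signature is compile(pattern, flags=0). The same goes for re.match and re.search, but not for re.split and re.sub, where another optional parameter (maxsplit or count) comes before flags, so it is an easy trap to fall into. When in doubt, always pass optional arguments by keyword.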
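
The same trap with re.sub can be sketched as follows. The sample text and the names limited, correct, and compiled_result are made up for illustration; count=int(re.IGNORECASE) stands in for what a positional re.IGNORECASE would become in re.sub(pattern, repl, string, count=0, flags=0).

```python
import re

text = "ERROR one, error two, Error three"

# What a positional flag turns into: count=2, and the match stays case-sensitive,
# so only the lowercase "error" is replaced.
limited = re.sub(r'error', 'X', text, count=int(re.IGNORECASE))

# Keyword form: count stays 0 (replace all) and matching ignores case.
correct = re.sub(r'error', 'X', text, flags=re.IGNORECASE)

# re.compile takes flags as its second parameter, so positional is safe here.
compiled = re.compile(r'error', re.IGNORECASE)
compiled_result = compiled.sub('X', text)

print(limited)          # ERROR one, X two, Error three
print(correct)          # X one, X two, X three
print(compiled_result)  # X one, X two, X three
```

Compiling the pattern once is another way to sidestep the problem, since the compiled object's split and sub methods take no flags argument at all.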

Answer 2

When using re.split, you need to pass the flag explicitly with flags=:

import re
string = "some text \n\n\nError on the field: more\n text and lines\n\n\nError on the field: some more\n lines \n\n\nError on the field: final lines"
pieces = re.split(r'(Error on the field:)', string, flags=re.I)

print(pieces)

Output:

['some text \n\n\n', 'Error on the field:', ' more\n text and lines\n\n\n', 'Error on the field:', ' some more\n lines \n\n\n', 'Error on the field:', ' final lines']

N.B. re.I is the same as re.IGNORECASE.
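
A quick check of that note, as a small sketch (the name same_flag is illustrative):

```python
import re

# re.I is just a short alias for re.IGNORECASE.
same_flag = re.I is re.IGNORECASE
print(same_flag)  # True
```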

python re.split does not work on all fields
