Question

I am building a regular expression like this:

text_regex = re.compile(r"""(
^\n{4} #start with 4 blank lines
[^(?:\n\n\n\n)]+ #negate 4 blank lines to retrieve the content in the middle
\n{4}$ #end with 4 blank lines
)""", re.VERBOSE)

I have tried it many times on properly formatted text in which I deliberately separated each item with 4 blank lines:

mo = text_regex.findall(text)
In [131]: len(mo)
Out[131]: 0

How should I handle [^(?:\n\n\n\n)]+ so that it retrieves everything between the runs of 4 "\n"?

Answer 1

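Character classes are sets, so they cannot express repetition! Inside [...] every character is a literal member of the set, so [^(?:\n\n\n\n)] simply means "any single character other than (, ?, :, a newline, or )".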
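
To make the point about sets concrete, here is a small sketch comparing the character class from the question with an equivalent one that lists each member once. The sample string and variable names are made up for illustration.

```python
import re

# Inside [...] every symbol is a literal set member: "(?:" is not a group
# and repeating "\n" four times adds nothing.
class_from_question = re.compile(r"[^(?:\n\n\n\n)]")
equivalent_class = re.compile(r"[^()?:\n]")

sample = "a(b?c:d)\ne"  # hypothetical test string

# Collect the characters each negated class refuses to match.
rejected_by_question = [ch for ch in sample if not class_from_question.fullmatch(ch)]
rejected_by_equivalent = [ch for ch in sample if not equivalent_class.fullmatch(ch)]

print(rejected_by_question)
print(rejected_by_equivalent)
```

Both classes reject exactly the same five characters, which is why the original pattern stops at the first parenthesis, question mark, colon, or newline it meets.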

If you want to retrieve the parts of the text that are separated by 4 newlines, you can simply use this regex:

\n{4}(.*?)\n{4}

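and specify re.DOTALL so that . also matches newlines. The non-greedy quantifier makes sure you do not "swallow" a run of 4 newlines inside the "middle block".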
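
A minimal sketch of the non-greedy approach, using a made-up sample text. The lookahead variant at the end is not part of the answer; it is an assumption added here because the plain pattern consumes the closing delimiter, so it cannot also serve as the opening delimiter of the next item.

```python
import re

# Hypothetical sample: two items, each surrounded by 4 newlines.
text = "\n\n\n\nfirst item\nstill first\n\n\n\nsecond item\n\n\n\n"

# Pattern from the answer: lazy match between two runs of 4 newlines.
block_regex = re.compile(r"\n{4}(.*?)\n{4}", re.DOTALL)
first = block_regex.search(text)
print(repr(first.group(1)))

# Assumed variant: a lookahead leaves the closing delimiter unconsumed,
# so findall can reuse it as the start of the next item.
lookahead_regex = re.compile(r"\n{4}(.*?)(?=\n{4})", re.DOTALL)
print(lookahead_regex.findall(text))
```

With this sample, findall on the plain pattern returns only the first item, while the lookahead variant returns both.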

But that may be inefficient. Another way to do it is with a regex like this:

\n{4}(.*\n{,3}[^\n])*\n{4}

without using re.DOTALL.
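
One caveat when trying this pattern: a repeated capturing group keeps only its last repetition. The sketch below, on a made-up sample text, shows that behaviour and an assumed adaptation (a non-capturing inner group wrapped in one outer group) that captures the whole block instead.

```python
import re

# Hypothetical sample: one two-line item between runs of 4 newlines.
text = "\n\n\n\nfirst item\nstill first\n\n\n\nsecond item\n\n\n\n"

# Pattern as written in the answer: group 1 is overwritten on every repetition.
as_written = re.compile(r"\n{4}(.*\n{,3}[^\n])*\n{4}")
print(repr(as_written.search(text).group(1)))

# Assumed adaptation: repeat a non-capturing group inside a single capture.
whole_block = re.compile(r"\n{4}((?:.*\n{,3}[^\n])*)\n{4}")
print(repr(whole_block.search(text).group(1)))
```

Each repetition consumes the rest of a line, up to 3 newlines, and then one non-newline character, so the pattern can never step over a run of 4 newlines.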

In any case, I would probably do it this way:

re.split(r'\n{4,}', text)

and optionally drop the empty last element at the end of the result.
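
A short sketch of the split approach, including the clean-up of empty elements. The sample text is invented for illustration; filtering every empty string (rather than only the last one) is a choice made here so that leading delimiters are handled as well.

```python
import re

# Hypothetical sample: three items separated by 4 or more newlines,
# with a trailing delimiter after the last item.
text = "item one\nline two\n\n\n\nitem two\n\n\n\n\nitem three\n\n\n\n"

# Split on any run of at least 4 newlines.
parts = re.split(r"\n{4,}", text)

# The trailing delimiter leaves an empty string at the end; drop empties.
items = [part for part in parts if part]

print(parts)
print(items)
```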
