Question

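re.findall with re.M does not find the multiple lines I am searching for

I am trying to extract all multi-line strings that match a pattern from a file.

Example from the file book.txt: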

Title: Le Morte D'Arthur, Volume I (of II)
       King Arthur and of his Noble Knights of the Round Table

Author: Thomas Malory

Editor: William Caxton

Release Date: March, 1998  [Etext #1251]
Posting Date: November 6, 2009

Language: English

Title: Pride and Prejudice

Author: Jane Austen

Posting Date: August 26, 2008 [EBook #1342]
Release Date: June, 1998
Last Updated: October 17, 2016

Language: English

The following code returns only the first line, Le Morte D'Arthur, Volume I (of II):

re.findall('^Title:\s(.+)$', book, re.M)

I expect the output to be

[' Le Morte D'Arthur, Volume I (of II)\n King Arthur and of his Noble Knights of the Round Table', ' Pride and Prejudice']

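To clarify:
- The second line is optional; it is present in only some files. After the second line there is more text that I do not want to read.
- Using re.findall(r'Title: (.+\n.+)$', text, flags=re.MULTILINE) works, but fails if the second line is blank.
- I am running Python 3.7.
- I convert the txt file to a string and then run re on the str.
- The following do not work either: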
re.findall(r'^Title:\s(.+)$', text, re.S)
re.findall(r'^Title:\s(.+)$', text, re.DOTALL)

Answer 1

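My guess is that an expression such as

(?<=Title:\s)(.*?)\s*(?=Author)

might be close to what you want to design. It captures everything after "Title: " up to the whitespace that precedes the next "Author", so it relies on an Author line following each title.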

Test

import re

# Lookbehind for "Title: ", lazy capture, stop at the whitespace before "Author"
regex = r"(?<=Title:\s)(.*?)\s*(?=Author)"

# Each title must be followed by an "Author" line for the lookahead to succeed
test_str = ("Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n\n"
    "Author: Thomas Malory\n\n"
    "Title: Pride and Prejudice\n\n"
    "Author: Jane Austen\n")

# re.DOTALL lets "." cross line breaks so the second title line is included
print(re.findall(regex, test_str, re.DOTALL))

Output

["Le Morte D'Arthur, Volume I (of II)\n       King Arthur and of his Noble Knights of the Round Table", 'Pride and Prejudice']
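
If the text after the title is not always an Author line, one alternative is to key on the indentation of the continuation line instead. This is only a minimal sketch, assuming (as in the sample from the question) that continuation lines start with spaces or tabs while the next field starts in column one. The names book, pattern, titles and flat are illustrative and do not come from the original answer.

```python
import re

# Abbreviated sample from the question; continuation lines are assumed to be indented.
book = (
    "Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n"
    "\n"
    "Author: Thomas Malory\n"
    "\n"
    "Title: Pride and Prejudice\n"
    "\n"
    "Author: Jane Austen\n"
)

# First line after "Title:", then zero or more following lines that begin
# with spaces/tabs and contain at least one non-blank character.
pattern = r"^Title:[ \t]*(.+(?:\n[ \t]+\S.*)*)"

# re.MULTILINE makes ^ match at the start of every line; "." still stops at "\n".
titles = re.findall(pattern, book, flags=re.MULTILINE)
print(titles)

# Optional: collapse the line break and indentation into single spaces.
flat = [" ".join(t.split()) for t in titles]
print(flat)
```

A blank second line ends the match, so a title without a continuation line yields just its first line.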

Answer 2

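You can use the regex with the DOTALL flag to allow . to match newline characters:

re.findall(r'^Title:\s(.+)$', book, re.DOTALL)

Output:

Le Morte D'Arthur, Volume I (of II)\n       King Arthur and of his Noble Knights of the Round Table

Note that without re.M, ^ matches only at the very start of the string and the greedy .+ runs to the end of it. The result above therefore holds only when book contains nothing but the title block; on a longer text starting with "Title:", the single match includes everything that follows.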
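
To see how each flag changes what ^, $ and . match, here is a small sketch. The string book is an abbreviated, assumed stand-in for the file content from the question, and the variable names are illustrative rather than part of the original answer.

```python
import re

# Abbreviated stand-in for the file content in the question.
book = (
    "Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n"
    "\n"
    "Author: Thomas Malory\n"
)

pattern = r"^Title:\s(.+)$"

# re.M: ^ and $ match at every line boundary, but "." still stops at "\n",
# so only the first line of the title is captured.
multiline_only = re.findall(pattern, book, re.M)

# re.S (alias of re.DOTALL): "." also matches "\n", but ^ and $ anchor only
# the whole string, so the greedy .+ runs to the end of the text.
dotall_only = re.findall(pattern, book, re.S)

print(multiline_only)
print(dotall_only)
```

With re.M alone the capture stops at the first line break, and with re.S alone it swallows the Author line as well, which is why neither flag on its own gives the two-line title.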
