re.findall multiline in Python

Time: 2019-07-18 15:01:01

Tags: python regex

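re.findall with re.M does not find the multiline text I am searching for

I am trying to extract every multiline string that matches a pattern from a file.

Example from the file book.txt: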

Title: Le Morte D'Arthur, Volume I (of II)
       King Arthur and of his Noble Knights of the Round Table

Author: Thomas Malory

Editor: William Caxton

Release Date: March, 1998  [Etext #1251]
Posting Date: November 6, 2009

Language: English

Title: Pride and Prejudice

Author: Jane Austen

Posting Date: August 26, 2008 [EBook #1342]
Release Date: June, 1998
Last Updated: October 17, 2016

Language: English

The following code returns only the first line of the title, Le Morte D'Arthur, Volume I (of II):

re.findall(r'^Title:\s(.+)$', book, re.M)

I would like the output to be

[" Le Morte D'Arthur, Volume I (of II)\n King Arthur and of his Noble Knights of the Round Table", ' Pride and Prejudice']

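To clarify:
- The second line is optional; it is present only in some files. After the second line there is more text that I do not want to read.
- Using re.findall(r'Title: (.+\n.+)$', text, flags=re.MULTILINE) works, but fails when the second line is blank.
- I am running Python 3.7.
- I read the txt file into a string and then run re on that str.
- The following do not work either: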
re.findall(r'^Title:\s(.+)$', text, re.S)
re.findall(r'^Title:\s(.+)$', text, re.DOTALL)
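
For reference, a minimal reproduction of the behaviour described above. The sample string is a shortened stand-in for book.txt (an assumption for illustration, not the real file), and the variable names are made up for this sketch.

```python
import re

# Shortened stand-in for the contents of book.txt (illustrative assumption).
sample = (
    "Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n"
    "\n"
    "Author: Thomas Malory\n"
    "\n"
    "Title: Pride and Prejudice\n"
    "\n"
    "Author: Jane Austen\n"
)

pattern = r"^Title:\s(.+)$"

# re.M: ^ and $ match at every line boundary, but . still stops at "\n",
# so each match covers a single line only.
multiline_only = re.findall(pattern, sample, re.M)

# re.S: . also matches "\n", but ^ now matches only at the very start of
# the string, so the greedy .+ swallows everything up to the end.
dotall_only = re.findall(pattern, sample, re.S)

print(multiline_only)
print(len(dotall_only))
```

So re.M alone never crosses a newline, while re.S alone produces one oversized match instead of one match per title.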

2 answers:

Answer 0 (score: 1)

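My guess is that this expression, used with re.DOTALL,

(?<=Title:\s)(.*?)\s*(?=Author)

might be close to what you want. It assumes that every title block is followed by an Author line.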

Test

import re

regex = r"(?<=Title:\s)(.*?)\s*(?=Author)"

# Each title block is followed by an Author line, which the lookahead requires.
test_str = ("Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n\n"
    "Author: Thomas Malory\n\n"
    "Title: Pride and Prejudice\n\n"
    "Author: Jane Austen\n")

# re.DOTALL lets . cross the newline inside the two-line title.
print(re.findall(regex, test_str, re.DOTALL))

Output

["Le Morte D'Arthur, Volume I (of II)\n       King Arthur and of his Noble Knights of the Round Table", 'Pride and Prejudice']
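
If an Author line cannot be relied on, one possible variant is to treat indented lines as continuations of the title. This is only a sketch: it assumes continuation lines always start with spaces or tabs, as in the sample from the question, and the names continued_title, raw_titles and titles are made up for illustration.

```python
import re

# Shortened stand-in for book.txt (illustrative assumption).
text = (
    "Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n"
    "\n"
    "Author: Thomas Malory\n"
    "\n"
    "Title: Pride and Prejudice\n"
    "\n"
    "Author: Jane Austen\n"
)

# First line of the title, then zero or more lines that begin with
# spaces/tabs and contain at least one non-blank character.
continued_title = re.compile(r"^Title:\s(.+(?:\n[ \t]+\S.*)*)", re.M)

raw_titles = continued_title.findall(text)

# Optional clean-up: collapse the newline and the indentation into one space.
titles = [" ".join(t.split()) for t in raw_titles]

print(raw_titles)
print(titles)
```

Because . does not match a newline here, a blank line or any unindented line ends the match, so the optional second line is picked up only when it exists.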

Answer 1 (score: 1)

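You can use the regular expression with the DOTALL flag so that . also matches newline characters:

re.findall(r'^Title:\s(.+)$', book, re.DOTALL)

Output, when book holds only the two title lines:

["Le Morte D'Arthur, Volume I (of II)\n       King Arthur and of his Noble Knights of the Round Table"]

Note that without re.M, ^ matches only at the start of the string and the greedy .+ runs to its end. On the full book.txt this therefore returns a single match containing everything after the first Title:, or nothing at all if the string does not start with Title:.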
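
To get one match per title from the whole file, DOTALL can be combined with re.M and a lazy quantifier. The following is a sketch under one assumption: each title block ends at the first blank line (or at the end of the string), which holds for the sample in the question but may not hold for every file. The names title_block and found are made up for illustration.

```python
import re

# Shortened stand-in for book.txt (illustrative assumption).
book = (
    "Title: Le Morte D'Arthur, Volume I (of II)\n"
    "       King Arthur and of his Noble Knights of the Round Table\n"
    "\n"
    "Author: Thomas Malory\n"
    "\n"
    "Title: Pride and Prejudice\n"
    "\n"
    "Author: Jane Austen\n"
)

# re.M makes ^ match at the start of every line; re.S lets . cross newlines.
# The lazy .+? stops at the first blank line (or at the end of the string).
title_block = re.compile(r"^Title:\s(.+?)(?=\n[ \t]*\n|\Z)", re.M | re.S)

found = title_block.findall(book)
print(found)
```

The lookahead does not consume the blank line, so the captured text ends right after the last character of the title.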