Question

I have the following text:

Title: The Divine Comedy, Complete
        The Vision of Paradise, Purgatory and Hell

I'm using this regex to capture the title together with the line that follows it:

(?<=Title:)[.|\n|\W|\w]*

It works fine in online regex builders such as https://pythex.org/

However, I'm creating a regex object like this:

title_search = re.compile(r'(?<=Title:)[.|\n|\W|\w]*', re.IGNORECASE)

When I run it, I get:

File "./script1_c.py", line 33, in <module>
title = re.search(title_search, doc).group('title')
IndexError: no such group

What am I doing wrong? Should I change IGNORECASE to MULTILINE? TIA

Answer 1

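Your pattern has no groups at all, named or otherwise, so the only group you can retrieve is group 0, i.e. the entire match. Asking for group('title') therefore raises IndexError: no such group.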
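A small sketch that reproduces the error makes this concrete. The `doc` string below is an assumption modeled on the text from the question; `Pattern.groups` reports how many capturing groups a compiled pattern contains.

```python
import re

# Assumed sample input, modeled on the text shown in the question.
doc = (
    "Title: The Divine Comedy, Complete\n"
    "        The Vision of Paradise, Purgatory and Hell\n"
)

# The asker's pattern: a lookbehind plus a character class, but no capturing group.
title_search = re.compile(r'(?<=Title:)[.|\n|\W|\w]*', re.IGNORECASE)
match = title_search.search(doc)

print(title_search.groups)   # number of capturing groups in the pattern
print(repr(match.group(0)))  # group 0 is always the whole match

# Requesting a group name that the pattern never defined fails.
try:
    match.group('title')
except IndexError as exc:
    print("IndexError:", exc)
```

The match itself succeeds; only the lookup of the nonexistent `'title'` group fails.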

Fix this with a numbered group:

title_search = re.compile(r'(?<=Title:)([.\n\W\w]*)', re.IGNORECASE) 
title = re.search(title_search, data).group(1)
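A self-contained version of the numbered-group fix might look like the sketch below. The `data` string is an assumed stand-in for the asker's input. It also checks the result of `search()`, which returns `None` when nothing matches (calling `.group()` on `None` would raise `AttributeError`). Because the lookbehind is zero-width, group 1 covers exactly the same text as group 0 here.

```python
import re

# Assumed stand-in for the asker's input text.
data = (
    "Title: The Divine Comedy, Complete\n"
    "        The Vision of Paradise, Purgatory and Hell\n"
)

# The parentheses around the character class create capturing group 1.
title_search = re.compile(r'(?<=Title:)([.\n\W\w]*)', re.IGNORECASE)
match = title_search.search(data)

if match:  # search() returns None when there is no match
    title = match.group(1)
    # The lookbehind consumes nothing, so group 1 equals the whole match.
    print(match.group(1) == match.group(0))
    print(repr(title.strip()))
```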

Or a named group:

title_search = re.compile(r'(?<=Title:)(?P<title>[.\n\W\w]*)', re.IGNORECASE) 
title = re.search(title_search, data).group('title')
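With a named group, the captured text can be read by name, by number, or all at once through `groupdict()`, and the compiled pattern's `groupindex` maps each name to its number. A minimal sketch, again assuming a sample `data` string:

```python
import re

# Assumed stand-in for the asker's input text.
data = (
    "Title: The Divine Comedy, Complete\n"
    "        The Vision of Paradise, Purgatory and Hell\n"
)

title_search = re.compile(r'(?<=Title:)(?P<title>[.\n\W\w]*)', re.IGNORECASE)
match = title_search.search(data)

if match:
    # A named group is still numbered, so both lookups return the same text.
    print(match.group('title') == match.group(1))
    # All named groups at once, as a dict of name -> captured text.
    print(match.groupdict())
    # Mapping of group names to group numbers on the compiled pattern.
    print(title_search.groupindex)
```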

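Note that you don't need the | symbols inside the character set. The pipe means alternation between two patterns only outside a character set; inside one, it simply matches a literal | character.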
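The difference is easy to check with `re.findall`. The short sketch below contrasts a class that contains a pipe with one that does not, and with real alternation outside a class. The same rule applies to `.`: inside a class it is a literal dot, which is harmless in the asker's pattern only because `[\W\w]` already matches every character.

```python
import re

# Inside [...] the pipe is an ordinary character, so this class also matches "|".
with_pipes = re.compile(r'[a|b]')
without_pipes = re.compile(r'[ab]')
# Outside a class, the pipe is alternation between two sub-patterns.
alternation = re.compile(r'a|b')

print(with_pipes.findall("a|b"))     # ['a', '|', 'b']
print(without_pipes.findall("a|b"))  # ['a', 'b']
print(alternation.findall("a|b"))    # ['a', 'b']

# Likewise, '.' inside a class matches only a literal dot.
print(re.findall(r'[.]', "a.b"))     # ['.']
```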

Finally, you can simplify the pattern with re.DOTALL:

title_search = re.compile(r'(?<=Title:)(?P<title>.*)', re.IGNORECASE | re.DOTALL)
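This also answers the MULTILINE question: `re.MULTILINE` only changes how `^` and `$` behave, so `.` still stops at the first newline. `re.DOTALL` is the flag that lets `.` match newlines. A sketch comparing the three variants, assuming the same sample `data` string:

```python
import re

# Assumed stand-in for the asker's input text.
data = (
    "Title: The Divine Comedy, Complete\n"
    "        The Vision of Paradise, Purgatory and Hell\n"
)

pattern = r'(?<=Title:)(?P<title>.*)'

# Default: '.' does not match a newline, so the capture ends at the first line break.
plain = re.compile(pattern, re.IGNORECASE)
# MULTILINE changes only ^ and $; '.' still stops at a newline.
multiline = re.compile(pattern, re.IGNORECASE | re.MULTILINE)
# DOTALL lets '.' match newlines, so the capture runs to the end of the string.
dotall = re.compile(pattern, re.IGNORECASE | re.DOTALL)

for name, compiled in [("plain", plain), ("multiline", multiline), ("dotall", dotall)]:
    print(name, repr(compiled.search(data).group('title')))
```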