I'm having trouble converting a Perl regular expression to Python. The text I want to match has the following pattern:
Author(s) : Firstname Lastname
            Firstname Lastname
            Firstname Lastname
            Firstname Lastname
In Perl I was able to match this and extract the authors with
/Author\(s\) :((.+\n)+?)/
When I try
re.compile(r'Author\(s\) :((.+\n)+?)')
in Python, it matches the first author twice and ignores the rest.
Can anyone explain what I'm doing wrong here?
Answer 0 (score: 3)
You could do something like this:
# find lines with authors
import re
# multiline string to simulate possible input
text = '''
Stuff before
This won't be matched...
Author(s) : Firstname Lastname
            Firstname Lastname
            Firstname Lastname
            Firstname Lastname
Other(s) : Something else we won't match
More shenanigans....
Only the author names will be matched.
'''
# run the regex to pull author lines from the sample input
authors = re.search(r'Author\(s\)\s*:\s*(.*?)^[^\s]', text, re.DOTALL | re.MULTILINE).group(1)
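The regex above matches the leading text (Author(s), whitespace, colon, whitespace), then lazily captures everything up to the first line that starts with a non-whitespace character. In other words, it captures all of the following lines that begin with whitespace, which gives you this result: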
'''Firstname Lastname
            Firstname Lastname
            Firstname Lastname
            Firstname Lastname
'''
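To make the stage-one pattern easier to follow, here is a sketch of the same regex written with re.VERBOSE so each piece can carry a comment. The short sample text is hypothetical and assumes, as above, that the names after the first are on indented lines; the captured block should match what the one-line version returns.

```python
import re

# Same pattern as the one-line version, split up with re.VERBOSE.
# VERBOSE ignores literal whitespace in the pattern, so whitespace is written as \s.
stage1 = re.compile(r'''
    Author\(s\)   # the literal label "Author(s)"
    \s*:\s*       # optional whitespace, a colon, optional whitespace
    (.*?)         # lazily capture anything (DOTALL lets . cross newlines)...
    ^[^\s]        # ...until a line (MULTILINE ^) starts with a non-whitespace char
''', re.DOTALL | re.MULTILINE | re.VERBOSE)

# Hypothetical sample: two author lines, then an unindented line that ends the block
text = (
    "Author(s) : Firstname Lastname\n"
    "            Firstname Lastname\n"
    "Other(s) : Something else\n"
)

captured = stage1.search(text).group(1)
print([line.strip() for line in captured.splitlines()])  # → ['Firstname Lastname', 'Firstname Lastname']
```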
You can then use the following regex to pull each individual author out of those results:
# grab authors from the lines
import re
authors = '''Firstname Lastname
            Firstname Lastname
            Firstname Lastname
            Firstname Lastname
'''
# run the regex to pull a list of individual authors from the author lines
authors = re.findall(r'^\s*(.+?)\s*$', authors, re.MULTILINE)
which gives you the list of authors:
['Firstname Lastname', 'Firstname Lastname', 'Firstname Lastname', 'Firstname Lastname']
Putting it all together:
text = '''
Stuff before
This won't be matched...
Author(s) : Firstname Lastname
            Firstname Lastname
            Firstname Lastname
            Firstname Lastname
Other(s) : Something else we won't match
More shenanigans....
Only the author names will be matched.
'''
import re
stage1 = re.compile(r'Author\(s\)\s*:\s*(.*?)^[^\s]', re.DOTALL | re.MULTILINE)
stage2 = re.compile(r'^\s*(.+?)\s*$', re.MULTILINE)
preliminary = stage1.search(text).group(1)
authors = stage2.findall(preliminary)
which sets authors to:
['Firstname Lastname', 'Firstname Lastname', 'Firstname Lastname', 'Firstname Lastname']
Success!
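One caveat with stage1 as written: ^[^\s] needs some unindented line after the author block, so if the Author(s) block is the last thing in the text, search() returns None and .group(1) raises AttributeError. Below is a minimal sketch of a wrapper that handles both that case and a missing Author(s) label. The helper name extract_authors is hypothetical, and the lookahead (?=^\S|\Z) is an adjustment that replaces the original ^[^\s] so the block may also end at the end of the string; it is not part of the answer above.

```python
import re

# Stage 1: capture the author block. The lookahead ends the block at the next
# unindented line OR at the end of the string (\Z), without consuming anything.
STAGE1 = re.compile(r'Author\(s\)\s*:\s*(.*?)(?=^\S|\Z)', re.DOTALL | re.MULTILINE)
# Stage 2: one name per line, with surrounding whitespace stripped.
STAGE2 = re.compile(r'^\s*(.+?)\s*$', re.MULTILINE)


def extract_authors(text):
    """Hypothetical helper: return the author names, or [] if there is no Author(s) block."""
    m = STAGE1.search(text)
    if m is None:
        return []
    return STAGE2.findall(m.group(1))


# The author block is the last thing in this sample, so the original stage1 finds no match.
sample = (
    "Stuff before\n"
    "Author(s) : Firstname Lastname\n"
    "            Firstname Lastname\n"
)
print(extract_authors(sample))  # → ['Firstname Lastname', 'Firstname Lastname']
```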
Answer 1 (score: 2)
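A group can only hold one match. So even though your group is repeated, you only get access to its last actual match. You have to match all the names at once and then split them up (on newlines, or even with a second regex).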
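A small sketch of that behavior, using hypothetical numbered placeholder names so it is visible which line each group ends up holding. The inner group is overwritten on every repetition, while the outer group spans the whole block, so the fix is to take the outer group and split it:

```python
import re

# Hypothetical sample with distinguishable names
text = (
    "Author(s) : Firstname1 Lastname1\n"
    "            Firstname2 Lastname2\n"
    "            Firstname3 Lastname3\n"
)

m = re.search(r'Author\(s\) :((.+\n)+)', text)

# Group 2 is overwritten on each repetition, so only the last line survives
print(m.group(2).strip())  # → Firstname3 Lastname3

# Group 1 wraps the whole repetition: capture the block once, then split it
authors = [line.strip() for line in m.group(1).splitlines()]
print(authors)  # → ['Firstname1 Lastname1', 'Firstname2 Lastname2', 'Firstname3 Lastname3']
```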
Answer 2 (score: 1)
Try
re.compile(r'Author\(s\) :((.+\n)+)')
In your original expression, the +? means you want the match to be non-greedy, i.e. as short as possible.
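A quick side-by-side sketch of the two quantifiers on a hypothetical three-author sample. Because nothing follows the lazy (.+\n)+? in the pattern, it is satisfied after a single repetition, so the outer and inner groups both hold the first line, which is the "first author twice" from the question:

```python
import re

# Hypothetical sample: three author lines, continuation lines indented
text = (
    "Author(s) : Firstname Lastname\n"
    "            Firstname Lastname\n"
    "            Firstname Lastname\n"
)

lazy = re.search(r'Author\(s\) :((.+\n)+?)', text)
greedy = re.search(r'Author\(s\) :((.+\n)+)', text)

# +? stops after one repetition: both groups contain the same first line
print(lazy.groups())  # → (' Firstname Lastname\n', ' Firstname Lastname\n')

# + repeats as long as it can: the outer group spans all three lines
print(greedy.group(1).count('\n'))  # → 3
```

Keep in mind that (.+\n)+ continues through every following non-empty line that ends in a newline, so on input like the sample text in the first answer it would also swallow the Other(s) line and everything after it. That is why an explicit stop condition (or splitting afterwards) is still needed.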