Using re.findall, I want to extract the values assigned to each PCR.
>>> z
'PCR-09: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-11: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-12: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-13: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-14: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-15: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-16: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\n'
>>> print z
PCR-09: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-11: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-12: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-13: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-14: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-15: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
PCR-16: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Initially I tried this. Can someone point out what is wrong with my use of the regular expression?
>>> re.search('PCR-09:(.*?)', z).groups()
('',)
Shouldn't the non-greedy expression (.*?) match all characters until it finds a newline?
With a slightly modified regex, I get the desired result:
>>> re.search('PCR-09:(.*?)\s\r\n', z).groups()
(' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00',)
Along the same lines, this does not work:
>>> re.findall(r'(PCR-\d+):(.*?)', z)
[('PCR-09', ''), ('PCR-10', ''), ('PCR-11', ''), ('PCR-12', ''), ('PCR-13', ''), ('PCR-14', ''), ('PCR-15', ''), ('PCR-16', '')]
But this does:
>>> re.findall(r'(PCR-\d+):(.*?)\s\r\n', z,re.DOTALL)
[('PCR-09', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-10', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-11', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-12', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-13', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-14', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-15', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-16', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00')]
I hope someone can explain what is wrong with my approach.
Thanks
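Answer 0 (score: 3)
The reason r'PCR-09:(.*?)' does not do what you expect is that a non-greedy quantifier stops as soon as the overall pattern can succeed. Since (.*?) can match the empty string '' and nothing follows it in the pattern, the match ends immediately.
By contrast, r'(PCR-\d+):(.*?)\s\r\n' is also non-greedy, but because it must go on to find \s\r\n, the group is forced to expand until that suffix matches.
I suggest using a greedy pattern that contains only the characters you expect to find: r'(PCR-\d+):([0-9 ]*)'.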
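To make the difference concrete, here is a minimal sketch that runs the three patterns side by side. The input `z` is a shortened, two-line stand-in for the string in the question (an assumption made for brevity), and the variable names are illustrative only.

```python
import re

# Shortened stand-in for the question's input (assumption: same layout, fewer bytes).
z = "PCR-09: 00 00 00 \r\nPCR-10: 00 00 00 \r\n"

# Lazy group with nothing after it: the empty match already satisfies the pattern.
lazy = re.findall(r'(PCR-\d+):(.*?)', z)

# Lazy group followed by a required suffix: the group grows until the suffix matches.
anchored = re.findall(r'(PCR-\d+):(.*?)\s\r\n', z)

# Greedy character class: consumes only digits and spaces, so it stops at '\r'.
greedy = re.findall(r'(PCR-\d+):([0-9 ]*)', z)

print(lazy)      # [('PCR-09', ''), ('PCR-10', '')]
print(anchored)  # [('PCR-09', ' 00 00 00'), ('PCR-10', ' 00 00 00')]
print(greedy)    # [('PCR-09', ' 00 00 00 '), ('PCR-10', ' 00 00 00 ')]
```

Note that the character-class version keeps the trailing space, and that `[0-9 ]` would stop early if the values ever contained hex digits such as `a` to `f`; widening the class to `[0-9A-Fa-f ]` would be one way to allow for that.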
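Answer 1 (score: 2)
The pattern PCR-09:(.*?) tells Python to match zero or more characters non-greedily after PCR-09:. So it does exactly that and matches zero characters.
You need to make your regex greedy to match everything up to the newline: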
>>> re.search('PCR-09:(.*)', z).groups()
(' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r',)
>>>
Note that your PCR-09:(.*?)\s\r\n pattern works because it tells Python to capture zero or more characters after PCR-09: and before \s\r\n. In other words, get everything between them.
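The greedy (.*) result still ends in '\r' because, without re.DOTALL, `.` matches every character except '\n', and '\r' is not excluded. A small sketch of one way to keep it out of the capture, again using a shortened stand-in for `z` (an assumption) and illustrative variable names:

```python
import re

# Shortened stand-in for the question's input (assumption: same layout, fewer bytes).
z = "PCR-09: 00 00 00 \r\nPCR-10: 00 00 00 \r\n"

# Greedy '.' stops only at '\n', so the '\r' is captured too.
raw = re.search(r'PCR-09:(.*)', z).group(1)

# Excluding both line-ending characters keeps '\r' out of the group.
no_cr = re.search(r'PCR-09:([^\r\n]*)', z).group(1)

# strip() then removes the leading and trailing spaces.
value = no_cr.strip()

print(repr(raw))    # ' 00 00 00 \r'
print(repr(no_cr))  # ' 00 00 00 '
print(repr(value))  # '00 00 00'
```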
Answer 2 (score: 0)
Try using split:
[ x.split(':') for x in z.split('\r\n')]
Output:
[['PCR-09', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-10', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-11', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-12', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-13', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-14', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-15', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-16', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['']]
Using a regex:
re.findall('(PCR-\d+)(.*)',z)
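Both one-liners leave some cleanup to do: the split version ends with an empty [''] entry and keeps the surrounding spaces, while the regex version captures the colon and the trailing '\r' in its second group. The following is a hedged sketch of tidier variants, not the answerer's own code; `z` is a shortened stand-in for the question's input and the names `by_split` and `by_regex` are made up for illustration.

```python
import re

# Shortened stand-in for the question's input (assumption: same layout, fewer bytes).
z = "PCR-09: 00 00 00 \r\nPCR-10: 00 00 00 \r\n"

# splitlines() handles '\r\n' and yields no trailing empty string;
# split(':', 1) splits on the first colon only.
by_split = [tuple(part.strip() for part in line.split(':', 1))
            for line in z.splitlines() if line]

# Matching the colon outside the groups keeps it out of the captured value,
# and [^\r\n]* stops before the line ending.
by_regex = [(label, value.strip())
            for label, value in re.findall(r'(PCR-\d+):([^\r\n]*)', z)]

print(by_split)  # [('PCR-09', '00 00 00'), ('PCR-10', '00 00 00')]
print(by_regex)  # [('PCR-09', '00 00 00'), ('PCR-10', '00 00 00')]
```

Calling `value.split()` on either result would give the individual bytes as a list, if that form is more convenient.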