Question

I'm trying to wrap ping in Python (3.7.4) using subprocess and re.

The stdout returned by subprocess is a bytes object, so I had to switch the regex to a bytes pattern to match it.

    import subprocess,re

    out = subprocess.run(['ping', '-c', '1', '8.8.8.8'], capture_output=True)
    print(out.stdout)
    match = re.match(br'P(..)G', out.stdout, re.DOTALL | re.MULTILINE)
    if match:
        print(match.groups())

    match = re.match(br'trans(.)', out.stdout, re.DOTALL | re.MULTILINE)
    if match:
        print(match.groups())

Actual output of the ping command:

b'PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.\n64 bytes from 8.8.8.8: icmp_seq=1 ttl=53 time=60.7 ms\n\n--- 8.8.8.8 ping statistics ---\n1 packets transmitted, 1 received, 0% packet loss, time 0ms\nrtt min/avg/max/mdev = 60.665/60.665/60.665/0.000 ms\n'

First output of match.groups():

(b'IN',)

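The second one prints nothing (it should be (b'm',)). In fact, nothing after the first \n can be matched.

Note that I have re.MULTILINE set, and converting the output to str with str() or .decode() makes no difference.

I checked several different online tools and the pattern works in all of them. Any ideas?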

Answer 1

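With match, matching starts at the first position of the string. Your variable does not start with trans, which is why there is no match. You could use .*?trans(.) to indicate that trans sits in the middle of the text, but I think you should use search instead:

    match = re.search(br'trans(.)', out.stdout)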
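
To see the difference on the output quoted in the question, here is a minimal sketch that uses an abbreviated copy of that bytes output as a literal, so it runs without calling ping. The variable names `sample`, `anchored`, and `found` are only for illustration.

```python
import re

# Abbreviated copy of the ping output from the question (bytes, as returned by subprocess).
sample = (
    b'PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.\n'
    b'64 bytes from 8.8.8.8: icmp_seq=1 ttl=53 time=60.7 ms\n\n'
    b'--- 8.8.8.8 ping statistics ---\n'
    b'1 packets transmitted, 1 received, 0% packet loss, time 0ms\n'
)

# re.match only tries at position 0, so a pattern that starts with "trans" fails.
anchored = re.match(br'trans(.)', sample)
print(anchored)  # None

# re.search scans the whole input and finds "transm" on the statistics line.
found = re.search(br'trans(.)', sample)
print(found.groups())  # (b'm',)
```

Neither call needs re.DOTALL or re.MULTILINE here, because the pattern contains no ., ^, or $ that has to cross or anchor on a newline.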

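Notes:

re.DOTALL is only needed when you want . to match \n as well. With this flag, . matches any character, including \n.
re.MULTILINE: by default ^ matches the start of the text and $ matches the end of the text, but when you compile the regex with this flag, ^ matches at the start of each line and $ matches at the end of each line (just before \n).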
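
A small sketch of what re.MULTILINE changes, and what it does not change, may help here. The sample string `text` is made up for illustration. The last call shows that re.match stays tied to position 0 of the string even when the flag is set, which is why the flag did not help in the question.

```python
import re

text = 'first line\nsecond line'

# Without MULTILINE, ^ matches only at the very start of the string.
plain_hit = re.search(r'^second', text)
print(plain_hit)  # None

# With MULTILINE, ^ also matches right after each newline.
multiline_hit = re.search(r'^second', text, re.MULTILINE)
print(multiline_hit.group())  # second

# re.match still only tries position 0, even with MULTILINE.
match_hit = re.match(r'^second', text, re.MULTILINE)
print(match_hit)  # None
```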

The problem you are running into comes from the way match works. Check this example:

    import re

    pattern = r'HELLO (\w+)'

    # Works because the text starts with HELLO.
    print(re.match(pattern, 'HELLO X').groups())
    # No match because the text does not start with HELLO.
    m = re.match(pattern, 'CHELLO X')
    print(m is None)

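match only starts matching at the first position, so it fails unless the pattern itself accounts for whatever characters come before HELLO.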

Explaining DOTALL:

    import re

    text = '\nHELLO X'
    pattern = re.compile(r'.*?HELLO (\w+)')
    pattern_dotall = re.compile(r'.*?HELLO (\w+)', re.DOTALL)

    print(pattern.match(text) is None)         # True: . does not match \n
    print(pattern_dotall.match(text) is None)  # False: with DOTALL, . also matches \n
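
Putting this together for the ping wrapper from the question, one possible sketch uses re.search with a compiled bytes pattern to pull the summary numbers out of the captured stdout. The helper name `parse_ping_summary` and the pattern `SUMMARY_RE` are hypothetical, and the pattern assumes the exact summary wording shown in the question's output. Other ping implementations may format this line differently.

```python
import re

# Hypothetical pattern, written against the summary line quoted in the question.
SUMMARY_RE = re.compile(
    br'(\d+) packets transmitted, (\d+) received, (\d+(?:\.\d+)?)% packet loss'
)


def parse_ping_summary(stdout: bytes):
    """Return (transmitted, received, loss_percent), or None if no summary line is found."""
    m = SUMMARY_RE.search(stdout)  # search, not match: the summary is not at position 0
    if m is None:
        return None
    transmitted, received, loss = (g.decode('ascii') for g in m.groups())
    return int(transmitted), int(received), float(loss)


# Abbreviated copy of the output from the question, used instead of a live ping call.
sample = (
    b'PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.\n'
    b'64 bytes from 8.8.8.8: icmp_seq=1 ttl=53 time=60.7 ms\n\n'
    b'--- 8.8.8.8 ping statistics ---\n'
    b'1 packets transmitted, 1 received, 0% packet loss, time 0ms\n'
)

summary = parse_ping_summary(sample)
print(summary)  # (1, 1, 0.0)
```

In the real wrapper, `out.stdout` from subprocess.run would be passed in place of `sample`.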

Python re.match only matches before the first \n
