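Correctly parsing specific terminal output in Python

Posted: 2018-04-11 22:20:41

Tags: python python-3.x

I wrote a simple program that prints text to the terminal (stdout), and a second program that captures that text and parses it into the format I want.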

The first program just prints these lines to stdout:

620_ha_1 # Version: Fortigate-620B v4.0,build0271,100330 (MR2)
Virus-DB: 11.00643(2010-03-31 17:
Extended DB: 11.00643(2010-03-31 17:
Extreme DB: 0.00000(2003-01-01 00:
IPS-DB: 2.00778(2010-03-31 12:
FortiClient application signature package: 1.167(2010-04-01 10:
Serial-Number: FG600B3908600705
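To reproduce this locally, a stand-in for the first program can simply print the sample lines one per line. This is a hypothetical sketch: the file name output.py is taken from the Popen call in the second program, while its contents (the LINES list and main function) are assumptions copied from the sample above.

```python
# output.py: hypothetical stand-in for the first program (contents assumed).
# The lines are copied verbatim from the sample output, truncated timestamps included.
LINES = [
    "620_ha_1 # Version: Fortigate-620B v4.0,build0271,100330 (MR2)",
    "Virus-DB: 11.00643(2010-03-31 17:",
    "Extended DB: 11.00643(2010-03-31 17:",
    "Extreme DB: 0.00000(2003-01-01 00:",
    "IPS-DB: 2.00778(2010-03-31 12:",
    "FortiClient application signature package: 1.167(2010-04-01 10:",
    "Serial-Number: FG600B3908600705",
]


def main():
    # Print each line separately so the reader sees one record per line
    for line in LINES:
        print(line)


if __name__ == "__main__":
    main()
```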

My second program then captures that output and parses it:

import subprocess
import re

infoDict = dict()

# Prints were purely for my testing purposes
process = subprocess.Popen(['python','-u', 'output.py'],stdout=subprocess.PIPE)
while True:
    output = process.stdout.readline().decode()
    if output == '' and process.poll() is not None:
        break
    if output:
        rmHost = re.sub(".*?#", "", output.strip())
        versionInfo = re.split(": ", rmHost)
        # print("VERSION: " + versionInfo[0] + ": " + versionInfo[1])
        fields = re.split(".*?#|: ", output.strip())
        name = fields[0]
        data = fields[1]
        # print("name: " + name)
        # print("data: " + data)

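Almost everything works, but I can't figure out how to handle the first line correctly inside the loop. I want to get rid of the 620_ha_1 # part of the output entirely. My end goal is to store everything in a dictionary, with name as the key and data as the value.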
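Running the fields pattern from the loop on one line of each kind shows where the first line goes wrong: .*?# consumes the prompt and leaves an empty string as the first field, so name ends up empty and data receives the key. A minimal per-line sketch follows; the helper parse_line is hypothetical. It strips the prompt first and then splits only on the first ": ".

```python
import re

first = "620_ha_1 # Version: Fortigate-620B v4.0,build0271,100330 (MR2)"
other = "Serial-Number: FG600B3908600705"

# The same pattern the question's loop uses to compute `fields`
FIELDS_PATTERN = ".*?#|: "
print(re.split(FIELDS_PATTERN, first))  # ['', ' Version', 'Fortigate-620B v4.0,build0271,100330 (MR2)']
print(re.split(FIELDS_PATTERN, other))  # ['Serial-Number', 'FG600B3908600705']


def parse_line(line):
    """Hypothetical helper: return (name, data) for one line of output."""
    # Remove a leading "620_ha_1 #" prompt, if the line has one
    without_prompt = re.sub(r"^.*?#", "", line.strip())
    # Split only on the first ": " so the rest of the value stays intact
    name, _, data = without_prompt.strip().partition(": ")
    return name, data


info_dict = dict(parse_line(line) for line in (first, other))
print(info_dict)
```

Inside the while loop, name, data = parse_line(output) followed by infoDict[name] = data would then fill the dictionary the same way for every line.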

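TL;DR

How do I parse the first line of the output shown above so that it has the same format as the lines that follow, so I can store it in the dict?

Thanks!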

1 answer:

Answer 0 (score: 0)

Plan

The plan is to strip the leading text up to and including the pound sign (#). After that, parsing is much easier.

Code

# https://stackoverflow.com/q/49785215/459745
import re
import pprint

output = """620_ha_1 # Version: Fortigate-620B v4.0,build0271,100330 (MR2)
Virus-DB: 11.00643(2010-03-31 17:
Extended DB: 11.00643(2010-03-31 17:
Extreme DB: 0.00000(2003-01-01 00:
IPS-DB: 2.00778(2010-03-31 12:
FortiClient application signature package: 1.167(2010-04-01 10:
Serial-Number: FG600B3908600705
"""

# Regular expression to extract the keys/values from output. The
# re.VERBOSE flag allows comments inside the regular expression.
# The raw string (r"""...""") keeps "\s" from being read as a string escape.
key_value_pattern = re.compile(r"""
    \s*      # White space (including newlines) preceding the key
    ([^:]+)  # The key: anything but a colon
    :        # The colon itself
    (.+)     # The value: the rest of the line
    """, flags=re.VERBOSE)

# Remove the text up to and including the pound sign
output = output.partition('#')[-1]
infoDict = dict(key_value_pattern.findall(output))

# Show
pprint.pprint(infoDict)

Output

{'Extended DB': ' 11.00643(2010-03-31 17:',
 'Extreme DB': ' 0.00000(2003-01-01 00:',
 'FortiClient application signature package': ' 1.167(2010-04-01 10:',
 'IPS-DB': ' 2.00778(2010-03-31 12:',
 'Serial-Number': ' FG600B3908600705',
 'Version': ' Fortigate-620B v4.0,build0271,100330 (MR2)',
 'Virus-DB': ' 11.00643(2010-03-31 17:'}

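Notes

  • The code is not really that long; most of its length comes from comments that explain how it works.
  • str.partition splits the string into 3 parts: the part before the pound sign, the pound sign itself, and the part after it. We are only interested in the part after the pound sign.
  • The .findall method returns a list of (key, value) tuples, which is exactly what dict() needs to build a dictionary.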
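The same partition idea also works one line at a time, which fits the question's streaming subprocess loop better than collecting all output into one string first. This is a minimal sketch, assuming only the first non-empty line carries the "620_ha_1 #" prompt. The function name parse_fortigate_lines is hypothetical. Unlike the answer's regex, it also strips the surrounding spaces from keys and values.

```python
def parse_fortigate_lines(lines):
    """Hypothetical helper: build a {name: data} dict from an iterable of lines."""
    info = {}
    prompt_removed = False
    for raw in lines:
        line = raw.strip()  # also drops the trailing newline from pipe output
        if not line:
            continue
        # Assumption: only the first non-empty line starts with the host prompt
        if not prompt_removed:
            if "#" in line:
                line = line.partition("#")[2]
            prompt_removed = True
        # Split on the first colon, as the answer's ([^:]+) key pattern does
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


if __name__ == "__main__":
    sample = [
        "620_ha_1 # Version: Fortigate-620B v4.0,build0271,100330 (MR2)\n",
        "Serial-Number: FG600B3908600705\n",
    ]
    print(parse_fortigate_lines(sample))
    # {'Version': 'Fortigate-620B v4.0,build0271,100330 (MR2)', 'Serial-Number': 'FG600B3908600705'}
```

With the question's Popen call, opening the pipe in text mode (text=True, Python 3.7+) and passing process.stdout to this function should work, because iterating over the pipe yields one line at a time as the child prints it.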