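I made a simple program that prints text to the terminal (stdout), and a second program that captures that text and parses it into the output I want.
The first program just prints these lines to stdout: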
620_ha_1 # Version: Fortigate-620B v4.0,build0271,100330 (MR2)
Virus-DB: 11.00643(2010-03-31 17:
Extended DB: 11.00643(2010-03-31 17:
Extreme DB: 0.00000(2003-01-01 00:
IPS-DB: 2.00778(2010-03-31 12:
FortiClient application signature package: 1.167(2010-04-01 10:
Serial-Number: FG600B3908600705
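To reproduce the setup without the original first program, a minimal stand-in for output.py could look like the sketch below. It is hypothetical: the LINES list and main() are not from the question. They only replay the sample lines above, with the timestamps cut off exactly as shown.

```python
# output.py -- hypothetical stand-in for the asker's first program.
# It only replays the sample lines from the question (timestamps truncated as in the source).
LINES = [
    "620_ha_1 # Version: Fortigate-620B v4.0,build0271,100330 (MR2)",
    "Virus-DB: 11.00643(2010-03-31 17:",
    "Extended DB: 11.00643(2010-03-31 17:",
    "Extreme DB: 0.00000(2003-01-01 00:",
    "IPS-DB: 2.00778(2010-03-31 12:",
    "FortiClient application signature package: 1.167(2010-04-01 10:",
    "Serial-Number: FG600B3908600705",
]


def main():
    # Print one line at a time so the reader process can consume them as a stream.
    for line in LINES:
        print(line)


if __name__ == "__main__":
    main()
```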
My second program then captures that output and parses it:
import subprocess
import re
infoDict = dict()
# Prints were purely for my testing purposes
process = subprocess.Popen(['python','-u', 'output.py'],stdout=subprocess.PIPE)
while True:
    output = process.stdout.readline().decode()
    if output == '' and process.poll() is not None:
        break
    if output:
        rmHost = re.sub(".*?#", "", output.strip())
        versionInfo = re.split(": ", rmHost)
        # print("VERSION: " + versionInfo[0] + ": " + versionInfo[1])
        fields = re.split(".*?#|: ", output.strip())
        name = fields[0]
        data = fields[1]
        # print("name: " + name)
        # print("data: " + data)
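Running the split from the loop on two of the sample lines shows where the first line goes wrong. The .*?# alternative matches at the very start of that line, so re.split emits an empty string in front of it and every field shifts one position to the right. The variable names below are only for illustration.

```python
import re

first = "620_ha_1 # Version: Fortigate-620B v4.0,build0271,100330 (MR2)"
other = "Virus-DB: 11.00643(2010-03-31 17:"

# Same pattern as in the loop: split on "anything up to '#'" OR on ": ".
first_fields = re.split(".*?#|: ", first)
other_fields = re.split(".*?#|: ", other)

# '.*?#' matches "620_ha_1 #" at position 0, so an empty string comes first.
print(first_fields)
# No '#' here, so only ": " splits, which is the intended behaviour.
print(other_fields)
```

So for the first line name ends up as '' and data as ' Version', while every other line splits as intended.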
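Almost everything works, but I can't figure out how to handle the first line properly inside the loop. I'm trying to get rid of the "620_ha_1 #" part of the output entirely. My end goal is to store everything in a dictionary, where name acts as the key and data acts as the value.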
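TL;DR
How do I parse the first line of the output shown above so that it has the same format as the lines that follow, so that I can store it in the dict?
Thanks!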
Answer 0 (score: 0)
The game plan is to strip the leading text, up to and including the pound sign (#). After that, parsing is much easier.
# https://stackoverflow.com/q/49785215/459745
import re
import pprint
output = """620_ha_1 # Version: Fortigate-620B v4.0,build0271,100330 (MR2)
Virus-DB: 11.00643(2010-03-31 17:
Extended DB: 11.00643(2010-03-31 17:
Extreme DB: 0.00000(2003-01-01 00:
IPS-DB: 2.00778(2010-03-31 12:
FortiClient application signature package: 1.167(2010-04-01 10:
Serial-Number: FG600B3908600705
"""
# Regular expression to extract the keys/values from output. The
# re.VERBOSE flag allows for comments inside the regular expression.
# A raw string (r"""...""") keeps "\s" from being treated as a string escape.
key_value_pattern = re.compile(r"""
    \s*      # Whitespace (including newlines) preceding the key
    ([^:]+)  # The key: anything but the colon
    :        # The colon itself
    (.+)     # The value, up to the end of the line
    """, flags=re.VERBOSE)
# Remove the text up to and including the pound sign
output = output.partition('#')[-1]
infoDict = dict(key_value_pattern.findall(output))
# Show
pprint.pprint(infoDict)
Output:
{'Extended DB': ' 11.00643(2010-03-31 17:',
'Extreme DB': ' 0.00000(2003-01-01 00:',
'FortiClient application signature package': ' 1.167(2010-04-01 10:',
'IPS-DB': ' 2.00778(2010-03-31 12:',
'Serial-Number': ' FG600B3908600705',
'Version': ' Fortigate-620B v4.0,build0271,100330 (MR2)',
'Virus-DB': ' 11.00643(2010-03-31 17:'}
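str.partition splits the string into 3 parts: the part before the pound sign, the pound sign itself, and the part after it. We are only interested in the part after the pound sign. The .findall method returns a list of (key, value) tuples, which is exactly what dict() needs to build a dictionary.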
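The answer parses the whole captured text at once. If you would rather keep the question's readline loop, the same partition idea can be applied one line at a time. This is a sketch, not the answer's code: the helper names parse_line and parse_lines are hypothetical, and it assumes '#' only ever appears in the host prompt. Splitting on the first ': ' and stripping also removes the leading space that the regex version leaves in each value.

```python
def parse_line(line):
    """Return (name, data) for one 'name: data' line, or None if there is no such pair.

    Assumption: '#' only appears in the host prompt prefix (e.g. '620_ha_1 # ').
    """
    head, sep, tail = line.partition('#')
    rest = tail if sep else head                     # drop the prompt prefix when present
    name, sep, data = rest.strip().partition(': ')   # split on the FIRST ': ' only
    if not sep:
        return None                                  # blank line or no 'name: data' pair
    return name, data


def parse_lines(lines):
    """Build a dict from any iterable of text lines (a list, a file, a pipe)."""
    info = {}
    for line in lines:
        pair = parse_line(line)
        if pair is not None:
            name, data = pair
            info[name] = data
    return info


# Possible use with the question's subprocess (not executed here; assumes output.py exists):
#   process = subprocess.Popen([sys.executable, '-u', 'output.py'],
#                              stdout=subprocess.PIPE, text=True)
#   infoDict = parse_lines(process.stdout)
```

Inside the question's own loop, calling parse_line(output) and storing the returned pair in infoDict would take the place of the two re.split calls.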