I have a shell command that returns lines like these:
timestamp=1511270820724797892 eventID=1511270820724797892 eventName="corvil_request_summary" channelID="HTTP: Other" channelDir=false classID="class-default" packetID=2809419165205232 messageOffset=1 warnCSMInvalidSample=false warnCSMOverflow=false warnEventInvalidSample=false Server="nginx/1.10.1" Method="GET" RequestURI="/system/varlogmessages/" UserAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" WebSite="backup-server-new" Domain="backup-server-new" SrcIP="172.20.1.13" SrcPort="80" DstIP="172.18.4.181" DstPort="60065"
timestamp=1511270820735795372 eventID=1511270820735795372 eventName="corvil_request_summary" channelID="HTTP: Other" channelDir=false classID="class-default" packetID=2809419176202992 messageOffset=1 warnCSMInvalidSample=false warnCSMOverflow=false warnEventInvalidSample=false Server="probe" Method="GET" RequestURI="/system/status" WebSite="probe609:8111" Domain="probe609:8111" SrcIP="172.20.2.109" SrcPort="8111" DstIP="172.18.4.96" DstPort="49714"
I tried to read them with:
import csv

# execute(cmd) is assumed to yield the command's output lines
for i, row in enumerate(csv.reader(execute(cmd), delimiter=' ', skipinitialspace=True)):
    print(i, len(row))
    if i > 10:
        break
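But this does not work correctly, because the spaces inside quotes are not ignored. For example, channelID="HTTP: Other" is broken at the space, so HTTP: and Other end up in two separate fields.
What is the correct way to parse this kind of input?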
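The splitting described above can be reproduced without the shell command. The snippet below is only a minimal sketch on a shortened sample line (the `line` and `row` names are illustrative, not from the original command). As far as I can tell, the csv module only treats a quote as special when it is the first character of a field, so a quote that follows `channelID=` is kept as an ordinary character and the space after `HTTP:` still acts as a delimiter.

```python
import csv

# Shortened sample taken from the output above
line = 'channelID="HTTP: Other" channelDir=false'

# csv.reader accepts any iterable of strings, so a one-element list is enough
row = next(csv.reader([line], delimiter=' ', skipinitialspace=True))

# The quoted value is split at the space inside it
print(row)
```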
Answer 0 (score: 0)
This is hackish, but it seems to me that the rules here are similar to those for parsing the attributes of an HTML tag.
from html.parser import HTMLParser  # Python 3
# from HTMLParser import HTMLParser  # Python 2
# Create a parser that simply dumps the tag attributes to an instance variable
class AttrParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        self.attrs = attrs
# Our input
to_parse = 'channelID="HTTP: Other" channelDir=false classID="class-default"'
# Create a parser instance
parser = AttrParser()
# Wrap our input text inside a dummy HTML tag and feed it into the parser
parser.feed('<NOTAG {}>'.format(to_parse))
# Read the results
print(parser.attrs)
Result (note that HTMLParser lowercases the attribute names):
[('channelid', 'HTTP: Other'),
('channeldir', 'false'),
('classid', 'class-default')]
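In the same spirit of borrowing an existing tokenizer, the standard-library `shlex` module might be a closer fit, since shell-style quoting also allows a quoted part in the middle of a word. The following is a sketch under assumptions, not a drop-in solution: `parse_line` is a hypothetical helper name, all values are kept as strings, and it assumes the quotes in every line are balanced (`shlex.split` raises `ValueError` otherwise) and that values contain no stray apostrophes or backslashes. Unlike the HTML approach, the case of the keys is preserved.

```python
import shlex

def parse_line(line):
    """Parse one line of space-separated key=value pairs into a dict of strings."""
    record = {}
    # shlex.split honours double quotes, so 'channelID="HTTP: Other"'
    # becomes the single token 'channelID=HTTP: Other'
    for token in shlex.split(line):
        # Split on the first '=' only, so values may themselves contain '='
        key, _, value = token.partition('=')
        record[key] = value
    return record

sample = 'eventName="corvil_request_summary" channelID="HTTP: Other" channelDir=false SrcPort="80"'
print(parse_line(sample))
```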
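Answer 1 (score: 0)
Use a regular expression to find the keys, then use the position of each key as an offset to slice out its value.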
import re

lines = [
'''timestamp=1511270820724797892 eventID=1511270820724797892 eventName="corvil_request_summary" channelID="HTTP: Other" channelDir=false classID="class-default" packetID=2809419165205232 messageOffset=1 warnCSMInvalidSample=false warnCSMOverflow=false warnEventInvalidSample=false Server="nginx/1.10.1" Method="GET" RequestURI="/system/varlogmessages/" UserAgent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" WebSite="backup-server-new" Domain="backup-server-new" SrcIP="172.20.1.13" SrcPort="80" DstIP="172.18.4.181" DstPort="60065"''',
'''timestamp=1511270820735795372 eventID=1511270820735795372 eventName="corvil_request_summary" channelID="HTTP: Other" channelDir=false classID="class-default" packetID=2809419176202992 messageOffset=1 warnCSMInvalidSample=false warnCSMOverflow=false warnEventInvalidSample=false Server="probe" Method="GET" RequestURI="/system/status" WebSite="probe609:8111" Domain="probe609:8111" SrcIP="172.20.2.109" SrcPort="8111" DstIP="172.18.4.96" DstPort="49714"''',
]
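results = []
for line in lines:
    result = {}
    # Every run of word characters followed by '=' is treated as a key
    keys = re.findall(r'\w+=', line)
    for idx, k in enumerate(keys):
        start = line.find(k)
        # The value runs up to the next key, or to the end of the line
        if idx + 1 >= len(keys):
            end = len(line)
        else:
            end = line.find(keys[idx+1])
        # Split on the first '=' only, in case the value contains one
        key, value = line[start:end].strip().split("=", 1)
        if isinstance(value, str):
            # Convert booleans and integers, otherwise strip the quotes
            if value.lower() == "true":
                value = True
            elif value.lower() == "false":
                value = False
            elif value.isdigit():
                value = int(value)
            else:
                value = value.strip('"')
        result[key] = value
    results.append(result)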
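One limitation of locating the keys first is that a quoted value which itself contains something like `q=` would be mistaken for a key. A possible variation, sketched here under assumptions, is to match each key together with its value in a single pattern, so that a quoted value is consumed as a whole. `PAIR_RE` and `parse_pairs` are hypothetical names, the `RequestURI` in the sample is made up to show the case, values are left as strings, and quoted values are assumed not to contain escaped quotes.

```python
import re

# key, then '=', then either a double-quoted value or a run of non-space characters
PAIR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S*))')

def parse_pairs(line):
    """Return a dict of the key=value pairs found in one line (values as strings)."""
    record = {}
    for key, quoted, bare in PAIR_RE.findall(line):
        # Only one of the two value groups takes part in a match; the other is ''
        record[key] = quoted if quoted else bare
    return record

# Illustrative line; the query string in RequestURI is invented for this example
sample = 'timestamp=1511270820724797892 channelID="HTTP: Other" channelDir=false RequestURI="/search?q=a b"'
print(parse_pairs(sample))
```

The boolean and integer conversion from the loop above could be applied to each value afterwards if typed results are needed.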