I'm running into a parsing problem on Python 2.7. Let me explain:
I'm parsing events from the Incapsula API. The goal is to make them readable in an Excel sheet so I can build statistics and charts.
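The signature field contains the type and number of each event/attack. Because this number is the attack count, I decided to replicate each line as many times as the sum of the attack counts that follow the 'signature=' field.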
For example:
visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3}
visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3}
visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3}
So far, everything works as expected and I get the correct number of attacks.
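A minimal sketch of that per-line count, written so it runs on both Python 2.7 and 3. `attack_count` is a hypothetical helper name, and the regex assumes threat names consist only of letters, digits, underscores, and dots, as in the captures above. Restricting the search to the text inside `signature={...}` keeps digits from `visit_id` or `src_ip` out of the sum.

```python
import re

# Matches each "name=count" pair inside the signature braces,
# e.g. "api.threats.sql_injection=3" -> ("api.threats.sql_injection", "3").
# Assumption: threat names only use letters, digits, "_" and ".".
PAIR_RE = re.compile(r'([\w.]+)=(\d+)')


def attack_count(line):
    """Return the total number of attacks listed in a line's signature field.

    Returns 0 when the line has no signature field.
    """
    start = line.find('signature={')
    if start == -1:
        return 0
    end = line.find('}', start)
    # Only look inside the braces, so visit_id, src_ip, etc. are ignored.
    inside = line[start + len('signature={'):end]
    return sum(int(count) for _, count in PAIR_RE.findall(inside))


line = ('visit_id=324001290181618591, src_country=Ukraine, '
        'signature={api.threats.sql_injection=3, '
        'api.threats.bot_access_control=1, '
        'api.threats.illegal_resource_access=1, '
        'api.threats.cross_site_scripting=1,}')
print(attack_count(line))  # 6
```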
BUT
In some rare events, the signature field contains several values, as in this capture:
visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
visit_id=86001060468746692, src_country=Netherlands, event_timestamp=1483867285054, src_ip=178.22.232.53, dest_name=www.yyy.com, dest_id=1551642, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
visit_id=86001060468746692, src_country=Netherlands, event_timestamp=1483867285054, src_ip=178.22.232.53, dest_name=www.yyy.com, dest_id=1551642, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
visit_id=86001060468746692, src_country=Netherlands, event_timestamp=1483867285054, src_ip=178.22.232.53, dest_name=www.yyy.com, dest_id=1551642, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
visit_id=86001060468746692, src_country=Netherlands, event_timestamp=1483867285054, src_ip=178.22.232.53, dest_name=www.yyy.com, dest_id=1551642, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
I still get the correct number of attacks for these rare lines, but I would like to rearrange the signature field from this:
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
to this:
signature={api.threats.sql_injection}
signature={api.threats.sql_injection}
signature={api.threats.sql_injection}
signature={api.threats.bot_access_control}
signature={api.threats.illegal_resource_access}
signature={api.threats.cross_site_scripting}
signature={api.threats.bot_access_control}
signature={api.threats.illegal_resource_access}
signature={api.threats.illegal_resource_access}
signature={api.threats.illegal_resource_access}
(The first six lines are the first event, repeated 6 times (3 + 1 + 1 + 1 = 6); the last four are the second event, repeated 4 times (1 + 3 = 4).)
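The rule in that parenthetical (each threat name emitted once per counted attack, in its original order) can be sketched on the signature field alone. `expand_signature` is a hypothetical helper, and it assumes the same `signature={name=count, ...}` shape seen above, including the optional trailing comma.

```python
import re

# One "name=count" pair, e.g. "api.threats.illegal_resource_access=3".
PAIR_RE = re.compile(r'([\w.]+)=(\d+)')


def expand_signature(signature):
    """Turn 'signature={a=3, b=1,}' into one 'signature={name}' entry per attack."""
    inside = signature[signature.find('{') + 1:signature.rfind('}')]
    expanded = []
    for name, count in PAIR_RE.findall(inside):
        # Repeat each threat name `count` times, keeping the original order.
        expanded.extend(['signature={%s}' % name] * int(count))
    return expanded


for entry in expand_signature('signature={api.threats.bot_access_control=1, '
                              'api.threats.illegal_resource_access=3,}'):
    print(entry)
```

Here 1 + 3 = 4 entries come out, matching the last four lines of the desired listing.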
My current source code:
import re
from itertools import izip

# count the number of attacks per line
f = open('monthlyLogShort.txt', 'r')
g = open('count.txt', 'w')
kensu = f.readlines()
f.close()
for line in kensu:
    st = line.find('signature=')
    end = line.find('}')
    unprecise = line[st:end + 1]
    # count = int(re.search(r'\d+', unprecise).group())
    # add up every number inside signature={...}
    count = sum(map(int, re.findall(r'[0-9]+', unprecise)))
    print >> g, count
g.close()

# replicate lines according to the number of attacks
h = open('flog.txt', 'w')
with open('monthlyLogShort.txt') as textfile1, open('count.txt') as textfile2:
    for x, y in izip(textfile1, textfile2):
        x = x.strip()
        y = y.strip()
        # write each copy on its own line
        # (x * int(y) would glue all copies onto a single line)
        for _ in range(int(y)):
            print >> h, x
h.close()
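The two passes above (writing `count.txt`, then reading it back with `izip`) can be sketched as a single loop with no intermediate file. `replicate_lines` is a hypothetical name, and dropping lines that have no signature field is an assumption; the original script does not define what should happen to them.

```python
import re

# One "name=count" pair inside signature={...}.
PAIR_RE = re.compile(r'([\w.]+)=(\d+)')


def replicate_lines(lines):
    """Yield each stripped line once per attack counted in its signature field."""
    for line in lines:
        line = line.strip()
        start = line.find('signature={')
        if start == -1:
            continue  # assumption: lines without a signature field are dropped
        inside = line[start + len('signature={'):line.find('}', start)]
        total = sum(int(count) for _, count in PAIR_RE.findall(inside))
        for _ in range(total):
            yield line


sample = [
    'visit_id=1, signature={api.threats.sql_injection=3}\n',
    'visit_id=2, signature={api.threats.bot_access_control=1, '
    'api.threats.illegal_resource_access=3,}\n',
]
print(len(list(replicate_lines(sample))))  # 7

# With the real files, e.g.:
# with open('monthlyLogShort.txt') as src, open('flog.txt', 'w') as dst:
#     for out_line in replicate_lines(src):
#         dst.write(out_line + '\n')
```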
Answer (score: 1)
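If I am reading your requirements correctly, you are trying to emit one line per threat occurrence while keeping the rest of the record. This solution does not output the counts directly; instead, it transforms the data so that each line carries exactly one threat.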
Code:
sig_str = 'signature={'
for line in kensu:
    record, signature = line.split(sig_str)
    threats = signature.split('}')[0]
    for counts in threats.split(','):
        if '=' in counts:
            threat, count = tuple(counts.split('='))
            for i in range(int(count)):
                print '%s%s%s}' % (record, sig_str, threat.strip())
Sample data:
kensu = [x.strip() for x in """
record=0, signature={api.threats.sql_injection=1}
record=1, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
record=2, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
""".split('\n')[1:-1]]
Output:
record=0, signature={api.threats.sql_injection}
record=1, signature={api.threats.sql_injection}
record=1, signature={api.threats.sql_injection}
record=1, signature={api.threats.sql_injection}
record=1, signature={api.threats.bot_access_control}
record=1, signature={api.threats.illegal_resource_access}
record=1, signature={api.threats.cross_site_scripting}
record=2, signature={api.threats.bot_access_control}
record=2, signature={api.threats.illegal_resource_access}
record=2, signature={api.threats.illegal_resource_access}
record=2, signature={api.threats.illegal_resource_access}
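A sketch of the answer's loop wrapped as a generator, so it also runs on Python 3 (no `print` statement) and can write straight to a file. `one_threat_per_line` is a hypothetical name; it splits on the first `signature={` only and skips lines without that field, where the original `record, signature = line.split(sig_str)` would raise `ValueError`. It should be fed one line per event (for example the lines of `monthlyLogShort.txt`), not the already-replicated `flog.txt`, or each event's threats would be multiplied a second time.

```python
SIG_STR = 'signature={'


def one_threat_per_line(lines):
    """Yield one 'record, signature={threat}' line per counted attack."""
    for line in lines:
        if SIG_STR not in line:
            continue  # assumption: lines without a signature field are skipped
        record, signature = line.split(SIG_STR, 1)
        threats = signature.split('}')[0]
        for counts in threats.split(','):
            if '=' in counts:  # ignores the empty piece after a trailing comma
                threat, count = counts.split('=')
                for _ in range(int(count)):
                    yield '%s%s%s}' % (record, SIG_STR, threat.strip())


demo = ('record=2, signature={api.threats.bot_access_control=1, '
        'api.threats.illegal_resource_access=3,}')
for row in one_threat_per_line([demo]):
    print(row)

# With the real files, e.g.:
# with open('monthlyLogShort.txt') as src, open('flog.txt', 'w') as dst:
#     for row in one_threat_per_line(src):
#         dst.write(row + '\n')
```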