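I want to parse the responses in a Microsoft DNS debug log. The idea is to extract the domains and print each domain along with the number of times it occurs in the log. Normally I would use something like grep -v " R " log > tmp
to first redirect all the responses to a file, and then grep for each domain by hand with something like grep domain tmp
I think there must be a better way.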
20140416 01:38:52 588 PACKET 02030850 UDP Rcv 192.168.0.10 2659 R Q [8281 DR SERVFAIL] A (11)quad(3)sub(7)domain(3)com(0)
20140416 01:38:52 588 PACKET 02396370 UDP Rcv 192.168.0.5 b297 R Q [8281 DR SERVFAIL] A (3)pk(3)sub(7)domain(3)com(0)
20140415 19:46:24 544 PACKET 0261F580 UDP Snd 192.168.0.2 795a Q [0000 NOERROR] A (11)tertiary(7)domain(3)com(0)
20140415 19:46:24 544 PACKET 01A47E60 UDP Snd 192.168.0.1 f4e2 Q [0001 D NOERROR] A (11)quad(3)sub(7)domain(3)net(0)
For the data above, output like the following would be great:
domain.com 3
domain.net 1
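This indicates that the script or command found three entries for domain.com. I'm not worried about third-level or deeper hostnames being folded into the count. A shell command or Python would be fine. Here is some pseudocode to drive the problem home.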
with open('log', 'r') as theFile:
    lines = theFile.readlines()

printList = []
# search for unique queries and count them
for line in lines:
    if ' Q ' in line:  # placeholder test for the " Q " query field
        # store until the count for this unique value is complete
        printList.append(line)

for item in printList:
    print(item)  # should print the summary: each unique domain and its count
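Answer 0 (score: 1)
Maybe something like this? I'm no expert on regular expressions, but this should get the job done, as far as I understand the format you are trying to parse.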
#!/usr/bin/env python
import re

ret = {}
with open('log', 'r') as theFile:
    for line in theFile:
        # capture the last two labels of the name that follows "Q [flags]"
        match = re.search(r'Q \[.+\].+\(\d+\)([^\(]+)\(\d+\)([^\(]+)', line.strip())
        if match is not None:
            key = '.'.join(match.groups())  # e.g. "domain.com"
            ret[key] = ret.get(key, 0) + 1

for k in ret:
    print('%s %d' % (k, ret[k]))
Answer 1 (score: 1)
How about this? It's a bit brute-force:
>>> import re
>>> from collections import Counter
>>> with open('t.txt') as f:
...     c = Counter('.'.join(re.findall(r'(\w+\(\d+\))', line.split()[-1])[-2:]) for line in f)
...
>>> for domain, count in c.most_common():
...     print(domain, count)
...
domain(3).com(0) 3
domain(3).net(0) 1
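If the length markers such as (3) are not wanted in the output, one possible variation is to capture only the label text. This is a rough sketch rather than a tested solution: it assumes the query name is always the last whitespace-separated field, and the names LABEL, last_two_labels and count_domains are made up for illustration.

```python
import re
from collections import Counter

# Matches one "(length)label" pair and captures only the label text.
# The trailing "(0)" has no label after it, so it produces no match.
LABEL = re.compile(r'\(\d+\)([^()]+)')

def last_two_labels(line):
    """Return the last two labels of the query name, e.g. 'domain.com'.

    Assumption: the name is the last whitespace-separated field.
    Returns None for blank lines or names with fewer than two labels.
    """
    fields = line.split()
    if not fields:
        return None
    labels = LABEL.findall(fields[-1])
    if len(labels) < 2:
        return None
    return '.'.join(labels[-2:])

def count_domains(lines):
    """Count occurrences of each two-label domain in an iterable of lines."""
    domains = (last_two_labels(line) for line in lines)
    return Counter(d for d in domains if d is not None)
```

Calling count_domains(open('t.txt')) and looping over most_common() as above would then print plain names. Note that taking the last two labels is only a heuristic; it would group names under suffixes like co.uk incorrectly.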
Answer 2 (score: 0)
It doesn't match the output you asked for, but would this work for you?
# collect the raw name field (last column) of every PACKET line
with open(r"path\to\file") as f:
    dns = [line.split()[-1] for line in f if "PACKET" in line]

domains = {}
for d in dns:
    domains[d] = domains.get(d, 0) + 1

for k, v in domains.items():
    print("%s %d" % (k, v))
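To mirror the grep " R " step from the question, the same count could be restricted to responses. The sketch below is only a guess based on the four sample lines: it assumes a response carries a lone R token somewhere before the bracketed flags field, which may not hold for every log format. The helper names is_response and count_names are hypothetical.

```python
from collections import Counter

def is_response(fields):
    """Guess whether a split PACKET line is a response.

    Assumption (from the sample lines only): a response has a lone 'R'
    token before the bracketed flags field such as "[8281 DR SERVFAIL]".
    """
    for token in fields:
        if token.startswith('['):
            return False  # reached the flags field without seeing 'R'
        if token == 'R':
            return True
    return False

def count_names(lines, responses_only=False):
    """Count the raw name field (last column) of PACKET lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if 'PACKET' not in fields:
            continue
        if responses_only and not is_response(fields):
            continue
        counts[fields[-1]] += 1
    return counts
```

Stopping at the first token that starts with [ keeps flag strings inside the brackets from being mistaken for the response marker.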