Can anyone help me with the following problem? I have a log file with thousands of lines that look like this:
jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217
jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537
jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 1 timestamp: 00:00:08,018
jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 14 timestamp: 00:00:10,338
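I want to write a Python script that iterates over this file and, based on the jarid (the second field in the log file), collects the timestamp from every line where that jarid appears and prints them on the same line. For example, for the following two lines: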
jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217
jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537
I would get the following output:
jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 00:00:02,217 ack: 00:00:04,537
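I think the best way to achieve this is to use a dictionary (or maybe not; please comment). I wrote the script below, which sort of works, but it doesn't give me the output I want: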
#!/opt/SP/bin/python
log = file("/opt/SP/logs/generic.log", "r")
filecontent = log.xreadlines()
storage = {}
for line in filecontent:
    line = line.strip()
    jarid, JARID, status, STATUS, timestamp, TIME = line.split(" ")
    if JARID not in storage:
        storage[JARID] = {}
    if STATUS not in storage[JARID]:
        storage[JARID][STATUS] = {}
    if TIME not in storage[JARID][STATUS]:
        storage[JARID][STATUS][TIME] = {}
jarids = storage.keys()
jarids.sort()
for JARID in jarids:
    stats = storage[JARID].keys()
    stats.sort()
    for STATUS in stats:
        times = storage[JARID][STATUS].keys()
        times.sort()
        for TIME in times:
            all = storage[JARID][STATUS][TIME].keys()
            all.sort()
for JARID in jarids:
    if "1" in storage[JARID].keys() and "13" in storage[JARID].keys():
        print "MSG: %s, RECV: %s, ACK: %s" % (JARID, storage[JARID]["1"], storage[JARID]["13"])
    else:
        if "1" in storage[JARID].keys() and "14" in storage[JARID].keys():
            print "MSG: %s, RECV: %s, NACK: %s" % (JARID, storage[JARID]["1"], storage[JARID]["14"])
When I run this script, I get the following output:
MSG: 7e5ae720-9151-11e0-eff2-00238bce4216, RECV: {'00:00:02,217': {}}, ACK: {'00:00:04,537': {}}
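Please note that I'm still learning Python, and my scripting skills aren't all that!
Could someone please help me work out how to get the desired output described above?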
Answer 0 (score: 2)
Based on JBernardo's answer, but using defaultdict instead of setdefault. You can print it in exactly the same way, so I won't copy that code here.
from collections import defaultdict
log = ['jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217',
       'jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537',
       'jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 1 timestamp: 00:00:08,018',
       'jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 14 timestamp: 00:00:10,338']
d = defaultdict(dict)
for i in (line.split() for line in log):
    d[i[1]][i[2]] = i[-1]
You can also unpack into meaningful names. For example:
for label1, jarid, jartype, x, label2, timestamp in (line.split() for line in log):
    d[jarid][jartype] = timestamp
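For anyone who wants the whole thing in one place, here is a minimal sketch of the defaultdict approach from parsing through to printing. It assumes Python 3 (the snippets in this thread are Python 2), and the names group_timestamps and format_entry are made up for illustration. The sample lines are the ones from the question; a real script would pass the open file object instead of the list.

```python
from collections import defaultdict

# Sample lines copied from the question; any iterable of lines works,
# including an open file object.
LOG_LINES = [
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217",
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537",
    "jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 1 timestamp: 00:00:08,018",
    "jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 14 timestamp: 00:00:10,338",
]


def group_timestamps(lines):
    """Map each jarid to a dict of {event name: timestamp}."""
    grouped = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            # Assumption: anything that is not exactly six fields is noise.
            continue
        _, jarid, event, _, _, timestamp = fields
        # "recv:" -> "recv"; the colon is added back when formatting.
        grouped[jarid][event.rstrip(":")] = timestamp
    return grouped


def format_entry(jarid, events):
    """Render one jarid and its events on a single line."""
    parts = ["jarid: %s" % jarid]
    parts.extend("%s: %s" % (name, ts) for name, ts in events.items())
    return " ".join(parts)


for jarid, events in group_timestamps(LOG_LINES).items():
    print(format_entry(jarid, events))
```

On Python 3.7 or later the events come out in the order they were logged, because dicts keep insertion order there.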
Answer 1 (score: 0)
I wouldn't make status a dictionary. Instead, I would just store the timestamp under each status key in the jarid dictionary. Better explained with an example...
def search_jarids(jarid):
    stored_jarid = storage[jarid]
    entry = "jarid: %s" % jarid
    for status in stored_jarid:
        entry += " %s: %s" % (status, stored_jarid[status])
    return entry

with open("yourlog.log", 'r') as log:
    lines = log.readlines()

storage = {}
for line in lines:
    line = line.strip()
    jarid_tag, jarid, status_tag, status, timestamp_tag, timestamp = line.split(" ")
    if jarid not in storage:
        storage[jarid] = {}
    status_tag = status_tag[:-1]  # drop the trailing colon
    storage[jarid][status_tag] = timestamp

print search_jarids("462c6d11-9151-11e0-a72c-00238bbdc9e7")
will give you:
jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 00:00:10,338 recv: 00:00:08,018
Hope that gets you started.
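Note that the output above lists nack before recv. A plain dict in older Python versions has no guaranteed ordering, so if you want recv to always come first you have to impose an order yourself. A rough sketch of one way to do that, assuming Python 3; EVENT_ORDER and format_in_order are hypothetical names, and the order itself is just my guess at what you want:

```python
# Hypothetical display order: recv first, then whichever of ack/nack was logged.
EVENT_ORDER = ("recv", "ack", "nack")


def format_in_order(jarid, statuses):
    """Build the output line with events in EVENT_ORDER, skipping missing ones."""
    parts = ["jarid: %s" % jarid]
    for name in EVENT_ORDER:
        if name in statuses:
            parts.append("%s: %s" % (name, statuses[name]))
    return " ".join(parts)


# Same data as the example above, deliberately stored with nack first.
storage = {
    "462c6d11-9151-11e0-a72c-00238bbdc9e7": {
        "nack": "00:00:10,338",
        "recv": "00:00:08,018",
    }
}

for jarid, statuses in storage.items():
    print(format_in_order(jarid, statuses))
# prints: jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 00:00:08,018 nack: 00:00:10,338
```

Any event name that is not in EVENT_ORDER is silently left out, so extend the tuple if your log has other event types.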
Answer 2 (score: 0)
Update: this should work.
Using:
log = ['jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217',
       'jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537',
       'jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 1 timestamp: 00:00:08,018',
       'jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 14 timestamp: 00:00:10,338']
you can do:
d = {}
for i in (line.split() for line in log):
    d.setdefault(i[1], {}).update({i[2]:i[-1]})
    #as pointed by @gnibbler, you can also use "defaultdict"
    #instead of dict with "setdefault"
Then you can print it:
for i,j in d.items():
    print 'jarid:', i,
    for k,m in j.items():
        print k, m,
    print
Answer 3 (score: 0)
Here is a regex solution:
import re
pattern = re.compile(r"""jarid:\s(\S+)    # save jarid to group 1
                         \s(recv:)\s\d+          # save 'recv:' to group 2
                         \stimestamp:\s(\S+)     # save recv timestamp to group 3
                         .*?jarid:\s\1           # make sure next line has same jarid
                         \s(n?ack:)\s\d+         # save 'ack:' or 'nack:' to group 4
                         \stimestamp:\s(\S+)     # save ack timestamp to group 5
                         """, re.VERBOSE | re.DOTALL | re.MULTILINE)
# log must be the whole file content as a single string here, not a list of lines
for content in pattern.finditer(log):
    print " jarid: " + " ".join(content.groups())
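One thing to watch with this pattern: it pairs a recv with the next matching ack/nack in a single match, and finditer only returns non-overlapping matches. Below is a small sketch, assuming Python 3, that runs the same pattern on the question's sample lines in two orders. The summarize helper and the interleaved ordering are made up for illustration; I don't know whether the real log ever interleaves jarids.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r"""jarid:\s(\S+)    # save jarid to group 1
    \s(recv:)\s\d+          # save 'recv:' to group 2
    \stimestamp:\s(\S+)     # save recv timestamp to group 3
    .*?jarid:\s\1           # lazily skip ahead to the same jarid
    \s(n?ack:)\s\d+         # save 'ack:' or 'nack:' to group 4
    \stimestamp:\s(\S+)     # save ack timestamp to group 5
    """, re.VERBOSE | re.DOTALL | re.MULTILINE)

A_RECV = "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217"
A_ACK = "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537"
B_RECV = "jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 recv: 1 timestamp: 00:00:08,018"
B_NACK = "jarid: 462c6d11-9151-11e0-a72c-00238bbdc9e7 nack: 14 timestamp: 00:00:10,338"


def summarize(log_text):
    """Return one 'jarid recv: ts (n)ack: ts' string per regex match."""
    return [" ".join(m.groups()) for m in pattern.finditer(log_text)]


# Order as in the question: each recv is directly followed by its ack/nack.
adjacent = "\n".join([A_RECV, A_ACK, B_RECV, B_NACK])
# Hypothetical order where the two jarids are interleaved.
interleaved = "\n".join([A_RECV, B_RECV, A_ACK, B_NACK])

for entry in summarize(adjacent):
    print("jarid: " + entry)

# The first match swallows B's recv line, so B is never reported.
print(len(summarize(interleaved)))
```

The first loop prints both jarids; the last line prints 1. If your log can interleave messages, one of the dictionary-based answers is probably the safer choice.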
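Answer 4 (score: 0)
This solution is somewhat similar to @JBernardo's, although I chose to parse the lines with a regular expression. I've written it now, so I might as well post it; it may be of use.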
import re

line_pattern = re.compile(
    r"jarid: (?P<jarid>[a-z0-9\-]+) (?P<action>[a-z]+): (?P<status>[0-9]+) timestamp: (?P<ts>[0-9\:,]+)"
)

infile = open('/path/to/file.log')
entries = (line_pattern.match(line).groupdict() for line in infile)
events = {}

for entry in entries:
    event = events.setdefault(entry['jarid'], {})
    event[entry['action']] = entry['ts']

for jarid, event in events.iteritems():
    ack_event = 'ack' if 'ack' in event else 'nack' if 'nack' in event else None
    print 'jarid: %s recv: %s %s: %s' % (jarid, event.get('recv'), ack_event, event.get(ack_event))
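A caveat with the code above: line_pattern.match(line) returns None for any line that doesn't fit the pattern, so calling .groupdict() on it raises AttributeError and the script stops. If the real log may contain blank or unrelated lines (an assumption on my part), something along these lines skips them instead. This is only a sketch for Python 3, and collect_events and SAMPLE are names I made up:

```python
import re

LINE_PATTERN = re.compile(
    r"jarid: (?P<jarid>[a-z0-9\-]+) (?P<action>[a-z]+): (?P<status>[0-9]+) "
    r"timestamp: (?P<ts>[0-9:,]+)"
)


def collect_events(lines):
    """Return ({jarid: {action: timestamp}}, number of lines skipped)."""
    events = {}
    skipped = 0
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match is None:
            # Not a jarid line; count it rather than crash.
            skipped += 1
            continue
        entry = match.groupdict()
        events.setdefault(entry["jarid"], {})[entry["action"]] = entry["ts"]
    return events, skipped


# Two lines from the question plus two hypothetical junk lines.
SAMPLE = [
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 recv: 1 timestamp: 00:00:02,217",
    "",
    "some unrelated line",
    "jarid: 7e5ae720-9151-11e0-eff2-00238bce4216 ack: 13 timestamp: 00:00:04,537",
]

events, skipped = collect_events(SAMPLE)
for jarid, event in events.items():
    print("jarid: %s recv: %s ack: %s" % (jarid, event.get("recv"), event.get("ack")))
print("skipped %d line(s)" % skipped)
```

Returning the skipped count makes it easy to notice when the pattern is rejecting more of the file than expected.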