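Parsing data from 2 log files in Python

Time: 2013-06-11 05:58:52

Tags: python parsing

I need to parse the thread number, run number, and test number from file1, match the test number in file2, and write both sets of values to a new file.

The first file contains the following: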

com-0 thread-0 [ run-0, test-1201 ]: https://lp1.soma.sf.com/img/metaBar_sprite.png -> 200 OK, 682 bytes
com-0 thread-0 [ run-0, test-1202 ]: https://lp1.soma.sf.com/img/chattersupersprite.png?v=182-4 -> 200 OK, 40172 bytes
com-0 thread-0 [ run-0, test-1203 ]: https://lp1.soma.sf.com/img/chatter/wtdNowGradientbg.png -> 200 OK, 201 bytes
com-0 thread-0 [ run-0, test-1204 ]: https://lp1.soma.sf.com/img/chatter/wtdNowIcon_sprite.png -> 200 OK, 7280 bytes
com-0 thread-0 [ run-0, test-1205 ]: https://lp1.soma.sf/img/sprites/icons24.png -> 200 OK, 20287 bytes
com-0 thread-0 [ run-0, test-1206 ]: https://lp1.soma.sf.com/img/feeds/follow_sprite.png -> 200 OK, 2894 bytes

The second file contains the following:

1 Thread, Run, Test num, Start time, Test time, Errors, HTTP response code, EPQ
2 0, 0, 1201, 1370898725367, 154, 0, 200, 2049 
3 0, 0, 1202, 1370898725523, 505, 0, 204, 0
2 0, 0, 1201, 1370898725367, 400, 0, 200, 2049 
2 0, 0, 1201, 1370898725367, 1124, 0, 200, 2049 
3 0, 0, 1202, 1370898725523, 1405, 0, 204, 0

The desired output is:

thread-0 [ run-0, test-1201 ]: https://lp1.soma.sf.com/img/metaBar_sprite.png = [154, 400, 1124]
thread-0 [ run-0, test-1202 ]: https://lp1.soma.sf.com/img/chattersupersprite.png?v=182-4 = [505, 1405]

Please help. Thanks in advance.

1 answer:

Answer 0 (score: 1)

If the structure of both logs stays the same...

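# Drop commas and split each non-blank line on whitespace; 'with' closes the files.
with open('test1.txt') as f:
    log1 = [line.replace(',', '').split() for line in f if line.strip()]
with open('test2.txt') as f:
    log2 = [line.replace(',', '').split() for line in f if line.strip()][1:] # [1:] skips the header row
log3 = [] # collects the combined output lines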

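This gives you, for each log file, a list of lines, each split on whitespace with the commas removed. After that, it's mainly a matter of matching the keys to the data you need.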
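To see which index holds which field after splitting, here is a small sketch that runs the same `replace`/`split` on one sample line from each log in the question. The names `line1`, `line2`, `tokens1` and `tokens2` are only for illustration, and the sketch assumes the leading number on each row of the second log is really part of the file.

```python
# Sample lines copied from the two logs in the question.
line1 = ("com-0 thread-0 [ run-0, test-1201 ]: "
         "https://lp1.soma.sf.com/img/metaBar_sprite.png -> 200 OK, 682 bytes")
line2 = "2 0, 0, 1201, 1370898725367, 154, 0, 200, 2049 "

tokens1 = line1.replace(',', '').split()
tokens2 = line2.replace(',', '').split()

# First log: thread, run, test and URL sit at indices 1, 3, 4 and 6
# (the brackets '[' and ']:' become tokens of their own).
print(tokens1[1], tokens1[3], tokens1[4], tokens1[6])

# Second log: the leading row number shifts every column by one,
# so the test number is at index 3 and the test time at index 5.
print(tokens2[3], tokens2[5])
```

If the second log does not actually start each row with that extra number, the indices for it would be 2 and 4 instead.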

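# First, get the tests from the second log.
# Index 0 of each row is the leading row number shown in the sample,
# so "Test num" is at index 3 and "Test time" at index 5.
tests = {}
for line in log2:
    if len(line) < 6:
        continue # blank or truncated row
    test = line[3] # test number
    if test not in tests:
        tests[test] = {'times': []}

    tests[test]['times'].append(line[5]) # test time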

Next, go through the first log and look up each test number:

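for line in log1:
    if len(line) < 7:
        continue # blank or truncated line
    test = line[4].split('-')[1] # changes test-#### to ####
    # Fill in each test only once, so a test number that repeats in the
    # first log cannot re-join the already joined 'times' string.
    if test in tests and 'thread' not in tests[test]:
        tests[test].update({
            'thread': line[1],
            'run':    line[3],
            'url':    line[6],
            'times':  ', '.join(tests[test]['times'])
        })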

Then just turn the tests dict back into lines and write them to a log file.

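# sorted() gives a stable order; .items() works on Python 2 and 3.
for key, test in sorted(tests.items()):
    if 'thread' not in test:
        continue # test number never appeared in the first log

    line = '{thread} [ {run}, test-{key} ]: {url} = [{times}]\n'
    line = line.format(thread=test['thread'], run=test['run'], key=key,
        url=test['url'], times=test['times'])

    log3.append(line)

# 'w' starts a fresh output file on every run ('a' would keep appending).
with open('log3.txt', 'w') as f:
    f.write(''.join(log3))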
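
For comparison, the same steps could be sketched as one function that parses the first log with a regular expression instead of positional splitting. This is only a sketch under the assumptions above (the line layout stays fixed and the second log has a leading row number). The names `LINE1` and `combine` are made up for this example, and unlike the code above it matches on thread, run and test number together rather than on the test number alone.

```python
import re
from collections import defaultdict

# Hypothetical pattern for lines of the first log, e.g.
# "com-0 thread-0 [ run-0, test-1201 ]: https://... -> 200 OK, 682 bytes"
LINE1 = re.compile(r'thread-(\d+) \[ run-(\d+), test-(\d+) \]: (\S+) ->')


def combine(log1_lines, log2_lines):
    """Return output lines pairing each request with its test times."""
    times = defaultdict(list)
    for row in log2_lines[1:]:  # skip the header row
        fields = row.replace(',', ' ').split()
        if len(fields) < 6:
            continue  # blank or truncated row
        # fields[0] is the leading row number; thread, run and test
        # number follow at 1..3 and the test time is at index 5.
        times[tuple(fields[1:4])].append(fields[5])

    out = []
    for row in log1_lines:
        match = LINE1.search(row)
        if match is None:
            continue  # line does not look like a request entry
        key = match.group(1, 2, 3)  # (thread, run, test) as strings
        if key in times:
            out.append('thread-{} [ run-{}, test-{} ]: {} = [{}]'.format(
                key[0], key[1], key[2], match.group(4),
                ', '.join(times[key])))
    return out


# Usage with sample lines from the question.
sample_log1 = [
    "com-0 thread-0 [ run-0, test-1201 ]: https://lp1.soma.sf.com/img/metaBar_sprite.png -> 200 OK, 682 bytes",
    "com-0 thread-0 [ run-0, test-1202 ]: https://lp1.soma.sf.com/img/chattersupersprite.png?v=182-4 -> 200 OK, 40172 bytes",
    "com-0 thread-0 [ run-0, test-1203 ]: https://lp1.soma.sf.com/img/chatter/wtdNowGradientbg.png -> 200 OK, 201 bytes",
]
sample_log2 = [
    "1 Thread, Run, Test num, Start time, Test time, Errors, HTTP response code, EPQ",
    "2 0, 0, 1201, 1370898725367, 154, 0, 200, 2049 ",
    "3 0, 0, 1202, 1370898725523, 505, 0, 204, 0",
    "2 0, 0, 1201, 1370898725367, 400, 0, 200, 2049 ",
    "2 0, 0, 1201, 1370898725367, 1124, 0, 200, 2049 ",
    "3 0, 0, 1202, 1370898725523, 1405, 0, 204, 0",
]
result = combine(sample_log1, sample_log2)
for entry in result:
    print(entry)
```

With these samples, test-1203 is skipped because the second log has no timings for it. Output follows the order of the first log, so no sorting is needed.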