Pythonic script to ignore timestamps in log files

Time: 2015-09-30 10:05:18

Tags: python regex

There are 2 log files: log A and log B.

log A

2015-07-12 08:50:33,904 [Collection-3] INFO  app  - Executing Scheduled job: System: choppa1

2015-07-12 09:56:45,060 [Collection-3] INFO  app  - Executing Scheduled job: System: choppa1

2015-07-12 10:00:00,001 [Analytics_Worker-1] INFO  app  - Trigger for job AnBuildAuthorizationJob was fired.

2015-07-12 11:00:00,007 [Analytics_Worker-1] INFO  app  - Starting the AnBuildAuthorizationJob job.



log B

2014-07-12 09:50:33,904 [Collection-3] INFO  app  - Executing Scheduled job: System: choppa1

2014-07-12 09:56:45,060 [Collection-3] INFO  app  - Executing Scheduled job: System: choppa1

2014-07-12 10:00:00,001 [Analytics_Worker-1] INFO  app  - Trigger for job AnBuildAuthorizationJob was fired.

2014-07-12 10:00:00,007 [Analytics_Worker-1] INFO  app  - Starting the AnBuildAuthorizationJob job.

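The 2 log files have the same content but different timestamps. I need to compare the 2 files while ignoring the timestamps: each line of one file should be compared with the corresponding line of the other, and no difference should be reported when only the timestamps differ. I wrote the following Python script for this: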

#!/usr/bin/python
import re
import difflib

program = open("log1.txt", "r")
program_contents = program.readlines()
program.close()

new_contents = []

# Keep only the lines whose first character is not a digit
pat = re.compile("^[^0-9]")

for line in program_contents:
    if re.search(pat, line):
        new_contents.append(line)

program = open("log2.txt", "r")
program_contents1 = program.readlines()
program.close()

new_contents1 = []

# Same filter, applied to the second file
pat = re.compile("^[^0-9]")

for line in program_contents1:
    if re.search(pat, line):
        new_contents1.append(line)

# Line-by-line diff of the two filtered lists
diff = difflib.ndiff(new_contents, new_contents1)
print(''.join(diff))
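A quick way to see what the `^[^0-9]` filter in the script above actually keeps is to run it on a few sample lines. This check uses two lines copied from the logs plus a blank line; it suggests that every line beginning with a digit (that is, every timestamped line) is dropped and only the blank line survives, so the later ndiff would be comparing very little of the logs.

```python
import re

# Pattern from the script above: matches lines whose first character is not a digit.
pat = re.compile("^[^0-9]")

sample = [
    "2015-07-12 08:50:33,904 [Collection-3] INFO  app  - Executing Scheduled job: System: choppa1\n",
    "\n",
    "2015-07-12 10:00:00,001 [Analytics_Worker-1] INFO  app  - Trigger for job AnBuildAuthorizationJob was fired.\n",
]

# Apply the same filter the script uses.
kept = [line for line in sample if re.search(pat, line)]
print(kept)  # → ['\n']
```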

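Is there a more efficient way to write the above script? Also, the script above only works when the timestamp is at the beginning of the line. I want a Python script that works even when the timestamp appears somewhere in the middle of the line. Can anyone help me with how to do this?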

2 answers:

Answer 0 (score: 0)

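I would change

             pat = re.compile("^[^0-9]")

to

             pat = re.compile(r"\d{4}-\d{2}-\d{2}")

which matches the date part of the timestamp wherever it appears in the line, not only at the start.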

It is also better to open files with

                  with open(filename) as f:

so that Python closes the file for you and no f.close() call is needed.
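Putting both suggestions together, a minimal sketch might look like the following. It assumes every timestamp has the form 2015-07-12 08:50:33,904 (the ",904" milliseconds are treated as optional) and, instead of filtering lines out, masks each timestamp with re.sub so the timestamp can sit anywhere in the line. The helper names (strip_timestamps, diff_ignoring_timestamps, diff_files) and the "<TS>" placeholder are hypothetical choices for this illustration.

```python
import difflib
import re

# Assumed timestamp format, e.g. "2015-07-12 08:50:33,904"; ",ddd" is optional.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:,\d{3})?")


def strip_timestamps(lines):
    """Replace every timestamp in each line with a fixed placeholder."""
    return [TIMESTAMP.sub("<TS>", line) for line in lines]


def diff_ignoring_timestamps(lines_a, lines_b):
    """Return only the ndiff lines that differ once timestamps are masked."""
    a = strip_timestamps(lines_a)
    b = strip_timestamps(lines_b)
    # ndiff prefixes lines with "  " (common), "- ", "+ " or "? " (hints);
    # keep only the removed/added lines.
    return [d for d in difflib.ndiff(a, b) if d.startswith(("- ", "+ "))]


def diff_files(path_a, path_b):
    """Read both files with 'with' so they are closed automatically."""
    with open(path_a) as fa, open(path_b) as fb:
        return diff_ignoring_timestamps(fa.readlines(), fb.readlines())
```

For the files in the question this would be called as diff_files("log1.txt", "log2.txt"); an empty list means the logs differ only in their timestamps. Masking with a placeholder rather than deleting the match keeps the rest of each line aligned, so a timestamp in the middle of a line is handled the same way as one at the start.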

Answer 1 (score: 0)

Here is a small script that ignores the timestamp at the start of each line.

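program = open("log1.txt", "r")
program_contents = program.readlines()
program.close()

program = open("log2.txt", "r")
program_contents1 = program.readlines()
program.close()

# A timestamp such as "2015-07-12 08:50:33,904" is 23 characters long,
# so compare each pair of lines from index 23 onward.
# zip() pairs the lines and stops at the end of the shorter file.
for line1, line2 in zip(program_contents, program_contents1):
    if line1 == '\n':
        continue
    if line1[23:] == line2[23:]:
        print("Matches")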
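The positional slice above relies on every timestamp having exactly the same width, and zip() silently ignores extra lines in the longer file. A slightly more robust sketch, assuming each line starts with a timestamp in the same 2015-07-12 08:50:33,904 format, removes the leading timestamp with an anchored regex and uses itertools.zip_longest so that extra lines in either file are reported. The names LEADING_TS and compare_lines are hypothetical.

```python
import re
from itertools import zip_longest

# Assumed format of the leading timestamp, e.g. "2015-07-12 08:50:33,904 ".
# "^" without re.MULTILINE anchors the match to the start of each line string.
LEADING_TS = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s*")


def compare_lines(lines_a, lines_b):
    """Return (line_number, line_a, line_b) for each pair that differs
    after the leading timestamp is removed.

    zip_longest pads the shorter list with None, so a line present in
    only one file is reported as a mismatch.
    """
    mismatches = []
    for number, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        body_a = LEADING_TS.sub("", a) if a is not None else None
        body_b = LEADING_TS.sub("", b) if b is not None else None
        if body_a != body_b:
            mismatches.append((number, a, b))
    return mismatches
```

The lists can come from open(...).readlines() as in the script above; an empty result means the two logs differ only in their leading timestamps.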