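Script/regex for editing log files

Time: 2015-08-05 21:07:14

Tags: python regex bash perl for-loop

I have been trying to write some code to process the various log files I deal with every day. I have tried writing it in bash, perl and python, but so far without much success.

Here is a sample of the log: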

Table TRKGRP1: New table control.
      TRKGRP1: 1000 tuples checked. Tuple checking still in progress...
      Completed tuple checking.
      SUMMARY: Tbl TRKGRP1: tuples checked 1297, passed 1297, failed 0.
Table TOLLTRKS: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.
Table BRANDOPT: New table control.
      Completed tuple checking.
      SUMMARY: Tbl BRANDOPT: tuples checked 0, passed 0, failed 0.
Table C7UPTMR: New table control.
      Completed tuple checking.
      SUMMARY: Tbl C7UPTMR: tuples checked 4, passed 4, failed 3.
Table TOPSCOIN: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOPSCOIN: tuples checked 0, passed 0, failed 2.

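What I need is each block of text from "Table" through the "failed N" line. I only want to capture the blocks that end with failed 1, failed 2, failed 3 and so on; blocks that end with failed 0 are not needed. Keep in mind that these blocks are sometimes longer or shorter, not always 3 lines.

This is the expected output: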

Table TOLLTRKS: New table control.
Completed tuple checking.
SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.  
Table C7UPTMR: New table control.
Completed tuple checking.
SUMMARY: Tbl C7UPTMR: tuples checked 4, passed 4, failed 3.  
Table TOPSCOIN: New table control.
Completed tuple checking.
SUMMARY: Tbl TOPSCOIN: tuples checked 0, passed 0, failed 2.

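I would really appreciate it if you could help me.

2 answers:

Answer 0: (score: 1)

Break the file into groups of lines; extracting the data you want from each group then becomes trivial. The following shows how to break the file into the groups you want.

With the whole file in a single variable: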

while ($file =~ /\G ( \S[^\n]*\n (?:(?:[^\n\S][^\n]*)?\n)* )/xg) {
   process($1);
}

Reading one line at a time:

my $buf = '';
while (<>) {
   if (/^\S/) {
      process($buf) if length($buf);
      $buf = '';
   }

   $buf .= $_;
}

process($buf) if length($buf);

process is very simple:

sub process {
   for ($_[0]) {
      print
         if /^Table /
         && /, failed (\d+)\.$/m
         && $1 > 0;
   }
}
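
For readers who prefer Python, the same group-then-filter idea can be sketched roughly as follows. This is only an illustration and not part of the Perl answer: `BLOCK_RE`, `FAILED_RE` and `failed_blocks` are hypothetical names, and the sketch assumes the whole log fits in memory as a single string.

```python
import re

# Hypothetical helper names; a sketch that mirrors the Perl grouping above.
# A block is one non-indented line followed by any number of
# indented or empty lines.
BLOCK_RE = re.compile(r'^\S.*\n(?:(?:[^\S\n].*)?\n)*', re.MULTILINE)
# Failure count at the end of a line, e.g. ", failed 3."
FAILED_RE = re.compile(r', failed (\d+)\.[ \t]*$', re.MULTILINE)


def failed_blocks(text):
    """Return the blocks that start with 'Table ' and report failed > 0."""
    if not text.endswith('\n'):
        text += '\n'  # the pattern expects every line to end with a newline
    result = []
    for match in BLOCK_RE.finditer(text):
        block = match.group(0)
        found = FAILED_RE.search(block)
        if block.startswith('Table ') and found and int(found.group(1)) > 0:
            result.append(block)
    return result
```

Each returned block keeps its original indentation and line endings, so the list can be written out unchanged, for example with `''.join(failed_blocks(text))`.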

Answer 1: (score: 1)

Python: this is not the most efficient approach, but hopefully the algorithm is clear, and it works:

text = '''
Table TRKGRP1: New table control.
      TRKGRP1: 1000 tuples checked. Tuple checking still in progress...
      Completed tuple checking.
      SUMMARY: Tbl TRKGRP1: tuples checked 1297, passed 1297, failed 0.
Table TOLLTRKS: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.
Table BRANDOPT: New table control.
      Completed tuple checking.
      SUMMARY: Tbl BRANDOPT: tuples checked 0, passed 0, failed 0.
Table C7UPTMR: New table control.
      Completed tuple checking.
      SUMMARY: Tbl C7UPTMR: tuples checked 4, passed 4, failed 3.
Table TOPSCOIN: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOPSCOIN: tuples checked 0, passed 0, failed 2.
'''
# Keep the line endings so that writelines() emits one log line per row.
lines = text.splitlines(True)

Or, from a file:

# The with statement closes the file automatically.
with open('input.txt') as f:
    lines = f.readlines()

Then:

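# Buffer each "Table" block and write it out only if its summary
# line does not report "failed 0.".
buf = []
show = False
with open("output.txt", 'w') as f:
    for line in lines:
        if line.startswith('Table'):
            # A new block starts: flush the previous one if it had failures.
            if show:
                f.writelines(buf)
            buf = []
            show = True
        buf.append(line)
        if 'failed 0.' in line:
            show = False
    # Flush the last block.
    if show:
        f.writelines(buf)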
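
Since the question mentions many log files, the loop above could also be wrapped in a reusable generator. The following is a minimal sketch under a few assumptions: `filter_failed` and `SUMMARY_RE` are hypothetical names, the failure count is parsed as a number instead of matching the text `failed 0`, and a block without any summary line is dropped rather than kept.

```python
import re

# Hypothetical names; a sketch, not the answer's original code.
# Matches the failure count at the end of a summary line.
SUMMARY_RE = re.compile(r'failed (\d+)\.\s*$')


def filter_failed(lines):
    """Yield the lines of every 'Table' block whose summary has failed > 0."""
    buf = None   # None until the first 'Table' line is seen
    keep = False
    for line in lines:
        if line.startswith('Table'):
            # A new block starts: emit the previous one if it had failures.
            if keep:
                yield from buf
            buf, keep = [], False
        if buf is None:
            continue  # ignore anything before the first block
        buf.append(line)
        match = SUMMARY_RE.search(line)
        if match:
            keep = int(match.group(1)) > 0
    # Emit the last block.
    if keep:
        yield from buf
```

Because it accepts any iterable of lines, an open file can be passed directly, and the result can be handed to `writelines()` on an output file.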