Multiline regular expression in Perl

Date: 2014-08-12 21:52:01

Tags: regex perl parsing

I am trying to parse data from a log file in which each record spans multiple lines (shown below).

Archiver Started: Fri May 16 00:35:00 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:37:43 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:39:54 2014
Archiver Completed: Fri May 16 00:42:37 2014

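I would like to split on Archiver Started: in the first line and on Archiver Completed: in the last line, keeping everything between those two lines. That would leave me with the following: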

Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:37:43 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:39:54 2014

There can be one or more entries for a given day, week, or month.

Is this possible using a regular expression?

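2 Answers:

Answer 0: (score: 3)

Use the Range Operator ..

In scalar context the range operator acts as a flip-flop. Its return value is a sequence number (starting at 1), so you just need to filter out 1 and the ending number, which has the string "E0" appended to it.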

use strict;
use warnings;

while (<DATA>) {
    if (my $range = /Archiver Started/ .. /Archiver Completed/ ) {
        print if $range != 1 && $range !~ /E/;
    }
}

__DATA__
stuff
more stuff
Archiver Started: Fri May 16 00:35:00 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:37:43 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:39:54 2014
Archiver Completed: Fri May 16 00:42:37 2014
other stuff
ending stuff

Output:

Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:37:43 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:39:54 2014
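
To make the filter above easier to follow, here is a minimal sketch that shows the value the flip-flop returns for each line. The helper name label_lines and the shortened sample log are assumptions made for illustration; they are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: pair each line with the value the flip-flop
# returns for it. Not part of the original answer.
sub label_lines {
    my @lines = @_;
    my @labeled;
    for (@lines) {
        # In scalar context, .. is the flip-flop: an empty string outside
        # the range, a sequence number inside it, and the sequence number
        # with "E0" appended on the line that ends the range.
        my $range = /Archiver Started/ .. /Archiver Completed/;
        push @labeled, [ $range, $_ ];
    }
    return @labeled;
}

my @log = (
    'stuff',
    'Archiver Started: Fri May 16 00:35:00 2014',
    'Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:37:43 2014',
    'Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:39:54 2014',
    'Archiver Completed: Fri May 16 00:42:37 2014',
    'other stuff',
);

for my $pair ( label_lines(@log) ) {
    my ( $range, $line ) = @$pair;
    # Show "-" for lines outside the range, where the flip-flop is false.
    printf "%-4s %s\n", ( $range eq '' ? '-' : $range ), $line;
}
```

The first column reads -, 1, 2, 3, 4E0, - for these six lines. Because "4E0" still has the numeric value 4, the test $range != 1 drops only the start marker, and $range !~ /E/ drops only the end marker.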

Answer 1: (score: 1)

You can use the following trick:

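use strict;
use warnings;

# Assumption: the log arrives on STDIN or in files named on the command line.
my @lines = <>;

my @result;
my $catch = 0;    # true while we are between the two marker lines
LINE:
for my $line ( @lines ) {
    if ( $line =~ m/^Archiver Started/i ) {
        $catch = 1;     # start collecting after this line
        next LINE;
    } elsif ( $line =~ m/^Archiver Completed/i ) {
        $catch = 0;     # stop collecting
        next LINE;
    }
    next LINE unless $catch;
    push @result, $line;
}

print @result;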
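
The question asks whether a regular expression alone can do this. A minimal sketch of that approach follows, assuming the whole log fits in memory as a single string. The helper name between_markers is hypothetical, and the sample log reuses the lines from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of every block found between an
# "Archiver Started:" line and the next "Archiver Completed:" line.
sub between_markers {
    my ($log) = @_;
    # /m lets ^ match at the start of every line, /s lets . match
    # newlines so the non-greedy (.*?) can span several lines, and /g
    # in list context returns the capture from every match.
    my @blocks = $log =~ /^Archiver Started:[^\n]*\n(.*?)^Archiver Completed:/msg;
    return @blocks;
}

my $log = <<'END_LOG';
stuff
Archiver Started: Fri May 16 00:35:00 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:37:43 2014
Daily Archive for (Thu) May. 15, 2014 STATUS: Successful Fri May 16 00:39:54 2014
Archiver Completed: Fri May 16 00:42:37 2014
other stuff
END_LOG

# Each block already ends with a newline, so print it as-is.
print for between_markers($log);
```

For large logs the line-by-line approaches above are likely the better fit, since they never hold more than one line in memory.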