Perl: Can I use pattern matching to find certain lines in a log file?

Date: 2013-05-21 16:11:52

Tags: arrays perl pattern-matching

I have a log file with content like this:

Mon Nov 19 11:00:01 2012
Host: myserver
accurev-ent inuse: 629


Mon Nov 19 12:00:01 2012
Host: myserver
accurev-ent inuse: 629

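Using Perl, I've figured out how to remove the blank lines and put the non-blank lines into an array. Now I want to match the current month, day, and year; that is, I'm trying to grab all the lines for May 21, 2013. (The file is the output of a script that runs 24 times a day.) I don't need the hh:mm:ss data.

I've been trying to mock up the match like this: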

foreach my $prod (@prod)
{
  # Sun May 19 02:00:01 2013
  if ($prod =~ /Sun May 19/ && $prod =~ /2013$/)
  {
    print "Howdy!\n"; # just using to indicate success
  }
}

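Can I do this with pattern matching, or should I split each line and compare the date fields? Also, once I find a match, I need to put the lines containing inuse into an array and find the largest number for that day.

4 Answers

Answer 0 (score: 4)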

#!/usr/bin/env perl
use strict;
use warnings;
use POSIX qw(strftime);

# The active regex looks for today's date.
# The commented-out regex looks for dates in the current month.
# If you provide a suitable timestamp (seconds since the epoch),
# you can generate the pattern for an arbitrary date by replacing
# time (a function call) with $timestamp.
# %b is the abbreviated month name ("May", "Nov") and %e the
# space-padded day, matching the ctime-style lines in the log.
my $pattern = strftime("%b %e \\d+:\\d+:\\d+ %Y", localtime(time));
# my $pattern = strftime("%b +\\d+ \\d+:\\d+:\\d+ %Y", localtime(time));
# print "$pattern\n";
my $regex = qr/$pattern/;

chomp(my @prod = <>);    # log lines from the files named on the command line, or STDIN

foreach my $prod (@prod)
{
    # print "Check: $prod\n";
    if ($prod =~ $regex)
    {
        print "$prod\n";
    }
}

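This uses strftime (from POSIX) to build a regex string with the current month and year in the right places, and digit patterns where the time components (and, in the month-wide version, the day) should be. It then compiles that string into a quoted regex with qr// and applies it to each entry in the @prod array. You can make the \d+ matches stricter if you like; whether that's worth it depends on how costly a spurious match would be. (The month-wide version is looser than it needs to be and accepts the 99th and 00th of May; neither version is anchored at the end, so May 20130 would match too; and both let invalid times through.) All of this can be fixed by tweaking the regex without materially changing the answer.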
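If you do want the stricter version, here is a minimal sketch. `day_regex` is a hypothetical helper name; it assumes the log uses the ctime-style layout shown in the question, and it takes the month abbreviation from a fixed English list instead of `%b`, so the pattern does not depend on the current locale. Hours are limited to 00-23, minutes and seconds to 00-59, and the year is anchored at the end of the line.

```perl
use strict;
use warnings;

# English month abbreviations as they appear in the ctime-style log lines.
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: returns a regex matching timestamp lines such as
# "Tue May 21 02:00:01 2013", but only for the calendar day that contains
# $timestamp (seconds since the epoch, interpreted in local time).
sub day_regex {
    my ($timestamp) = @_;
    my (undef, undef, undef, $mday, $mon, $year) = localtime $timestamp;
    # ctime pads single-digit days with a space, e.g. "May  1".
    my $date = sprintf "%s %2d", $MONTHS[$mon], $mday;
    $year += 1900;
    # Weekday, fixed date, valid hh:mm:ss, then the year at end of line.
    return qr/^\w{3} \Q$date\E (?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d $year$/;
}
```

Compile it once and reuse it, e.g. `my $re = day_regex(time); my @today = grep { /$re/ } @prod;`.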

Answer 1 (score: 1)

A quick and dirty regex:

use strict;
use warnings;

my @prod = ('Mon Nov 19 11:00:01 2012', 'accurev-ent inuse: 629');
foreach my $prod (@prod)
{
  # Sun May 19 02:00:01 2013 (" +" also allows space-padded days like "May  1")
  if ($prod =~ /^\w+ (\w+) +(\d+) ..:..:.. (\d+)$/)
  {
    print "Howdy: $3 $1 $2\n";
  }

  if ($prod =~ /inuse: (\d+)$/)
  {
    print "Yo: $1\n";
  }
}

Output:

Howdy: 2012 Nov 19
Yo: 629
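The same two regexes can also cover the second half of the question (the largest inuse value per day). Here is a minimal sketch, assuming every inuse line comes after its own timestamp line as in the sample log; `max_inuse_by_date` is a hypothetical helper name. It remembers the date from the most recent timestamp line and keeps a running maximum per date in a hash.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the log lines in order, remember the date from
# the most recent timestamp line, and keep the largest inuse value per date.
sub max_inuse_by_date {
    my @lines = @_;
    my (%max, $date);
    for my $line (@lines) {
        if ($line =~ /^\w+ (\w+) +(\d+) ..:..:.. (\d+)$/) {
            $date = "$1 $2 $3";    # e.g. "May 19 2013"
        }
        elsif (defined $date && $line =~ /inuse: (\d+)$/) {
            # $1 is the inuse count captured just above.
            $max{$date} = $1 if !defined $max{$date} || $1 > $max{$date};
        }
    }
    return \%max;    # date => largest inuse value
}
```

For example, `my $max = max_inuse_by_date(@prod); print $max->{'May 21 2013'}, "\n";`.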

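Answer 2 (score: 0)

You said you need the per-day totals; the code below keeps the largest inuse value seen for each date and prints those. I hope the comments I added are enough. I used array indexes, but I'm fairly sure this could be done with regex captures; I didn't have much luck with that.

Figured I'd fix my misreading of the question while I was at it.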

use strict;
use warnings;

open(my $fh, '<', 'stackoverflow.data') or die "Cannot open stackoverflow.data: $!";
my @prod = <$fh>;
close($fh);

# Strip newlines.
chomp @prod;

my $data = {}; # Hash ref to store data: date => largest inuse value.

# Each record is three lines (date, host, inuse), normally followed by a blank line.
for (my $i = 0; $i + 2 <= $#prod; ) {
    my $date  = $prod[$i];                 # First line.
    my $host  = $prod[$i + 1];             # Second line.
    my $inuse = parseInuse($prod[$i + 2]); # Third line.

    # Reduce "Sun May 19 02:00:01 2013" to "May 19 2013".
    my ($mon, $day, $year) = $date =~ /^\w+ (\w+) +(\d+) \S+ (\d+)$/
        or die "Unexpected date line: $date\n";
    $date = "$mon $day $year";

    # Initialize inuse value for date.
    if (!defined($data->{$date})) {
        $data->{$date} = 0;
    }

    # Replace stored inuse value if current loop inuse is greater.
    if ($inuse > $data->{$date}) {
        $data->{$date} = $inuse;
    }

    print "Processing $i raw($prod[$i]) sep(date: $date, host: $host, inuse: $inuse) split($inuse)\n";

    # Skip past this record, plus the blank separator line if there is one.
    $i += (defined $prod[$i + 3] && $prod[$i + 3] =~ m/^\s*$/) ? 4 : 3;
}

print "\nTotals:\n";
my $matchdate = 'May 19 2013'; # Set to undef to show all.
#$matchdate = undef;

foreach my $date (sort keys %{$data}) {
    if (defined($matchdate) && $date ne $matchdate) {
        next;
    }
    print "$date: $data->{$date}\n";
}


sub parseInuse
{
    my $line = shift;

    # "accurev-ent inuse: 629" -> "629"
    my @parts = split(': ', $line);
    my $value = $parts[1];
    $value =~ s/\s+//g; # Remove any stray whitespace.

    return $value;
}



# Mon Nov 19 11:00:01 2012
# Host: myserver
# accurev-ent inuse: 629
# 
# Mon Nov 19 12:00:01 2012
# Host: myserver
# accurev-ent inuse: 800
# 
# Sun May 19 02:00:01 2013
# Host: myserver
# accurev-ent inuse: 629
# 
# Sun May 19 02:00:01 2013
# Host: myserver
# accurev-ent inuse: 1000
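Since the answer mentions wanting to use regex captures instead of array indexes, here is a minimal sketch of that idea. It assumes records are separated by blank lines as in the sample data, reads them in Perl's paragraph mode (`$/ = ""`), and pulls the date and the inuse value out of each record with a single regex; `max_inuse_per_day` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated records from a filehandle
# in paragraph mode and keep the largest inuse value per date, extracting
# the fields from each record with one regex.
sub max_inuse_per_day {
    my ($fh) = @_;
    my %max;
    local $/ = "";    # paragraph mode: one or more blank lines end a record
    while (my $record = <$fh>) {
        # /s lets .*? run across the Host: line to reach the inuse count.
        my ($mon, $day, $year, $inuse) = $record =~
            /^\w+ (\w+) +(\d+) \S+ (\d+)\n.*?inuse: (\d+)/s or next;
        my $date = "$mon $day $year";
        $max{$date} = $inuse if !defined $max{$date} || $inuse > $max{$date};
    }
    return \%max;    # date => largest inuse value
}
```

Usage: `open my $fh, '<', 'stackoverflow.data' or die "Cannot open: $!"; my $max = max_inuse_per_day($fh);`. Because `$/` is localized, other reads in the program still go line by line.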

Answer 3 (score: 0)

use strict;
use warnings;
use 5.012;

use DateTime::Format::Strptime;
use List::Util qw/max/;

local $/ = "\n\n";
my $parser = DateTime::Format::Strptime->new(
    pattern   => '%a %b %d %H:%M:%S %Y',
    locale    => 'en_US',
    time_zone => 'America/Chicago',
); 
my @records;
for my $record (<DATA>) {
  my ($timestamp, $host, $inuse) = split ("\n", $record);
  $host =~ s/Host: //;
  $inuse =~ s/accurev-ent inuse: //;
  push @records, { timestamp => $parser->parse_datetime($timestamp), 
                   host => $host,
                   inuse => $inuse,
                 };
}

say max map {$_->{inuse}} grep {$_->{timestamp}->ymd() eq '2013-05-21' } @records;

__DATA__
Mon Nov 19 11:00:01 2012
Host: myserver
accurev-ent inuse: 629

Mon Nov 19 12:00:01 2012
Host: myserver
accurev-ent inuse: 629

Sun May 19 02:00:01 2013
Host: myserver
accurev-ent inuse: 629

Tue May 21 02:00:01 2013
Host: myserver
accurev-ent inuse: 1200

Tue May 21 02:00:01 2013
Host: myserver
accurev-ent inuse: 62

Tue May 21 02:00:01 2013
Host: myserver
accurev-ent inuse: 29

Output:

1200

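By changing the test used in the grep, you can fairly easily change the filter's range (for example, the maximum between 8 am and 10 pm, the maximum over a week, and so on).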
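As an illustration of swapping the filter, here is a minimal sketch that uses the core Time::Piece module instead of DateTime::Format::Strptime (so it needs nothing from CPAN) and takes the filter as a code reference. `max_inuse_where` is a hypothetical helper; it assumes the same blank-line-separated three-line records. Time::Piece->strptime treats the parsed time as UTC, so `hour` and `ymd` report the values exactly as written in the log, with no time-zone conversion.

```perl
use strict;
use warnings;
use List::Util qw(max);
use Time::Piece;

# Hypothetical helper: largest inuse value among records whose timestamp
# passes $keep (a code ref that receives a Time::Piece object).
sub max_inuse_where {
    my ($text, $keep) = @_;
    my @values;
    # Blank-line-separated records: timestamp, host, inuse.
    for my $record (split /\n\s*\n/, $text) {
        my ($stamp, undef, $inuse_line) = split /\n/, $record;
        next unless defined $inuse_line;
        my ($inuse) = $inuse_line =~ /inuse:\s*(\d+)/ or next;
        my $t = Time::Piece->strptime($stamp, '%a %b %d %H:%M:%S %Y');
        push @values, $inuse if $keep->($t);
    }
    return max(@values);    # undef if no record passed the filter
}
```

For example, the busiest reading on 21 May 2013 between 08:00 and 22:00 would be `max_inuse_where($log, sub { $_[0]->ymd eq '2013-05-21' && $_[0]->hour >= 8 && $_[0]->hour < 22 })`.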