I have a log file with content like this:
Mon Nov 19 11:00:01 2012
Host: myserver
accurev-ent inuse: 629
Mon Nov 19 12:00:01 2012
Host: myserver
accurev-ent inuse: 629
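Using Perl, I have worked out how to remove the blank lines and put the non-blank lines into an array. Now I want to match the current month, day, and year. That is, I am trying to grab every line that contains May, 21, and 2013. (This file is produced by a script that runs 24 times a day, every day.) I don't need the hh:mm:ss data.
I have been trying to prototype the match like this: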
foreach $prod (@prod)
{
    # Sun May 19 02:00:01 2013
    if ($prod =~ ((/Sun May 19/) && $prod =~(/2013$/)) )
    {
        print "Howdy! \n"; # just using to indicate success
    }
}
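Can I do this with pattern matching, or should I split the line and look for matching fields? By the way, once I find a match, I need to put the lines containing inuse into an array and find the largest number for that day.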
Answer 0 (score: 4)
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX qw(strftime);

# The active regex looks for today's date.
# The commented-out regex looks for dates in the current month.
# If you provide a suitable timestamp (seconds since the epoch),
# you can generate the pattern for an arbitrary date by changing
# time (a function call) to $timestamp.
# %b is the abbreviated month name ("Nov"), as used in the log; in an English
# locale %B gives the full name ("November"), which matches the log only in May.
my $pattern = strftime("%b %d \\d+:\\d+:\\d+ %Y", localtime(time));
# my $pattern = strftime("%b \\d+ \\d+:\\d+:\\d+ %Y", localtime(time));
# print "$pattern\n";
my $regex = qr/$pattern/;

# Read the log lines from STDIN or from files named on the command line.
chomp(my @prod = <>);

foreach my $prod (@prod)
{
    # print "Check: $prod\n";
    if ($prod =~ $regex)
    {
        print "$prod\n";
    }
}
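This uses strftime (from POSIX) to build the regex string with the current date components in the right places, and with patterns for strings of digits where the time components (and, in the commented-out version, the day) should be. It then uses qr// to create a quoted regex and applies it to each entry in the @prod array. You can make the \d+ matches stricter if you like; whether that is worthwhile depends on the cost of a spurious match. (The current-month version of the regex is looser than it could be: it accepts the 99th and the 00th of May. Both versions accept May of the year 20130 and the like, and both let invalid times through.) All of this can be fixed by adjusting the regex, without materially changing the answer.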
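As an illustration of what "stricter" could look like, here is a minimal sketch. The helper name date_regex is made up for this example, and it assumes an English locale and the abbreviated month names shown in the sample log. It anchors the pattern at both ends, restricts the time fields to valid ranges, and accepts a single-digit day written as "5", "05" or " 5".

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX qw(strftime mktime);

# Build an anchored regex for the date of the given epoch timestamp.
# date_regex is a hypothetical helper name, not part of the answer above.
sub date_regex {
    my ($timestamp) = @_;
    # %b = abbreviated month name, %d = two-digit day, %Y = four-digit year.
    my ($mon, $day, $year) = split ' ', strftime("%b %d %Y", localtime($timestamp));
    $day =~ s/^0//;    # the regex below re-allows an optional leading zero
    return qr/^\w{3} \Q$mon\E\s+0?$day (?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d $year$/;
}

# Fixed timestamp (noon local time on 19 May 2013) so the demo is repeatable;
# pass time() instead to match today's date.
my $regex = date_regex(mktime(0, 0, 12, 19, 4, 113));

for my $line ('Sun May 19 02:00:01 2013', 'Sun May 19 02:00:01 20130') {
    print "$line\n" if $line =~ $regex;    # only the first line is printed
}
```

The trailing $ is what rejects the year 20130, and the hour alternation rejects times such as 25:00:01. Whether the extra strictness is worth it is the same trade-off described above.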
Answer 1 (score: 1)
A quick and dirty regex:
my @prod = ('Mon Nov 19 11:00:01 2012', 'accurev-ent inuse: 629');
foreach my $prod (@prod)
{
    # Sun May 19 02:00:01 2013
    if ($prod =~ /^\w+ (\w+) (\d+) ..:..:.. (\d+)$/)
    {
        print "Howdy: $3 $1 $2\n";
    }
    if ($prod =~ /inuse: (\d+)$/)
    {
        print "Yo: $1\n";
    }
}
Output:
Howdy: 2012 Nov 19
Yo: 629
Answer 2 (score: 0)
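You said you need the totals for each day, so that is what I aimed for. I hope the comments I added are sufficient. I used array indexes, but I'm fairly sure this could be done with regex backreferences; I just didn't have much luck with them.
I figured I would fix my misreading, why not: the code keeps the largest inuse value for each day.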
open(my $fh, '<', 'stackoverflow.data') or die "Cannot open stackoverflow.data: $!";
my @prod = <$fh>;
close($fh);

# Strip newlines.
chomp(@prod);

my $data;    # Hash reference storing the largest inuse value per date.
for (my $i = 0; $i < $#prod; ) {    # $i is advanced at the bottom of the loop.
    my $date  = $prod[$i];                    # First line.
    my $host  = $prod[$i + 1];                # Second line.
    my $inuse = parseInuse($prod[$i + 2]);    # Third line.
    $date =~ /^\w+ (\w+) (\d+) .+? (\d+)$/;
    $date = "$1 $2 $3";
    # Initialize inuse value for date.
    if (!defined($data->{$date})) {
        $data->{$date} = 0;
    }
    # Replace stored inuse value if current loop inuse is greater.
    if ($inuse > $data->{$date}) {
        $data->{$date} = $inuse;
    }
    print "Processing $i raw($prod[$i]) sep(date: $date, host: $host, inuse: $inuse) split($inuse)\n";
    # Skip the blank line between records, if there is one.
    $i += (defined($prod[$i + 3]) && $prod[$i + 3] =~ m/^\s*$/) ? 4 : 3;
}

print "\nTotals:\n";
my $matchdate = 'May 19 2013'; # Set to undef to show all.
#$matchdate = undef;
foreach my $date (sort keys %{$data}) {
    if (defined($matchdate) && $date ne $matchdate) {
        next;
    }
    print "$date: $data->{$date}\n";
}

sub parseInuse
{
    my $i = shift;
    my @parts = split(': ', $i);
    $i = $parts[1];       # Single element, so $parts[1] rather than @parts[1].
    $i =~ s/\s+//g;       # Remove any stray whitespace.
    return $i;
}
# Mon Nov 19 11:00:01 2012
# Host: myserver
# accurev-ent inuse: 629
#
# Mon Nov 19 12:00:01 2012
# Host: myserver
# accurev-ent inuse: 800
#
# Sun May 19 02:00:01 2013
# Host: myserver
# accurev-ent inuse: 629
#
# Sun May 19 02:00:01 2013
# Host: myserver
# accurev-ent inuse: 1000
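For comparison, the regex-capture approach mentioned at the top of this answer might look roughly like the sketch below. It does not rely on each record being exactly three lines: it remembers the most recent date line and attributes every following inuse line to that date. The function name max_inuse_by_day and the inline sample lines are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Return a hash ref mapping "Mon DD YYYY" to the largest inuse value seen
# for that day. max_inuse_by_day is a hypothetical name for this sketch.
sub max_inuse_by_day {
    my @lines = @_;
    my (%max, $date);
    for my $line (@lines) {
        if ($line =~ /^\w+\s+(\w+)\s+(\d+)\s+[\d:]+\s+(\d{4})\s*$/) {
            # Date line: capture month, day and year, drop the time.
            $date = "$1 $2 $3";
        }
        elsif (defined $date && $line =~ /inuse:\s*(\d+)/) {
            my $inuse = $1;
            if (!exists($max{$date}) || $inuse > $max{$date}) {
                $max{$date} = $inuse;
            }
        }
    }
    return \%max;
}

my $max = max_inuse_by_day(
    'Mon Nov 19 11:00:01 2012', 'Host: myserver', 'accurev-ent inuse: 629',
    '',
    'Mon Nov 19 12:00:01 2012', 'Host: myserver', 'accurev-ent inuse: 800',
);
print "$_: $max->{$_}\n" for sort keys %$max;    # prints "Nov 19 2012: 800"
```

Blank lines and Host lines simply fail both patterns and are skipped, so the same function works whether or not the blank lines were removed beforehand.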
Answer 3 (score: 0)
use strict;
use warnings;
use 5.012;
use DateTime::Format::Strptime;
use List::Util qw/max/;

local $/ = "\n\n";

my $parser = DateTime::Format::Strptime->new(
    pattern   => '%a %b %d %H:%M:%S %Y',
    locale    => 'en_US',
    time_zone => 'America/Chicago',
);

my @records;
for my $record (<DATA>) {
    my ($timestamp, $host, $inuse) = split ("\n", $record);
    $host  =~ s/Host: //;
    $inuse =~ s/accurev-ent inuse: //;
    push @records, { timestamp => $parser->parse_datetime($timestamp),
                     host      => $host,
                     inuse     => $inuse,
                   };
}

say max map {$_->{inuse}} grep {$_->{timestamp}->ymd() eq '2013-05-21' } @records;
__DATA__
Mon Nov 19 11:00:01 2012
Host: myserver
accurev-ent inuse: 629

Mon Nov 19 12:00:01 2012
Host: myserver
accurev-ent inuse: 629

Sun May 19 02:00:01 2013
Host: myserver
accurev-ent inuse: 629

Tue May 21 02:00:01 2013
Host: myserver
accurev-ent inuse: 1200

Tue May 21 02:00:01 2013
Host: myserver
accurev-ent inuse: 62

Tue May 21 02:00:01 2013
Host: myserver
accurev-ent inuse: 29
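This gives:
1200
By changing the test used in the grep, you can change the scope of the filter fairly easily (for example, the maximum between 8 AM and 10 PM, the maximum over a week, and so on).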
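As a rough sketch of such a change, the example below restricts the maximum to a window of hours on one day. To keep it runnable with core modules only, it swaps in Time::Piece for DateTime::Format::Strptime; the function name max_in_window and the sample records are invented for the illustration. The grep test itself only calls ymd and hour, which DateTime objects also provide.

```perl
use strict;
use warnings;
use Time::Piece;
use List::Util qw(max);

# Hypothetical records in the same shape as in the answer, but with
# Time::Piece timestamps instead of DateTime objects.
my @records = map {
    +{ timestamp => Time::Piece->strptime($_->[0], '%a %b %d %H:%M:%S %Y'),
       inuse     => $_->[1] }
} (
    [ 'Tue May 21 02:00:01 2013', 1200 ],
    [ 'Tue May 21 09:00:01 2013',   62 ],
    [ 'Tue May 21 21:00:01 2013',   29 ],
);

# Largest inuse on day $ymd with $from_hour <= hour < $to_hour.
# Returns undef when no record falls inside the window.
sub max_in_window {
    my ($ymd, $from_hour, $to_hour, @recs) = @_;
    return max map  { $_->{inuse} }
               grep { $_->{timestamp}->ymd eq $ymd
                      && $_->{timestamp}->hour >= $from_hour
                      && $_->{timestamp}->hour <  $to_hour } @recs;
}

# 8 AM up to (but not including) 10 PM on 21 May 2013.
print max_in_window('2013-05-21', 8, 22, @records), "\n";    # prints 62
```

The 02:00 reading of 1200 falls outside the window, so the result is 62 rather than 1200. A week-long range could be handled the same way by comparing the timestamps against a start and end value instead of testing ymd.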