The following program prints this data:
Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1
Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
Wed,Jun,13,10:39:34,2012,758,42,0,32,10,0,0,0
Wed,Jun,13,10:40:35,2012,758,42,0,29,11,0,0,2
Wed,Jun,13,10:41:35,2012,761,39,0,34,5,0,0,0
Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3
Wed,Jun,13,10:43:35,2012,754,46,0,29,17,0,0,0
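I need to output the maximum value (for example, 769) for each five-minute interval. Ideally the intervals would be 10:00:00 - 10:05:00 and so on. The times are in military (24-hour) format. What is the best way to do this? Please note that I am a beginner at Perl. Here is my code: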
#!/usr/bin/perl
# This program displays the max thread count at 5 minute intervals and writes the lines to a CSV file.
use strict;
use warnings;
use diagnostics;
# Initialize functions
my @data;
my $line;
my @L1;
#my $outFivemin = "log_5min.csv";
#open (FiveMin, ">> $outFivemin");
# Open the error_log
open(FH, "error_log");
@data = <FH>;
# Filter the results to MPMStats only
sub findLines {
my @return = ();
foreach $line (@data) {
if ( ($line =~ /notice/) && ($line =~ /rdy/) ) {
$line =~ s/ /,/g;
my @L1 = split(/|notice|\[|,mpmstats:,|\t|rdy,|bsy,|rd,|wr,|ka,|log,|dns,|cls,/, $line);
$line =~ s/|notice|\[|,mpmstats:,|\t|rdy,|bsy,|rd,|wr,|ka,|log,|dns,|cls,//g;
push @return, join("", @L1);
}
}
return @return;
}
# Initializers for my data
my($dayOfWeek1,$month1,$dayOfMonth1,$time,$year1,$rdy,$bsy,$rd,$wr,$ka,$log,$dns);
my($cls);
# Create a 2D array
my @L2 = &findLines;
foreach my $line (@L2){
($dayOfWeek1, $month1, $dayOfMonth1, $time, $year1, $rdy, $bsy, $rd, $wr, $ka, $log, $dns, $cls) = split(/,/, $line);
print "$dayOfWeek1,$month1,$dayOfMonth1,$time,$year1,$rdy,$bsy,$rd,$wr,$ka,$log,$dns,$cls";
}
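Answer 0 (score: 4)
I suggest you manipulate the date/time in each record to produce a five-minute key, and keep a running maximum for each key.
For example, if a record starts with Wed,Jun,13,10:37:34,2012 then the corresponding key is Jun 13 10:35 2012.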
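As a minimal sketch of that key derivation on its own, assuming comma-separated records shaped like the sample data: the helper name five_minute_key is hypothetical and does not appear in the programs in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): map one CSV record
# to the label of the five-minute interval it falls in.
sub five_minute_key {
    my ($record) = @_;
    # Skip the day-of-week field; keep month, day, time and year.
    my (undef, $mon, $day, $time, $year) = split /,/, $record;
    my ($hour, $min) = split /:/, $time;
    # Round the minute down to a multiple of five; the seconds are ignored.
    return sprintf '%s %d %02d:%02d %d',
        $mon, $day, $hour, 5 * int($min / 5), $year;
}

# prints: Jun 13 10:35 2012
print five_minute_key('Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1'), "\n";
```

Every record from 10:35:00 through 10:39:59 yields the same key, which is what lets a maximum be kept per interval.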
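Ordinarily this would be a hash, but because the output presumably needs to be in chronological order, and producing sortable date/time strings would take extra work and modules, the program below uses an array of pairs instead.
The program works by applying the regex substitution s/// to the time (fourth) field, replacing the minutes and seconds with the minutes rounded down to a multiple of five; the seconds are discarded.
A new [$range, $value] pair is pushed onto the @maxima array if the array is empty or if we have moved into a different $range. Otherwise the $value element of the latest pair is updated if we have found a new maximum.
Note that this program expects the log file name on the command line, and defaults to error_log if none is given.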
use strict;
use warnings;
@ARGV = ('error_log') unless @ARGV;
my @maxima;
while (<>) {
my @fields = /([^,\s]+)/g;
next unless @fields;
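# Round the minutes down to a multiple of five, zero-padded so that 10:02:41 becomes 10:00
$fields[3] =~ s|(\d+):\d\d$|sprintf '%02d', 5*int($1/5)|e;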
my $range = join ' ', @fields[1..4];
my $value = $fields[5];
if (@maxima == 0 or $range ne $maxima[-1][0]) {
push @maxima, [$range, $value];
}
else {
$maxima[-1][1] = $value if $maxima[-1][1] < $value;
}
}
for (@maxima) {
printf "Maximum for five minutes starting %s is %d\n", @$_;
}
Output
Maximum for five minutes starting Jun 13 10:35 2012 is 767
Maximum for five minutes starting Jun 13 10:40 2012 is 769
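Update
Now that I understand you want the entire record containing the maximum value of field 6 in each five-minute period, I have written this revised code.
It also works from the contents of the @L2 array instead of reading from a file.
I am sure this could be coded better, reading the file in a while loop and generating the output directly from there, but unless you show us some of your log file data I cannot suggest a better alternative.
This program picks up from the point where you populate @L2 in your own program.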
my @L2 = findLines();
my @maxima;
for my $record (@L2) {
my @fields = $record =~ /([^,\s]+)/g;
next unless @fields;
my @range = @fields[1..4];
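# Round the minutes down to a multiple of five, zero-padded so that 10:02:41 becomes 10:00
$range[2] =~ s|(\d+):\d\d$|sprintf '%02d', 5*int($1/5)|e;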
my $range = join ' ', @range;
my $value = $fields[5];
if (@maxima == 0 or $range ne $maxima[-1][0]) {
push @maxima, [$range, $value, $record];
}
else {
@{$maxima[-1]}[1,2] = ($value, $record) if $maxima[-1][1] < $value;
}
}
print $_->[2] for @maxima;
Output
Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3
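The update mentions that the file could be read in a while loop with the results produced directly. A rough sketch of that idea follows, assuming the input already consists of comma-separated records like the sample data (the real error_log format is not shown in this thread). The helper name max_records is hypothetical, and it returns the winning records rather than printing them so that it is easy to check.

```perl
use strict;
use warnings;

# Hypothetical helper: read CSV records from a filehandle and return, for each
# run of consecutive records in the same five-minute interval, the record
# whose sixth field is largest. Only the current best record is kept in memory.
sub max_records {
    my ($fh) = @_;
    my ($current, $best_value, $best_record, @result);
    while (my $record = <$fh>) {
        my @fields = $record =~ /([^,\s]+)/g;
        next unless @fields >= 6;
        my @range = @fields[1..4];
        # Replace minutes and seconds with the minutes rounded down to five
        $range[2] =~ s{(\d+):\d\d$}{sprintf '%02d', 5 * int($1 / 5)}e;
        my $range = join ' ', @range;
        if (!defined $current or $range ne $current) {
            # Interval changed: emit the winner of the previous interval
            push @result, $best_record if defined $current;
            ($current, $best_value, $best_record) = ($range, $fields[5], $record);
        }
        elsif ($fields[5] > $best_value) {
            ($best_value, $best_record) = ($fields[5], $record);
        }
    }
    # Emit the winner of the final interval
    push @result, $best_record if defined $current;
    return @result;
}

# Demonstration on an in-memory filehandle holding the sample data
my $sample = <<'END';
Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1
Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
Wed,Jun,13,10:39:34,2012,758,42,0,32,10,0,0,0
Wed,Jun,13,10:40:35,2012,758,42,0,29,11,0,0,2
Wed,Jun,13,10:41:35,2012,761,39,0,34,5,0,0,0
Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3
Wed,Jun,13,10:43:35,2012,754,46,0,29,17,0,0,0
END
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print for max_records($fh);
```

On the sample data this prints the same two records as the output above. Like the array-based version, it relies on the records arriving in chronological order.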
Answer 1 (score: 3)
Something along these lines should do the trick...
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
# Somewhere to store the data
my %data;
# Process the input a line at a time
while (<DATA>) {
# Split the input line on commas and colons.
# Assign the bits we need to variables.
my ($mon,$day,$hr,$min,$sec,$yr,$val) = (split /[,:]/)[1 .. 7];
# Normalise the minute value to five-minute increments
# i.e. 37 becomes 35, 42 becomes 40 (zero-padded so that 02 becomes 00)
$min = sprintf '%02d', int($min / 5) * 5;
# Push the value onto an array that is stored in %data using
# a key generated from the timestamp.
# Note that we use the 5-min normalised value of the minute so that
# all values from the same five minute period end up in the same array.
push @{$data{"$yr-$mon-$day $hr:$min"}}, $val;
}
# For each key in the hash (i.e. each five-minute increment)...
foreach (sort keys %data) {
# ... sort the array numerically and grab the last element
# (which will be the largest)
my $max = (sort { $a <=> $b } @{$data{$_}})[-1];
# Say something useful
say "$_ - $max";
}
__DATA__
Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1
Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
Wed,Jun,13,10:39:34,2012,758,42,0,32,10,0,0,0
Wed,Jun,13,10:40:35,2012,758,42,0,29,11,0,0,2
Wed,Jun,13,10:41:35,2012,761,39,0,34,5,0,0,0
Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3
Wed,Jun,13,10:43:35,2012,754,46,0,29,17,0,0,0
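A possible variation on the program above, offered only as a sketch: it keeps a running maximum per key instead of storing every value and sorting, and it builds a zero-padded numeric key so that a plain string sort is also chronological across months. The %month_number table and the helper name interval_maxima are assumptions introduced here, not part of the answer.

```perl
use strict;
use warnings;

# Assumed month table so that keys sort chronologically as plain strings
my %month_number;
@month_number{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (1 .. 12);

# Hypothetical helper: return a hash ref mapping interval key => maximum value
sub interval_maxima {
    my @lines = @_;
    my %max;
    for my $line (@lines) {
        # Same field extraction as the program above
        my ($mon, $day, $hr, $min, $sec, $yr, $val) = (split /[,:]/, $line)[1 .. 7];
        next unless defined $val and exists $month_number{$mon};
        # Zero-padded year-month-day hour:minute, minute rounded down to five
        my $key = sprintf '%04d-%02d-%02d %02d:%02d',
            $yr, $month_number{$mon}, $day, $hr, int($min / 5) * 5;
        # Keep only the largest value seen so far for this interval
        $max{$key} = $val if !exists $max{$key} or $val > $max{$key};
    }
    return \%max;
}

my $max = interval_maxima(
    'Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1',
    'Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2',
    'Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3',
);
# prints: 2012-06-13 10:35 - 767
#         2012-06-13 10:40 - 769
print "$_ - $max->{$_}\n" for sort keys %$max;
```

Because the hash holds one number per interval, memory use stays flat however many records fall into each five-minute period.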
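Answer 2 (score: -1)
Oops, I mistakenly assumed that your CSV output was the data file being parsed.
Ignore the answer below.
Here is a solution that prints out the original comma-separated lines. The maximum value and the time are also available for printing, but I create a comma-separated file with the results. :-)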
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
my %interval;
my $csv = Text::CSV_XS->new ({ binary => 1 }) or
die "Cannot use CSV: ".Text::CSV_XS->error_diag ();
open my $fh, "<", "o33.txt" or die "o33.txt: $!";
while (my $row = $csv->getline ($fh)) {
my ($time, $amt) = @$row[3,5];
my ($hr, $min) = split /:/, $time;
my $key = sprintf "%02d:%02d", $hr, int($min/5) * 5;
if (exists $interval{$key}) {
if ($interval{$key}{amt} < $amt) {
$interval{$key}{amt} = $amt;
$interval{$key}{data} = $row;
}
}
else { # first time in this 5 minute interval
$interval{$key}{amt} = $amt;
$interval{$key}{data} = $row;
}
}
$csv->eof or $csv->error_diag ();
close $fh or die $!;
$csv->eol ("\r\n");
open $fh, ">", 'junk.csv' or die $!;
for my $time (sort keys %interval) {
$csv->print($fh, $interval{$time}{data});
}
close $fh or die $!;
The output in 'junk.csv' is:
Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3
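Note that the key in the program above is built from the hour and minute only, so records taken at the same time on different days would land in the same interval. One way around that, sketched here under the assumption that the timestamps can be treated as UTC, is to convert each record to epoch seconds with the core Time::Local module and round down to a multiple of 300. The helper name bucket_start and the %month_index table are hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Assumed month table; timegm expects a zero-based month
my %month_index;
@month_index{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Hypothetical helper: epoch second at which the record's five-minute bucket starts.
# Timestamps are treated as UTC purely to get stable arithmetic.
sub bucket_start {
    my ($record) = @_;
    my (undef, $mon, $day, $time, $year) = split /,/, $record;
    my ($hr, $min, $sec) = split /:/, $time;
    my $epoch = timegm($sec, $min, $hr, $day, $month_index{$mon}, $year);
    # 300 seconds = five minutes; subtracting the remainder rounds down
    return $epoch - $epoch % 300;
}

my $first  = bucket_start('Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1');
my $second = bucket_start('Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3');
# prints: 2012-06-13 10:35
print strftime('%Y-%m-%d %H:%M', gmtime $first), "\n";
# prints: Buckets are 300 seconds apart
print "Buckets are ", ($second - $first), " seconds apart\n";
```

The numeric bucket can be used directly as the hash key and sorted with <=>, which keeps intervals from different days apart and in chronological order.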
Answer 3 (score: -1)
This should work (?), though I have not tested it. It picks up at the loop after my @L2 = &findLines.
my %interval;
my %month;
@month{qw/ jan feb mar apr may jun jul aug sep oct nov dec /} = '01' .. '12';
# Create a 2D array
my @L2 = &findLines;
foreach my $line (@L2){
#($dayOfWeek1, $month1, $dayOfMonth1, $time, $year1, $rdy, $bsy, $rd, $wr, $ka, $log, $dns, $cls) = split(/,/, $line);
#print "$dayOfWeek1,$month1,$dayOfMonth1,$time,$year1,$rdy,$bsy,$rd,$wr,$ka,$log,$dns,$cls";
my ($dow, $mon, $day, $hr, $min, $sec, $yr, $amt) = split /[:,]/, $line, 9;
my $key = sprintf "%4d-%02d-%02d %02d:%02d",
$yr, $month{lc $mon}, $day, $hr, int($min / 5) * 5;
if (exists $interval{$key}) {
if ($interval{$key}{amt} < $amt) {
$interval{$key}{amt} = $amt;
$interval{$key}{data} = [split ",", $line];
}
}
else { # first time in this 5 minute interval
$interval{$key}{amt} = $amt;
$interval{$key}{data} = [split ",", $line];
}
}
my $csv = Text::CSV_XS->new ({ binary => 1 }) or
die "Cannot use CSV: ".Text::CSV_XS->error_diag ();
$csv->eol ("\r\n");
open my $fh, ">", 'junk.csv' or die $!;
for my $time (sort keys %interval) {
$csv->print($fh, $interval{$time}{data});
}
close $fh or die $!;
I hope this gets you closer to a solution to your problem.
Update: added the first field to the split and changed the limit from 8 to 9 parts.