How do I find the maximum value for each 5-minute interval?

Time: 2012-06-28 15:21:01

Tags: perl

The program below prints the following data:

 Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1
 Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
 Wed,Jun,13,10:39:34,2012,758,42,0,32,10,0,0,0
 Wed,Jun,13,10:40:35,2012,758,42,0,29,11,0,0,2
 Wed,Jun,13,10:41:35,2012,761,39,0,34,5,0,0,0
 Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3
 Wed,Jun,13,10:43:35,2012,754,46,0,29,17,0,0,0

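I need to output the maximum value (for example, 769) for each 5-minute interval. Ideally the intervals would be 10:00:00 - 10:05:00 and so on. Times are in 24-hour (military) format. What is the best way to do this? Please note that I am a Perl beginner. Here is my code: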

#!/usr/bin/perl

# This program displays the max thread count at 5 minute intervals and writes the lines to a CSV file.

use strict;
use warnings;
use diagnostics;

# Initialize functions
my @data;
my $line;
my @L1;
#my $outFivemin = "log_5min.csv";
#open (FiveMin, ">> $outFivemin");

# Open the error_log 
open(FH, "error_log");
@data = <FH>;

# Filter the results to MPMStats only
sub findLines {
    my @return = ();
    foreach $line (@data) {
        if ( ($line =~ /notice/) && ($line =~ /rdy/) ) {  
                $line =~ s/ /,/g;   
                my @L1 = split(/|notice|\[|,mpmstats:,|\t|rdy,|bsy,|rd,|wr,|ka,|log,|dns,|cls,/, $line);
                $line =~ s/|notice|\[|,mpmstats:,|\t|rdy,|bsy,|rd,|wr,|ka,|log,|dns,|cls,//g;                   
                push @return, join("", @L1);
        }
    }
    return @return;
}

# Initializers for my data
my($dayOfWeek1,$month1,$dayOfMonth1,$time,$year1,$rdy,$bsy,$rd,$wr,$ka,$log,$dns);
my($cls);

# Create a 2D array
my @L2 = &findLines;
foreach my $line (@L2){
    ($dayOfWeek1, $month1, $dayOfMonth1, $time, $year1, $rdy, $bsy, $rd, $wr, $ka, $log, $dns, $cls) = split(/,/, $line);
    print "$dayOfWeek1,$month1,$dayOfMonth1,$time,$year1,$rdy,$bsy,$rd,$wr,$ka,$log,$dns,$cls";
}

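4 Answers:

Answer 0 (score: 4)

I suggest that you manipulate the date/time in each record to produce a five-minute key, and keep a maximum value for each key.

For example, if a record starts with Wed,Jun,13,10:37:34,2012, then the corresponding key is Jun 13 10:35 2012.

Ordinarily this would be a hash, but since the output presumably needs to be in chronological order, and producing a sortable date/time string would take extra work and modules, the program below uses an array of pairs instead.

The program works by applying a regex substitution s/// to the time (fourth) field, which replaces the minutes and seconds with the start of the five-minute interval: the seconds are dropped and the minutes are rounded down to a multiple of five.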
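The rounding step can be tried in isolation. The following is only a minimal sketch: the helper name five_minute_slot is hypothetical, and it zero-pads the minutes with sprintf so that a time such as 10:04:59 becomes 10:00 rather than 10:0.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "HH:MM:SS" into "HH:MM" with the minutes
# rounded down to a multiple of five. The /e flag makes Perl evaluate
# the replacement as code.
sub five_minute_slot {
    my ($time) = @_;
    (my $slot = $time) =~ s{(\d+):\d\d$}{ sprintf '%02d', 5 * int($1 / 5) }e;
    return $slot;
}

print five_minute_slot($_), "\n" for qw(10:37:34 10:40:35 10:04:59);
# prints 10:35, 10:40 and 10:00, one per line
```

The pattern is anchored at the end of the string, so it matches the minutes and seconds and leaves the hour untouched.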

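A new [$range, $value] pair is pushed onto the @maxima array if the array is empty or if we are in a different $range. Otherwise the $value element of the latest pair is updated if we have found a new maximum. Because only the latest pair is ever compared, this assumes that the records arrive in chronological order, as they normally do in a log file.

Note that this program expects the log file name on the command line, and defaults to error_log if none is given.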

use strict;
use warnings;

@ARGV = ('error_log') unless @ARGV;

my @maxima;

while (<>) {

  my @fields = /([^,\s]+)/g;
  next unless @fields;
  # Replace MM:SS with the zero-padded start minute of the five-minute slot
  $fields[3] =~ s|(\d+):\d\d$|sprintf('%02d', 5*int($1/5))|e;

  my $range = join ' ', @fields[1..4];
  my $value = $fields[5];

  if (@maxima == 0 or $range ne $maxima[-1][0]) {
    push @maxima, [$range, $value];
  }
  else {
    $maxima[-1][1] = $value if $maxima[-1][1] < $value;
  }
}

for (@maxima) {
  printf "Maximum for five minutes starting %s is %d\n", @$_;
}

Output

Maximum for five minutes starting Jun 13 10:35 2012 is 767
Maximum for five minutes starting Jun 13 10:40 2012 is 769
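For comparison, here is a rough sketch of the hash-based alternative mentioned above. It is not part of the original answer: it assumes the core Time::Piece module is acceptable, treats the timestamps as UTC, and uses the hypothetical helper names slot_start and slot_maxima. The hash key is the epoch second at which each five-minute slot starts, so the keys can be sorted numerically even when the records are out of order.

```perl
use strict;
use warnings;
use Time::Piece;    # core module since Perl 5.10

# Hypothetical helper: epoch second at which the five-minute slot
# containing the given timestamp starts (the timestamp is read as UTC).
sub slot_start {
    my ($mon, $day, $time, $year) = @_;
    my $t = Time::Piece->strptime("$mon $day $time $year", '%b %d %H:%M:%S %Y');
    return $t->epoch - $t->epoch % 300;
}

# Hypothetical helper: returns a hash of slot start => maximum of field 6.
sub slot_maxima {
    my %max;
    for my $line (@_) {
        my @fields = $line =~ /([^,\s]+)/g;
        next unless @fields >= 6;
        my $slot = slot_start(@fields[1 .. 4]);
        $max{$slot} = $fields[5]
            if (!exists $max{$slot} or $max{$slot} < $fields[5]);
    }
    return %max;
}

# Sample records from the question, deliberately out of order
my @records = (
    'Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3',
    'Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1',
    'Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2',
);

my %max = slot_maxima(@records);
for my $slot (sort { $a <=> $b } keys %max) {
    my $start = gmtime($slot);    # Time::Piece object in scalar context
    printf "%s - %d\n", $start->strftime('%Y-%m-%d %H:%M'), $max{$slot};
}
```

The trade-off is the one the answer describes: the hash no longer depends on input order, at the cost of parsing every timestamp with a module.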

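Update

Now that I understand you want the whole record containing the maximum value of field 6 within each five-minute period, I have written this revised code.

It also works on the contents of the @L2 array instead of reading from a file.

I am sure this could be coded better, reading the file in a while loop and generating the output directly from there, but unless you show us some of the log file data I cannot suggest a better alternative.

This program carries on from the point in your own program where @L2 is populated.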

my @L2 = findLines();

my @maxima;

for my $record (@L2) {

  my @fields = $record =~ /([^,\s]+)/g;
  next unless @fields;

  my @range = @fields[1..4];
  # Replace MM:SS with the zero-padded start minute of the five-minute slot
  $range[2] =~ s|(\d+):\d\d$|sprintf('%02d', 5*int($1/5))|e;
  my $range = join ' ', @range;
  my $value = $fields[5];

  if (@maxima == 0 or $range ne $maxima[-1][0]) {
    push @maxima, [$range, $value, $record];
  }
  else {
    @{$maxima[-1]}[1,2] = ($value, $record) if $maxima[-1][1] < $value;
  }
}

print $_->[2] for @maxima;

Output

 Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
 Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3

Answer 1 (score: 3)

Something along these lines should do the trick...

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

# Somewhere to store the data
my %data;

# Process the input a line at a time
while (<DATA>) {
  # Split the input line on commas and colons.
  # Assign the bits we need to variables.
  my ($mon,$day,$hr,$min,$sec,$yr,$val) = (split /[,:]/)[1 .. 7];

  # Normalise the minute value to five-minute increments,
  # i.e. 37 becomes 35, 42 becomes 40 (zero-padded, so 7 becomes 05
  # and the keys still sort correctly as strings)
  $min = sprintf '%02d', int($min / 5) * 5;

  # Push the value onto an array that is stored in %data using
  # a key generated from the timestamp.
  # Note that we use the 5-min normalised value of the minute so that
  # all values from the same five minute period end up in the same array.
  push @{$data{"$yr-$mon-$day $hr:$min"}}, $val;
}

# For each key in the hash (i.e. each five-minute increment)...
foreach (sort keys %data) {
  # ... sort the array numerically and grab the last element
  # (which will be the largest)
  my $max = (sort { $a <=> $b } @{$data{$_}})[-1];
  # Say something useful
  say "$_ - $max";
}

__DATA__
Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1
Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
Wed,Jun,13,10:39:34,2012,758,42,0,32,10,0,0,0
Wed,Jun,13,10:40:35,2012,758,42,0,29,11,0,0,2
Wed,Jun,13,10:41:35,2012,761,39,0,34,5,0,0,0
Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3
Wed,Jun,13,10:43:35,2012,754,46,0,29,17,0,0,0
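As a variation on the same idea, the largest element can be taken with max from the core List::Util module instead of sorting each array. This is only a sketch under that assumption; the helper name max_per_slot is hypothetical, and the key format mirrors the one used above with a zero-padded minute.

```perl
use strict;
use warnings;
use List::Util qw(max);    # core module

# Hypothetical helper: group field 6 by five-minute key, then reduce
# each group to its maximum. Returns a hash of key => maximum.
sub max_per_slot {
    my %data;
    for my $line (@_) {
        my ($mon, $day, $hr, $min, $sec, $yr, $val) = (split /[,:]/, $line)[1 .. 7];
        my $key = sprintf '%s-%s-%s %s:%02d', $yr, $mon, $day, $hr, 5 * int($min / 5);
        push @{ $data{$key} }, $val;
    }
    my %max;
    $max{$_} = max(@{ $data{$_} }) for keys %data;
    return %max;
}

my %max = max_per_slot(
    'Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1',
    'Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2',
    'Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3',
);
print "$_ - $max{$_}\n" for sort keys %max;
```

Because the key contains the month name and an unpadded day, sorting the keys as strings is chronological only within a single day.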

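Answer 2 (score: -1)

Oops, I mistakenly assumed that your CSV output was the data file being parsed.

Ignore the answer below.

Here is a solution that prints out the original comma-separated lines. The maximum value and the time are also available for printing, but I created a comma-separated file with the results. :-)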

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

my %interval;
my $csv = Text::CSV_XS->new ({ binary => 1 }) or
     die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

open my $fh, "<", "o33.txt" or die "o33.txt: $!";
while (my $row = $csv->getline ($fh)) {
    my ($time, $amt) = @$row[3,5];
    my ($hr, $min) = split /:/, $time;
    my $key = sprintf "%02d:%02d", $hr, int($min/5) * 5;

    if (exists $interval{$key}) {
        if ($interval{$key}{amt} < $amt) {
            $interval{$key}{amt} = $amt;
            $interval{$key}{data} = $row;
        }
    }
    else { # first time in this 5 minute interval
        $interval{$key}{amt} = $amt;
        $interval{$key}{data} = $row;
    }
}
$csv->eof or $csv->error_diag ();
close $fh or die $!;


$csv->eol ("\r\n");
open $fh, ">", 'junk.csv' or die $!;

for my $time (sort keys %interval) {
    $csv->print($fh, $interval{$time}{data});
}

close $fh or die $!;

The output in 'junk.csv' is:

Wed,Jun,13,10:38:34,2012,767,33,0,25,6,0,0,2
Wed,Jun,13,10:42:35,2012,769,31,0,22,6,0,0,3

Answer 3 (score: -1)

This should work (?), though it is untested. It picks up at the loop that follows my @L2 = &findLines.

use Text::CSV_XS;

my %interval;
my %month;
@month{qw/ jan feb mar apr may jun jul aug sep oct nov dec /} = '01' .. '12';

# Create a 2D array 
my @L2 = &findLines;
foreach my $line (@L2){ 
    #($dayOfWeek1, $month1, $dayOfMonth1, $time, $year1, $rdy, $bsy, $rd, $wr, $ka, $log, $dns, $cls) = split(/,/, $line); 
    #print "$dayOfWeek1,$month1,$dayOfMonth1,$time,$year1,$rdy,$bsy,$rd,$wr,$ka,$log,$dns,$cls"; 
    chomp $line;  # drop the trailing newline so it does not end up in the last CSV field
    my ($dow, $mon, $day, $hr, $min, $sec, $yr, $amt) = split /[:,]/, $line, 9;
    my $key = sprintf "%4d-%02d-%02d %02d:%02d",
                $yr, $month{lc $mon}, $day, $hr, int($min / 5) * 5;

    if (exists $interval{$key}) {
        if ($interval{$key}{amt} < $amt) {
            $interval{$key}{amt} = $amt;
            $interval{$key}{data} = [split ",", $line];
        }
    }
    else { # first time in this 5 minute interval
        $interval{$key}{amt} = $amt;
        $interval{$key}{data} = [split ",", $line];
    }
} 

my $csv = Text::CSV_XS->new ({ binary => 1 }) or
     die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

$csv->eol ("\r\n");
open my $fh, ">", 'junk.csv' or die $!;

for my $time (sort keys %interval) {
    $csv->print($fh, $interval{$time}{data});
}

close $fh or die $!;

I hope this gets you closer to a solution to your problem.

Update: added the first field to the split, and changed it from 8 to 9 parts.
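The key construction used in this answer can be checked on its own. This is a minimal sketch, not the answerer's code: the helper name sortable_key is hypothetical, and the month table is filled with sprintf rather than the '01' .. '12' string range.

```perl
use strict;
use warnings;

# Month abbreviation => two-digit month number, filled via a hash slice
my %month;
@month{qw/ jan feb mar apr may jun jul aug sep oct nov dec /}
    = map { sprintf '%02d', $_ } 1 .. 12;

# Hypothetical helper: build a "YYYY-MM-DD HH:MM" key whose minute is
# rounded down to a multiple of five. Keys in this form sort
# chronologically when compared as plain strings.
sub sortable_key {
    my ($line) = @_;
    my ($dow, $mon, $day, $hr, $min, $sec, $yr) = split /[:,]/, $line;
    return sprintf '%04d-%02d-%02d %02d:%02d',
        $yr, $month{lc $mon}, $day, $hr, 5 * int($min / 5);
}

print sortable_key('Wed,Jun,13,10:37:34,2012,759,41,0,30,10,0,0,1'), "\n";
# prints 2012-06-13 10:35
```

Putting the year first and zero-padding every part is what lets sort keys %interval return the intervals in time order without a custom comparison.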