Question

I have a file in the following format.

DATE Time, v1,v2,v3
05:33:25,n1,n2,n3
05:34:25,n4,n5,n5
05:35:24,n6,n7,n8
and so on up to 05:42:25.

I want to compute the values of v1, v2 and v3 for every 5-minute interval. I have written the sample code below.

while (<STDIN>) {
    my ($dateTime, $v1, $v2, $v3) = split /,/, $_;
    my ($date, $time) = split / /, $dateTime;
}

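I can read all the values, but I need help summing them over every 5-minute interval. Can anyone suggest code that steps the time by 5 minutes and adds up the values for each interval?

Required output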

05:33 v1(sum 05:33 to 05:37) v2(sum 05:33 to 05:37) v3(sum 05:33 to 05:37)
05:38 v1(sum 05:38 to 05:42) v2(sum 05:38 to 05:42) v3(sum 05:38 to 05:42)
and so on..

Answer 1

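The code is a variation of Sinan Ünür's answer below, except:

(1) The function timelocal lets you read in a day, month and year as well, so you can sum over any five-minute gap.

(2) It should handle a final time gap of < 5 minutes.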

#!/usr/bin/perl -w
use strict;
use warnings;
use Time::Local;
use POSIX qw(strftime);

my ( $start_time, $end_time, $current_time );
my ( $totV1,      $totV2,    $totV3 );          #totals in time bands

while (<DATA>) {
    my ( $hour, $min, $sec, $v1, $v2, $v3 ) =
      ( $_ =~ /(\d+)\:(\d+)\:(\d+)\,(\d+),(\d+),(\d+)/ );

    #convert time to epoch seconds
    $current_time =
      timelocal( $sec, $min, $hour, (localtime)[ 3, 4, 5 ] );    #sec,min,hr

    if ( !$end_time ) {
        $start_time = $current_time;
        $end_time   = $start_time + 5 * 60;    #plus 5 min
    }
    if ( $current_time < $end_time ) {    #band is [start, start + 5 min)
        $totV1 += $v1;
        $totV2 += $v2;
        $totV3 += $v3;
    }
    else {
        print strftime( "%H:%M:%S", localtime($start_time) ),
          " $totV1,$totV2,$totV3\n";
        $start_time = $current_time;
        $end_time   = $start_time + 5 * 60;    #plus 5 min
        ( $totV1, $totV2, $totV3 ) = ( $v1, $v2, $v3 );
    }
}

#Print the totals of the final (possibly partial) band, if any data was read
if ( defined $start_time ) {
    print strftime( "%H:%M:%S", localtime($start_time) ),
      " $totV1,$totV2,$totV3\n";
}

__DATA__
05:33:25,29,74,96
05:34:25,41,69,95
05:35:25,24,38,55
05:36:25,96,63,70
05:37:25,84,65,74
05:38:25,78,58,93
05:39:25,51,38,19
05:40:25,86,40,64
05:41:25,80,68,65
05:42:25,4,93,81

Output:

05:33:25 274,309,390
05:38:25 299,297,322
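
Point (1) can be sketched as follows, assuming each timestamp carries a full date in a `YYYY-MM-DD HH:MM:SS` layout. That layout and the helper name `bucket_label` are assumptions for illustration, not something the question specifies. The sketch swaps in `timegm`/`gmtime` for `timelocal`/`localtime` so the result does not depend on the machine's time zone, and it floors to clock-aligned five-minute marks instead of anchoring on the first row.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Hypothetical helper: map "YYYY-MM-DD HH:MM:SS" to the start of its
# five-minute bucket, formatted as "YYYY-MM-DD HH:MM".
sub bucket_label {
    my ($stamp) = @_;
    my ( $y, $mo, $d, $h, $mi, $s ) =
      $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
      or die "Unrecognised timestamp: $stamp\n";
    my $epoch = timegm( $s, $mi, $h, $d, $mo - 1, $y );    # month is 0-based
    my $start = $epoch - $epoch % 300;                     # floor to 5 minutes
    return strftime( '%Y-%m-%d %H:%M', gmtime($start) );
}

print bucket_label('2010-03-01 23:58:10'), "\n";    # 2010-03-01 23:55
print bucket_label('2010-03-02 00:01:40'), "\n";    # 2010-03-02 00:00
```

Because the label includes the date, rows on either side of midnight land in different buckets, and the label can be used directly as a hash key for the running totals.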

Answer 2

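Obviously not very well tested, owing to the lack of sample data. For parsing CSV, use Text::CSV_XS or Text::xSV rather than the naive split below.

Notes:

This code does not ensure that the output has consecutive five-minute intervals if the input data have gaps.
You will have problems if the timestamps span multiple days. In fact, if the timestamps are not in 24-hour format, you will have problems even when the data come from a single day.

With those caveats, it should still give you a starting point.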

#!/usr/bin/perl

use strict;
use warnings;

my $split_re = qr/ ?, ?/;
my @header = split $split_re, scalar <DATA>;
my @data;

my $time_block = 0;

while ( my $data = <DATA> ) {
    last unless $data =~ /\S/;
    chomp $data;
    my ($ts, @vals) = split $split_re, $data;

    my ($hr, $min, $sec) = split /:/, $ts;
    my $secs = 3600*$hr + 60*$min + $sec;

    # Start a new block for the first row, or once five minutes have elapsed
    if ( !@data or $secs >= $time_block + 300 ) {
        $time_block = $secs;
        push @data, [ $time_block ];
    }

    for my $i (1 .. @vals) {
        $data[-1]->[$i] += $vals[$i - 1];
    }
}

print join(', ', @header);
for my $row ( @data ) {
    my $ts = shift @$row;
    print join(', ',
        # $ts is seconds since midnight, so use gmtime to avoid a time zone shift
        sprintf('%02d:%02d', (gmtime($ts))[2,1])
        , @$row
    ), "\n";
}


__DATA__
DATE Time, v1,v2,v3
05:33:25,1,3,5
05:34:25,2,4,6
05:35:24,7,8,9
05:55:24,7,8,9
05:57:24,7,8,9

Output:

DATE Time, v1, v2, v3
05:33, 10, 15, 20
05:55, 14, 16, 18
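
The first note, about gaps in the input, could be addressed along these lines. This is a minimal sketch, assuming rows arrive sorted by time and have already been parsed into seconds since midnight; the helper name `sum_with_gaps` and the row layout are illustrative. Blocks are counted in whole five-minute steps from the first timestamp, and blocks with no data come out as rows of zeros.

```perl
use strict;
use warnings;

# Rows as [seconds since midnight, v1, v2, v3]; same sample data as above.
my @rows = (
    [ 20005, 1, 3, 5 ],    # 05:33:25
    [ 20065, 2, 4, 6 ],    # 05:34:25
    [ 20124, 7, 8, 9 ],    # 05:35:24
    [ 21324, 7, 8, 9 ],    # 05:55:24
    [ 21444, 7, 8, 9 ],    # 05:57:24
);

# Hypothetical helper: sum rows into five-minute blocks counted from the
# first timestamp, keeping empty blocks as rows of zeros.
sub sum_with_gaps {
    my @in = @_;
    return () unless @in;
    my $first = $in[0][0];
    my $width = @{ $in[0] } - 1;    # number of value columns
    my @blocks;
    for my $row (@in) {
        my ( $secs, @vals ) = @$row;
        my $idx = int( ( $secs - $first ) / 300 );
        $blocks[$idx][$_] += $vals[$_] for 0 .. $#vals;
    }
    my @out;
    for my $i ( 0 .. $#blocks ) {
        my $blk = $blocks[$i] || [];    # undef where the input had a gap
        push @out,
          [ $first + 300 * $i, map { $blk->[$_] // 0 } 0 .. $width - 1 ];
    }
    return @out;
}

for my $r ( sum_with_gaps(@rows) ) {
    my ( $start, @sums ) = @$r;
    printf "%02d:%02d, %s\n", int( $start / 3600 ), int( $start % 3600 / 60 ),
      join ', ', @sums;
}
# 05:33, 10, 15, 20
# 05:38, 0, 0, 0
# 05:43, 0, 0, 0
# 05:48, 0, 0, 0
# 05:53, 14, 16, 18
```

Note that the last block is labelled 05:53 rather than 05:55, because here every block start is a multiple of five minutes from the first row instead of restarting at the first row after a gap.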

Answer 3

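This is a good problem for Perl to solve. The hardest part is taking the value from the datetime field and identifying which 5-minute bucket it belongs to. The rest is just hashes.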

my (%v1,%v2,%v3);
while (<STDIN>) {
    my ($datetime,$v1,$v2,$v3) = split /,/, $_;
    my ($date,$time) = split / /, $datetime;
    my $bucket = &get_bucket_for($time);
    $v1{$bucket} += $v1;
    $v2{$bucket} += $v2;
    $v3{$bucket} += $v3;
}
foreach my $bucket (sort keys %v1) {
    print "$bucket $v1{$bucket} $v2{$bucket} $v3{$bucket}\n";
}

Here is one way you could implement &get_bucket_for:

my $first_hhmm;
sub get_bucket_for {
    my ($time) = @_;
    my ($hh,$mm) = split /:/, $time;  # looks like seconds are not important

    # buckets are five minutes apart, but not necessarily at multiples of 5 min
    # (i.e., buckets could go 05:33,05:38,... instead of 05:30,05:35,...)
    # Use the value from the first time this function is called to decide
    # what the starting point of the buckets is.
    if (!defined $first_hhmm) {
        $first_hhmm = $hh * 60 + $mm;
    }

    my $bucket_index = int(($hh * 60 + $mm - $first_hhmm) / 5);
    my $bucket_start = $first_hhmm + 5 * $bucket_index;
    return sprintf "%02d:%02d", $bucket_start / 60, $bucket_start % 60;

}
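
To make the bucket arithmetic concrete, here is a small worked variant in which the anchor is passed in explicitly instead of being remembered from the first call. The name `bucket_start` and its two-argument form are assumptions for illustration; it also assumes no time falls before the anchor, since `int` truncates toward zero.

```perl
use strict;
use warnings;

# Hypothetical pure variant of the bucket lookup: the anchor ("HH:MM")
# is an argument rather than state kept between calls.
sub bucket_start {
    my ( $time, $anchor ) = @_;
    my ( $hh, $mm ) = split /:/, $time;      # seconds are ignored
    my ( $ah, $am ) = split /:/, $anchor;
    my $base   = $ah * 60 + $am;             # anchor in minutes
    my $offset = $hh * 60 + $mm - $base;     # minutes past the anchor
    my $start  = $base + 5 * int( $offset / 5 );
    return sprintf "%02d:%02d", int( $start / 60 ), $start % 60;
}

for my $t (qw(05:33:25 05:37:25 05:38:25 05:42:25)) {
    print "$t -> ", bucket_start( $t, '05:33' ), "\n";
}
# 05:33:25 -> 05:33
# 05:37:25 -> 05:33
# 05:38:25 -> 05:38
# 05:42:25 -> 05:38
```

For 05:38:25 the offset is 5 minutes, so the bucket index is 1 and the bucket starts at 05:33 plus 5 minutes, which matches the required output in the question.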

Answer 4

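I'm not sure why you would use times starting from the first timestamp rather than absolute 5-minute intervals (00 - 05, 05 - 10, etc.), but here is a quick and dirty way to do it your way: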

my %output;
my $last_min = -10; # -10 + 5 is less than any positive int.
while (<STDIN>) {
    my ($dt, $v1, $v2, $v3) = split(/,/, $_);
    my ($h, $m, $s) = split(/:/, $dt);
    my $ts = $m + ($h * 60);
    if ($ts >= $last_min + 5) { # a new bucket starts exactly 5 minutes on
        $last_min = $ts;
    }
    $output{$last_min}{1} += $v1;
    $output{$last_min}{2} += $v2;
    $output{$last_min}{3} += $v3;
}
foreach my $ts (sort {$a <=> $b} keys %output) {
    my $hour = int($ts / 60);
    my $minute = $ts % 60;
    printf("%02d:%02d v1(%i) v2(%i) v3(%i)\n", (
            $hour,
            $minute,
            $output{$ts}{1},
            $output{$ts}{2},
            $output{$ts}{3},
        ));
}

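I'm not sure why you would do it this way, but there you have it in procedural Perl, as an example. If you need more on printf formats, see the Perl documentation for sprintf.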
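
For comparison, the absolute intervals mentioned at the top of this answer (00 - 05, 05 - 10, and so on) need no state at all. A minimal sketch, assuming `HH:MM:SS` timestamps within a single day; the helper name `aligned_bucket` is made up for illustration, and the rows are the sample data from Answer 1.

```perl
use strict;
use warnings;

# Hypothetical helper: floor an "HH:MM:SS" time to its five-minute mark.
sub aligned_bucket {
    my ($time) = @_;
    my ( $h, $m ) = split /:/, $time;
    return sprintf "%02d:%02d", $h, $m - $m % 5;
}

my @lines = (
    '05:33:25,29,74,96', '05:34:25,41,69,95', '05:35:25,24,38,55',
    '05:36:25,96,63,70', '05:37:25,84,65,74', '05:38:25,78,58,93',
    '05:39:25,51,38,19', '05:40:25,86,40,64', '05:41:25,80,68,65',
    '05:42:25,4,93,81',
);

my %sum;    # bucket label => [ v1 total, v2 total, v3 total ]
for my $line (@lines) {
    my ( $time, @vals ) = split /,/, $line;
    my $bucket = aligned_bucket($time);
    $sum{$bucket}[$_] += $vals[$_] for 0 .. $#vals;
}

print "$_ @{ $sum{$_} }\n" for sort keys %sum;
# 05:30 70 143 191
# 05:35 333 262 311
# 05:40 170 201 210
```

Zero-padded `HH:MM` labels sort correctly as plain strings, so no numeric sort is needed here.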

How can I aggregate data over five-minute intervals in Perl?