I have a dataset with three fields (Date, Time, Value), as shown below:
2014-01-24 23:32:01.874 45
2014-01-24 23:32:02.198 71
2014-01-24 23:32:02.302 94
2014-01-24 23:32:02.439 48
2014-01-24 23:32:02.574 82
2014-01-24 23:32:02.724 51
2014-01-24 23:32:02.913 15
2014-01-24 23:32:02.964 77
2014-01-24 23:32:02.989 49
2014-01-24 23:32:03.017 42
2014-01-24 23:32:03.025 1
2014-01-24 23:32:03.085 67
2014-01-24 23:32:03.136 53
2014-01-24 23:32:03.200 46
2014-01-24 23:32:03.240 72
2014-01-24 23:32:03.257 0
2014-01-24 23:32:03.296 36
How can I write a Perl script that computes the average value for each 5-minute interval?
Expected output:
Time Average Value
23:30 20
23:35 35
23:40 15
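Answer 0 (score: 4)
Write Perl code that reads the data line by line.
Maintain a counter variable that tracks time in 5-minute increments, and an array that holds the data lines you are reading.
When the time of a data line passes the current counter value, compute the average from the data in the array, increment the counter, and clear the array. Add the new line to the cleared array, then continue the process.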
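The steps above could be sketched roughly as follows. This is only one possible reading of the description, assuming the lines arrive in chronological order and that buckets are aligned to the clock (23:30, 23:35, ...). The name `bucket_averages` and the sample lines passed to it are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: average "date time value" lines per 5-minute bucket.
# Assumes the lines are already sorted by time.
sub bucket_averages {
    my @lines = @_;
    my (@out, @values, $current);

    # Emit the average for the finished bucket, then clear the array.
    my $flush = sub {
        return unless @values;
        my $sum = 0;
        $sum += $_ for @values;
        push @out, sprintf '%s %.2f', $current, $sum / @values;
        @values = ();
    };

    for my $line (@lines) {
        my ($date, $time, $value) = split ' ', $line;
        next unless defined $value;    # skip malformed lines
        my ($h, $m) = split /:/, $time;
        # Round the minute down to a multiple of 5, e.g. 23:32 -> 23:30.
        my $bucket = sprintf '%s %02d:%02d', $date, $h, $m - $m % 5;
        # A line from a later bucket closes the current one.
        $flush->() if defined $current && $bucket ne $current;
        $current = $bucket;
        push @values, $value;
    }
    $flush->();    # the last bucket has no following line to trigger it
    return @out;
}

print "$_\n" for bucket_averages(
    '2014-01-24 23:32:01.874 45',
    '2014-01-24 23:32:02.198 71',
    '2014-01-24 23:36:00.000 10',
);
```

With these three sample lines the sketch prints `2014-01-24 23:30 58.00` followed by `2014-01-24 23:35 10.00`. Only one bucket is held in memory at a time, which is the main appeal of this approach for large files.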
Answer 1 (score: 0)
I changed the data so that it spans more than one 5-minute interval:
2014-01-24 23:12:01.874 45
2014-01-24 23:12:02.198 71
2014-01-24 23:22:02.302 94
2014-01-24 23:22:02.439 48
2014-01-24 23:22:02.574 82
2014-01-24 23:32:02.724 51
2014-01-24 23:32:02.913 15
2014-01-24 23:32:02.964 77
2014-01-24 23:42:02.989 49
2014-01-24 23:42:03.017 42
2014-01-24 23:42:03.025 1
2014-01-24 23:52:03.085 67
2014-01-24 23:52:03.136 53
2014-01-24 23:52:03.200 46
2014-01-24 23:52:03.240 72
2014-01-24 23:52:03.257 0
2014-01-24 23:52:03.296 36
Script:
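#!/usr/bin/perl -n
BEGIN { use Date::Parse; }
# Store [original line, epoch with fractional seconds, value] for each input line.
push @r, [ $&, str2time($1).'.'.$2, $3 ] if /(.*?)\.(\d+) (\d+)\s*$/;
END {
    $ref = undef;    # start time of the current window
    $avg = 0.0;
    $cnt = 0;
    for (sort { $a->[1] <=> $b->[1] } @r) {
        unless ($ref) {
            # First line: open the first window.
            $sum = $_->[2];
            $cnt = 1;
            $ref = $_->[1];
        } else {
            if ($_->[1] - $ref > 60.0*5) {
                # More than 5 minutes since the window started:
                # report its average and start a new window here.
                printf "avg: %5.2f\n", $sum/$cnt;
                $ref = $_->[1];
                $sum = $_->[2];
                $cnt = 1;
            } else {
                $sum += $_->[2];
                $cnt++;
            }
        }
        printf "%s", $_->[0];
    }
    # Average of the last window.
    printf "avg: %5.2f\n", $sum/$cnt;
}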
Output:
2014-01-24 23:12:01.874 45
2014-01-24 23:12:02.198 71
avg: 58.00
2014-01-24 23:22:02.302 94
2014-01-24 23:22:02.439 48
2014-01-24 23:22:02.574 82
avg: 74.67
2014-01-24 23:32:02.724 51
2014-01-24 23:32:02.913 15
2014-01-24 23:32:02.964 77
avg: 47.67
2014-01-24 23:42:02.989 49
2014-01-24 23:42:03.017 42
2014-01-24 23:42:03.025 1
avg: 30.67
2014-01-24 23:52:03.085 67
2014-01-24 23:52:03.136 53
2014-01-24 23:52:03.200 46
2014-01-24 23:52:03.240 72
2014-01-24 23:52:03.257 0
2014-01-24 23:52:03.296 36
avg: 45.67
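Answer 2 (score: 0)
You can normalize the timestamps to five-minute boundaries by converting them to seconds, dividing by 300, taking the integer part, and multiplying by 300 again.
Taking this approach (and using Perl's standard Time::Piece module for parsing and formatting) gives something like this: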
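#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Time::Piece;

my $fmt = '%Y-%m-%d %H:%M:%S';
my %data;

while (<DATA>) {
    chomp;
    # Fields: date, time, milliseconds (ignored), value.
    my ($date, $time, undef, $count) = split /[ \.]/;
    my $dt = Time::Piece->strptime("$date $time", $fmt);
    # Group the values by the start of their five-minute period.
    push @{$data{five_min($dt->epoch)}}, $count;
}

# Sort the periods numerically, since the keys are epoch seconds.
foreach my $period (sort { $a <=> $b } keys %data) {
    my $total;
    $total += $_ for @{$data{$period}};
    my $avg = sprintf '%.2f', $total / @{$data{$period}};
    # strptime treats the input as UTC, so format with gmtime
    # to avoid shifting the result by the local time zone offset.
    say gmtime($period)->strftime($fmt), ' ', $avg;
}

# Round an epoch value down to the nearest five-minute boundary.
sub five_min {
    my $epoch = shift;
    my $secs = 5 * 60;
    return $secs * int($epoch / $secs);
}
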
__DATA__
2014-01-24 23:12:01.874 45
2014-01-24 23:12:02.198 71
2014-01-24 23:22:02.302 94
2014-01-24 23:22:02.439 48
2014-01-24 23:22:02.574 82
2014-01-24 23:32:02.724 51
2014-01-24 23:32:02.913 15
2014-01-24 23:32:02.964 77
2014-01-24 23:42:02.989 49
2014-01-24 23:42:03.017 42
2014-01-24 23:42:03.025 1
2014-01-24 23:52:03.085 67
2014-01-24 23:52:03.136 53
2014-01-24 23:52:03.200 46
2014-01-24 23:52:03.240 72
2014-01-24 23:52:03.257 0
2014-01-24 23:52:03.296 36
This produces the following output:
$ ./avgtime
2014-01-24 23:10:00 58.00
2014-01-24 23:20:00 74.67
2014-01-24 23:30:00 47.67
2014-01-24 23:40:00 30.67
2014-01-24 23:50:00 45.67
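If you want the period label in the `HH:MM` form shown in the question's expected output, the same rounding can be paired with a shorter format string. A minimal sketch, assuming the timestamps are treated as UTC (which is how `strptime` parses them); the helper name `bucket_label` is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# Round an epoch value down to the nearest five-minute boundary.
sub five_min {
    my $epoch = shift;
    return 300 * int($epoch / 300);
}

# Hypothetical helper: return the HH:MM label of the five-minute
# period that contains the given "YYYY-MM-DD HH:MM:SS" timestamp.
sub bucket_label {
    my $stamp = shift;
    my $t = Time::Piece->strptime($stamp, '%Y-%m-%d %H:%M:%S');
    # gmtime here is the Time::Piece override and returns an object.
    return gmtime(five_min($t->epoch))->strftime('%H:%M');
}

print bucket_label('2014-01-24 23:32:01'), "\n";    # 23:30
```

Epoch values count seconds from a UTC midnight, so multiples of 300 always fall on clock-aligned five-minute marks when formatted with `gmtime`.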