I have data in this format:
a1 1901 4
a1 1902 5
a3 1902 6
a4 1902 7
a4 1903 8
a5 1903 9
I want to compute the cumulative score (column 3) for each entity in the first column. So I tried building a hash, and my code looks like this:
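use strict;
use warnings;
use Data::Dumper;

my $file = shift;
# Three-argument open; stop with a message if the file cannot be opened.
open(DATA, '<', $file) or die "Cannot open '$file': $!";

my %hash;
while ( my $line = <DATA> ) {
    chomp $line;
    my ($protein, $year, $score) = split /\s+/, $line;
    # Collect every score seen for this protein/year pair.
    push @{ $hash{$protein}{$year} }, $score;
}
print Dumper \%hash;
close DATA;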
The output looks like this:
$VAR1 = {
          'a3' => {
                    '1902' => [
                                6
                              ]
                  },
          'a1' => {
                    '1902' => [
                                5
                              ],
                    '1901' => [
                                4
                              ]
                  },
          'a4' => {
                    '1903' => [
                                8
                              ],
                    '1902' => [
                                7
                              ]
                  },
          'a5' => {
                    '1903' => [
                                9
                              ]
                  }
        };
I now want to go through each entity in column 1 (a1, a2, a3) and add up its scores, so the desired output would look like this:
a1 1901 4
a1 1902 9 # 4+5
a3 1902 6
a4 1902 7
a4 1903 16 # 7+9
a5 1903 9
But I can't figure out how to access the hash I created inside a loop so that I can add up the values.
Answer 0 (score: 0)
I think
a4 1903 16 # Sum of a4 1902 and a5 1903
should be
a4 1903 15 # Sum of a4 1902 and a4 1903
If so,
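use feature qw( say );    # say is not enabled by default

# DATA is the filehandle being read, as in the question.
my %scores_by_protein_and_year;
while (<DATA>) {
    my ($protein, $year, $score) = split;
    $scores_by_protein_and_year{$protein}{$year} = $score;
}

# Visit proteins in sorted order (hash key order is otherwise arbitrary),
# then walk each protein's years in ascending order with a running total.
for my $protein (sort keys(%scores_by_protein_and_year)) {
    my $scores_by_year = $scores_by_protein_and_year{$protein};
    my $score = 0;
    for my $year (sort { $a <=> $b } keys(%$scores_by_year)) {
        $score += $scores_by_year->{$year};
        say "$protein $year $score";
    }
}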
This works even if the data isn't grouped or sorted.
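If the same protein/year pair can appear on more than one line, a plain assignment keeps only the last score for that pair. Below is a minimal sketch that keeps the hash-of-arrays layout from the question and adds up each year's list with `sum` from the core List::Util module. The sub name `cumulative_by_protein` and the sample lines are made up for illustration, and the input is assumed to be whitespace-separated "protein year score" lines.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Hypothetical helper: takes a list of "protein year score" lines and
# returns "protein year running_total" lines, grouped by protein.
sub cumulative_by_protein {
    my @lines = @_;

    # Same layout as in the question: $scores{protein}{year} = [ scores ].
    my %scores;
    for my $line (@lines) {
        my ($protein, $year, $score) = split ' ', $line;
        next unless defined $score;    # skip blank or short lines
        push @{ $scores{$protein}{$year} }, $score;
    }

    my @out;
    for my $protein (sort keys %scores) {
        my $by_year = $scores{$protein};
        my $total   = 0;
        for my $year (sort { $a <=> $b } keys %$by_year) {
            # Each list holds at least one score, so sum never sees an empty list.
            $total += sum(@{ $by_year->{$year} });
            push @out, "$protein $year $total";
        }
    }
    return @out;
}

# Sample input, deliberately unsorted, with a repeated "a1 1902" pair.
print "$_\n" for cumulative_by_protein(
    'a4 1903 8', 'a1 1902 5', 'a1 1901 4', 'a4 1902 7', 'a1 1902 1',
);
```

With this sample input the sub returns a1 1901 4, a1 1902 10, a4 1902 7 and a4 1903 15, because both a1 1902 scores (5 and 1) are counted.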
Answer 1 (score: 0)
If the data is always sorted the way you show it, you can process it as you read it from the file:
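my $total   = 0;
my $current = '';    # protein seen on the previous line
while ( <DATA> ) {
    my ($protein, $year, $score) = split;
    # Reset the running total whenever a new protein starts.
    $total = 0 unless $protein eq $current;
    $total += $score;
    print "$protein $year $total\n";
    $current = $protein;
}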
Output:
a1 1901 4
a1 1902 9
a3 1902 6
a4 1902 7
a4 1903 15
a5 1903 9
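When the lines for one protein are not adjacent but each protein's years still arrive in ascending order, a hash of running totals gives the same result in a single pass. This is a sketch under that assumption; `running_totals` is an illustrative name and the sample lines are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: one pass over "protein year score" lines, keeping a
# separate running total per protein. Assumes each protein's years arrive
# in ascending order; the proteins themselves may be interleaved.
sub running_totals {
    my @lines = @_;
    my %total;    # protein => sum of scores seen so far
    my @out;
    for my $line (@lines) {
        my ($protein, $year, $score) = split ' ', $line;
        next unless defined $score;    # skip blank or short lines
        $total{$protein} += $score;
        push @out, "$protein $year $total{$protein}";
    }
    return @out;
}

# Sample input with a1 and a4 interleaved.
print "$_\n" for running_totals(
    'a1 1901 4', 'a4 1902 7', 'a1 1902 5', 'a4 1903 8',
);
```

Output lines stay in input order here (a1 1901 4, a4 1902 7, a1 1902 9, a4 1903 15), which may or may not be what you want; sort the input first if grouped output matters.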