So basically my problem can be written in pseudocode as follows:
split the line by =
using value before =, find the next line
check whether the value after = matches the previous one
if not, then loop till end of file
collect all the values which match and using the line numbers, get the last 2 columns value
sum all the values for a given set with equal key=value pair.
My data set is as follows:
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200, 100
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300, 10
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000, 80
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000, 1200
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000, 500
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000, 7
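What I need to do is take all the rows where key 3 has the same value, sum their last 2 columns, and report those sums against that key=value pair (and likewise for every other key). For example: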
3 = 5002, sum = 500, 110
5 = 0, sum = 1300, 90
8 = 2, sum = 15000, 1700
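I have been able to parse the first 3, but I can't get anywhere with the remaining columns :-(
Answer 0 (score: 3)
As I understand it, there are two possible approaches here. The first stores the values in a single-level hash keyed by the composite string "key=value". The second uses a multi-level hash, keyed first by the key and then by the value:
Method 1: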
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw( sum );

my %data;
while ( my $line = <DATA> ) {
    chomp $line;
    my @parts = split /, /, $line;
    last unless @parts;
    my $value = pop @parts;    # trailing number to be summed
    # Composite key: each "key=value" string collects its own list of numbers
    push @{ $data{$_} }, $value for @parts;
}

for my $col ( sort keys %data ) {
    printf("%12s:%9d\n", $col, sum @{ $data{$col} } );
}
__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000
C:\Temp> hj
3=5001: 11000
3=5002: 500
3=5003: 6000
0=10001: 1000
0=10002: 500
0=10004: 16000
1=14001: 11000
1=14002: 500
1=14003: 6000
4=0: 6000
4=1: 11500
5=0: 1300
5=1: 10200
5=2: 5000
5=3: 1000
6=3: 11000
6=5: 500
6=8: 6000
7=0: 1300
7=1: 200
7=2: 16000
8=0: 1300
8=1: 1200
8=2: 15000
9=0: 2200
9=1: 15300
Method 2:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw( sum );

my %data;
while ( my $line = <DATA> ) {
    chomp $line;
    my @parts = split /, /, $line;
    last unless @parts;
    my $value = $parts[-1];    # trailing number to be summed
    # Visit every field except the last; a bound of @parts - 2 would
    # silently skip the final key=value pair (6=...) for this data.
    for ( my $i = 0 ; $i < @parts - 1; ++$i ) {
        my @subparts = split /=/, $parts[$i];
        push @{ $data{$subparts[0]}->{$subparts[1]} }, $value;
    }
}

for my $k1 ( keys %data ) {
    for my $k2 ( keys %{ $data{$k1} } ) {
        printf(
            "%2d:%6d:%9d \n",
            $k1, $k2, sum @{ $data{$k1}->{$k2} }
        );
    }
}
__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000
C:\Temp> hjk
3: 5003: 6000
3: 5002: 500
3: 5001: 11000
7: 1: 200
7: 0: 1300
7: 2: 16000
9: 1: 15300
9: 0: 2200
8: 1: 1200
8: 0: 1300
8: 2: 15000
4: 1: 11500
4: 0: 6000
1: 14001: 11000
1: 14003: 6000
1: 14002: 500
0: 10001: 1000
0: 10004: 16000
0: 10002: 500
5: 1: 10200
5: 3: 1000
5: 0: 1300
5: 2: 5000
6: 3: 11000
6: 8: 6000
6: 5: 500
NB: Add sort to taste (hash keys come back in no particular order).
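The data in the question carries two trailing numbers per row, while the listings above sum only one. As a minimal sketch, assuming the composite-key idea from Method 1 and that every row ends in exactly two numeric columns, both sums can be kept side by side. The helper name `sum_last_two` is hypothetical and not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sums the last two columns for every "key=value" pair.
# Assumes each row ends in exactly two numeric columns.
sub sum_last_two {
    my @lines = @_;
    my %sums;    # "key=value" => [ sum of second-to-last col, sum of last col ]
    for my $line (@lines) {
        chomp $line;
        my @parts = split /,\s*/, $line;
        next if @parts < 3;                    # need at least one pair plus two numbers
        my ( $x, $y ) = splice @parts, -2;     # remove and keep the two trailing numbers
        for my $pair (@parts) {
            $sums{$pair}[0] += $x;
            $sums{$pair}[1] += $y;
        }
    }
    return \%sums;
}

my @rows = (
    '3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200, 100',
    '3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300, 10',
    '3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000, 80',
    '3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000, 1200',
    '3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000, 500',
    '3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000, 7',
);

my $sums = sum_last_two(@rows);
for my $pair ( sort keys %$sums ) {
    my ( $k, $v ) = split /=/, $pair;
    printf "%s = %s, sum = %d, %d\n", $k, $v, @{ $sums->{$pair} };
}
```

For the six rows of the question this reports 500 and 110 for 3=5002, 1300 and 90 for 5=0, and 15000 and 1700 for 8=2, which are the figures the question gives as examples.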
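Answer 1 (score: 1)
How about splitting on ", "? Then you can pull off the last element and pair it with every other element in the list. For your first line, you would end up with the following pairs: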
3=5002, 200
0=10002, 200
5=1, 200
4=1, 200
7=1, 200
8=1, 200
9=0, 200
1=14002, 200
6=5, 200
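Add each of these pairs to a master list. Once you have that, you can sort by the first element of each pair and sum the second elements of pairs that share a first element.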
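The steps above could be sketched as follows. This is only an illustration under the assumption that each row ends in a single number; `pair_sums` is a hypothetical helper name, not something from the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: builds the master list of pairs, sorts it,
# and sums runs of pairs that share the same first element.
sub pair_sums {
    my @lines = @_;
    my @pairs;    # master list of [ "key=value", number ]
    for my $line (@lines) {
        chomp $line;
        my @parts = split /,\s*/, $line;
        next if @parts < 2;
        my $value = pop @parts;                       # last element of the row
        push @pairs, map { [ $_, $value ] } @parts;   # pair it with every other element
    }

    # Sorting puts equal first elements next to each other,
    # so each run can be folded into a single total.
    my @totals;
    for my $pair ( sort { $a->[0] cmp $b->[0] } @pairs ) {
        if ( @totals && $totals[-1][0] eq $pair->[0] ) {
            $totals[-1][1] += $pair->[1];
        }
        else {
            push @totals, [@$pair];
        }
    }
    return @totals;
}

my @result = pair_sums(
    '3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200',
    '3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300',
);
print "$_->[0], $_->[1]\n" for @result;
```

A hash would avoid the sort altogether; the list-and-sort form is kept here only because it mirrors the description.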
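Answer 2 (score: 0)
The way you explained your problem is not very clear. From what I understand, this would be my approach:
Create a two-dimensional array holding the individual comma-separated fields, so that the row/column structure is preserved.
Analyze each column and create a hash that maps each data value to the rows that contain it.
I.e., for the first column you would have a hash like this (rows numbered from 0):
3=5002 => rows 0, 1
3=5001 => rows 2, 3
3=5003 => rows 4, 5
Then you go through each entry of the hash and sum the last member of every row listed for that value.
Repeat for each column except the last.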
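A rough sketch of this approach, assuming every row has the same number of fields, that the last field is the number to sum, and that a given key always sits in the same column (as it does in the sample data). The helper name `column_sums` and the shape of its return value are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper following the row/column description above.
sub column_sums {
    my @lines = @_;

    # Step 1: two-dimensional array, one row per line, one column per field.
    my @table = map { [ split /,\s*/ ] } @lines;
    return {} unless @table;
    my $last = $#{ $table[0] };    # index of the numeric column

    my %result;
    for my $col ( 0 .. $last - 1 ) {

        # Step 2: map each value in this column to the rows that contain it.
        my %rows_for;
        for my $row ( 0 .. $#table ) {
            push @{ $rows_for{ $table[$row][$col] } }, $row;
        }

        # Step 3: sum the last member of every row listed for each value.
        for my $value ( keys %rows_for ) {
            my $sum = 0;
            $sum += $table[$_][$last] for @{ $rows_for{$value} };
            $result{$value} = { rows => $rows_for{$value}, sum => $sum };
        }
    }
    return \%result;
}

my $res = column_sums(
    '3=5002, 0=10002, 5=1, 200',
    '3=5002, 0=10002, 5=0, 300',
    '3=5001, 0=10001, 5=0, 1000',
);
for my $value ( sort keys %$res ) {
    printf "%s rows %s sum %d\n",
        $value, join( ',', @{ $res->{$value}{rows} } ), $res->{$value}{sum};
}
```

Keeping the row indices is more than the sums strictly need, but it matches the description and makes it easy to sum a different column later.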
Answer 3 (score: 0)
I hope this is what you are looking for (it sums the last column for key 3 only):
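#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Text::CSV_XS;

my %data;
# allow_whitespace strips the blank that follows each comma in the input
my $csv = Text::CSV_XS->new( { allow_whitespace => 1 } );
while ( <DATA> ) {
    $csv->parse($_);
    my @fields = $csv->fields();
    $fields[0] =~ s/^3=//;                # keep only the value of key 3
    $data{ $fields[0] } += $fields[9];    # add the trailing number to its total
}

print Dumper \%data;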
__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000
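Answer 4 (score: 0)
Or something like that.
So, my question is: can you provide the expected output for the sample data set?
Anyway, here is my attempt (the '#/' comments are only there to help syntax highlighting):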
#!/usr/bin/perl
use strict;
use warnings;

my %h;           # "key=value" => running sum
my @ord_keys;    # pairs in order of first appearance
while (<DATA>) {
    chomp;
    my @cols = split /,\s*/; #/
    my $val = pop @cols;
    foreach my $k (@cols) {
        if (exists($h{$k})) {
            $h{$k} += $val;
        } else {
            push @ord_keys, $k;
            $h{$k} = $val;
        }
    }
}

foreach my $key (@ord_keys) {
    my ($k, $v) = split /=/, $key; #/
    print "$k = $v, sum = $h{$key}\n";
}
__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000
Result:
3 = 5002, sum = 500
0 = 10002, sum = 500
5 = 1, sum = 10200
4 = 1, sum = 11500
7 = 1, sum = 200
8 = 1, sum = 1200
9 = 0, sum = 2200
1 = 14002, sum = 500
6 = 5, sum = 500
5 = 0, sum = 1300
7 = 0, sum = 1300
8 = 0, sum = 1300
9 = 1, sum = 15300
3 = 5001, sum = 11000
0 = 10001, sum = 1000
4 = 0, sum = 6000
1 = 14001, sum = 11000
6 = 3, sum = 11000
0 = 10004, sum = 16000
7 = 2, sum = 16000
8 = 2, sum = 15000
3 = 5003, sum = 6000
5 = 2, sum = 5000
1 = 14003, sum = 6000
6 = 8, sum = 6000
5 = 3, sum = 1000
Comments welcome.