Perl: computing the differences between columns in a CSV file

Asked: 2010-11-22 17:15:54

Tags: arrays perl hash diff

I have a (very large) CSV file in the following format.

key1,val1,val2,val3... ,valn
key2,val2,val5,val1....,valn
...
...
keyn,val7,val9,val11....,valn
key1,val2,val4,val8.....,valn
key2,val10,val12,val14..., valn
...
...
keyn,val2,val4,val8.....,valn
key1,val3,val5,val7... ,valn
key2,val0,val9,val3....,valn

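key1 through keyn (and their values) repeat many times in the CSV file.

The values (val1 ... valn) are doubles (floating-point numbers).

What I want to print:

1) Starting from the beginning of the file, for each key, I want to compute the difference between its column values (e.g. val2, val4, val6) and the column values at the next occurrence of the same key.

So, for example, given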

key1,2,4,6
key2,3,5,7
...
...
key1,4,6,8
key2,4,6,8

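I want to print

key1: difference from the previous record is key1,2,2,2
key2: difference from the previous record is key2,1,1,1
..
keyn: difference from the previous record is ...........

2) Repeat this for every successive occurrence of each key.

This is what I have so far (storing the values in a hash):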

#!/usr/bin/perl

my %hash;
open my $fh, '<', 'file1.csv' or die "Cannot open: $!";
while (my $line = <$fh>) {
  $line =~ s/\s*\z//;
  my @array = split /,/, $line;
  my $key = shift @array;
  $hash{$key} = \@array;
}
close $fh;

2 Answers:

Answer 0: (score: 2)

You can try:

    # get the key.
    my $key = shift @array;

    # see if the key is already seen.
    if(exists $hash{$key} ) {
            # get ref to previous record of this key.
            my $ref = $hash{$key};

            # print key.
            print "$key,";

            # a new array.
            my @new_array;

            # populate the new array.
            for my $i (0 .. $#array) {
                    $new_array[$i] = $array[$i] - $ref->[$i];
            }

            # join the array elements with comma.
            print join(",", @new_array);
            print "\n";
    }

    # add/replace the current array as value for the current key.
    $hash{$key} = \@array;

You can see the working code here
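
For reference, here is one way the fragment above might be combined with the read loop from the question into a complete script. It is only a minimal sketch: the sub name `print_key_diffs` and the in-memory sample data are made up for illustration, and it assumes plain comma-separated numeric fields with no quoting and the same number of columns for every occurrence of a key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads CSV lines from $in and writes, for every repeated
# key, the column-wise difference (current - previous) to $out.
sub print_key_diffs {
    my ($in, $out) = @_;
    my %previous;    # key => array ref of the most recent values for that key

    while (my $line = <$in>) {
        $line =~ s/\s*\z//;          # strip trailing whitespace and newline
        next if $line eq '';         # skip blank lines
        my ($key, @values) = split /\s*,\s*/, $line;

        if (my $old = $previous{$key}) {
            my @diff = map { $values[$_] - $old->[$_] } 0 .. $#values;
            print {$out} join(',', $key, @diff), "\n";
        }
        $previous{$key} = \@values;  # only the latest row per key is kept
    }
    return;
}

# Sample data from the question, read from an in-memory filehandle.
my $sample = "key1,2,4,6\nkey2,3,5,7\nkey1,4,6,8\nkey2,4,6,8\n";
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
print_key_diffs($in, \*STDOUT);
close $in;
```

With the sample data this prints `key1,2,2,2` and `key2,1,1,1`. Because only the latest row per key is kept, memory use grows with the number of distinct keys rather than with the size of the file, which matters when the file is very large.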

Answer 1: (score: 0)

My attempt:

use strict;
use warnings;

use Text::CSV_XS;
use Math::Matrix;

my $csv = Text::CSV_XS->new({binary => 1});
my %hash;     # key => most recent row of values for that key
my @results;  # one array ref of differences per repeated key

open my $fh, '<', 'file1.csv' or die "Cannot open: $!";

while (my $line = <$fh>) {
  if ($csv->parse($line)) {
    my @array = $csv->fields;
    my $key = shift @array;

    # first occurrence of this key: remember it and move on
    if (! exists $hash{$key}) {
      $hash{$key} = \@array;
      next;
    }

    # each record becomes a one-row matrix
    my $previous_record = Math::Matrix->new($hash{$key});
    my $current_record = Math::Matrix->new(\@array);

    # current minus previous, so that 2,4,6 followed by 4,6,8 gives 2,2,2
    my $new_record = $current_record->subtract($previous_record);

    push @results, @$new_record;
    $hash{$key} = \@array;
  }
  else {
    my $err = $csv->error_input;
    print "error parsing: $err\n";
  }
}

close $fh;
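
If pulling in Math::Matrix just for a row subtraction feels heavy, the same element-wise difference can be written with `map` in core Perl. The sketch below is an illustration only, and `subtract_rows` is a hypothetical name, not part of any module. It assumes both rows are array refs of numbers and dies if their lengths differ.

```perl
use strict;
use warnings;

# Hypothetical helper: element-wise $current - $previous for two array refs.
sub subtract_rows {
    my ($current, $previous) = @_;
    die "Rows have different lengths\n" unless @$current == @$previous;
    return map { $current->[$_] - $previous->[$_] } 0 .. $#{$current};
}

my @diff = subtract_rows([4, 6, 8], [2, 4, 6]);
print join(',', @diff), "\n";    # prints 2,2,2
```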