Summarizing a table by the feature in one column

Time: 2014-09-03 02:19:41

Tags: perl

I am trying to summarize the following table by grouping rows that share the same feature in column 1:

infile:

A       m
A       m
A       n
A       n
A       m
A       c
A       m
A       i
A       n
A       n
B       n
B       n
B       n
B       n
B       n
B       n
C       o
C       i
C       q

I wrote the following code, but I don't know why it does not report the last feature:

perl code.pl 1 2 infile

use warnings;
use strict;

my $col_feature         = $ARGV[0];
my $col_to_be_collapsed = $ARGV[1];

my $infile = $ARGV[2];

open( my $fh1, "<$infile" );

my $temp;
my $line_count = 0;
my %count      = ();
my @array      = ();

while ( my $line = <$fh1> ) {
    chomp($line);

    my @line            = split( "\t| ", $line );
    my $to_be_collapsed = $line[ $col_to_be_collapsed - 1 ];
    my $feature         = $line[ $col_feature - 1 ];

    if ( $line_count >= 1 && $temp ne '' ) {

        my @temp                 = split( "\t| ", $temp );
        my $to_be_collapsed_temp = $temp[ $col_to_be_collapsed - 1 ];
        my $feature_temp         = $temp[ $col_feature - 1 ];

        if ( $feature_temp eq $feature ) {
            push( @array, $to_be_collapsed );
        }

        else {
            map { $count{$_}++ } @array;
            print "$feature_temp:\t";
            print "$_:$count{$_}\t" foreach sort { $a cmp $b } keys %count;

            %count = ();
            @array = ();
            $temp  = $line;

            push( @array, $to_be_collapsed );
            print "\n";
        }
    }

    else {
        $temp = $line;
        push( @array, $to_be_collapsed );
    }
    $line_count++;
}

#print $temp,"\n";

Output:

A:      c:1     i:1     m:4     n:4
B:      n:6

But nothing is reported for C, the last feature in the first column!

Thanks

1 answer:

Answer 0 (score: 4)

Using a hash is easier in this particular case, because all you need to keep is a counter for each feature/value pair.

#!/usr/bin/perl

use strict;
use warnings;
use autodie;

#open my $fh, '<', 'infile';       # Uncomment for live file.
my $fh = \*DATA;                  # For testing only.

my %counter;

while (<$fh>) {
    my ( $outerkey, $innerkey ) = split;
    $counter{$outerkey}{$innerkey}++;
}

for my $outerkey ( sort keys %counter ) {
    print "$outerkey:";
    print "\t$_:$counter{$outerkey}{$_}" for sort keys %{ $counter{$outerkey} };
    print "\n";
}

__DATA__
A       m
A       m
A       n
A       n
A       m
A       c
A       m
A       i
A       n
A       n
B       n
B       n
B       n
B       n
B       n
B       n
C       o
C       i
C       q

Output:

A:  c:1 i:1 m:4 n:4
B:  n:6
C:  i:1 o:1 q:1
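
As for why the script in the question never reports C: it prints a group only when it reads a line whose feature differs from the previous one. Nothing follows the last group, so nothing triggers its printing, and the loop ends with C's values still sitting in @array. If you would rather keep the line-by-line approach, a minimal sketch of the fix is to emit the pending group once more after the loop. The names summarize and format_group are made up for this illustration and are not from the original post, and the sketch assumes the input is already grouped by the feature column, as in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): format one group's
# counts in the "feature:<TAB>value:count" layout used in the question.
sub format_group {
    my ( $feature, $count ) = @_;
    return "$feature:"
        . join( '', map { "\t$_:$count->{$_}" } sort keys %$count ) . "\n";
}

# Streaming summary. Assumes rows with the same feature are adjacent.
sub summarize {
    my ( $lines, $col_feature, $col_value ) = @_;
    my ( $current, %count, @out );

    for my $line (@$lines) {
        my @fields = split ' ', $line;
        next unless @fields;    # skip blank lines
        my ( $feature, $value ) = @fields[ $col_feature - 1, $col_value - 1 ];

        # Feature changed: emit the finished group and start a new one.
        if ( defined $current && $feature ne $current ) {
            push @out, format_group( $current, \%count );
            %count = ();
        }
        $current = $feature;
        $count{$value}++;
    }

    # The step missing from the question's loop: emit the final group.
    push @out, format_group( $current, \%count ) if defined $current;
    return @out;
}

# Small sample in the same shape as infile.
my @rows = ( "A m", "A m", "A n", "B n", "C o", "C i", "C q" );
print summarize( \@rows, 1, 2 );
```

Unlike this sketch, the hash-of-hashes version above does not depend on the rows being grouped, which is one reason it is the simpler choice here.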