Question

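My input data is below. From it, I want the counts of p1, p2 .. p5 for each unique value in the first column, plus the total number of rows for that value.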

ID  M   N 
cc1 1   p1
cc1 10  p2
cc1 10  p2
cc2 1   p1
cc2 2   p5
cc3 2   p1
cc3 2   p4

The result I expect is

ID  M   p1  p2  p3  p4  p5 
cc1 3   1   2   0   0   0   
cc3 2   1   0   0   1   0   
cc2 2   1   0   0   0   1

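To do this I used a hash of hashes plus a plain hash, and I get the output I expect. But I wonder whether the same thing can be done with a single hash, because the same data ends up stored in two different hashes.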

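my (%hash, %hash2);
<$fh>;    # skip the header row; $fh is assumed to be an open handle on the input file
while (<$fh>)
{
    chomp;    # drop the newline so the last field is "p1", not "p1\n"
    my($first,$second,$thrid) = split("\t");
    $hash{$first}{$thrid}++; #I tried $hash{$first}++{$thrid}++ It throws syntax error
    $hash2{$first}++; #is it possible to get rid of this hash?
}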
my @ar = qw(p1  p2  p3  p4  p5);
$, = "\t"; 
print @ar,"\n";
foreach (keys %hash)
{
    print "$_\t$hash2{$_}\t";
    foreach my $ary(@ar)
    {
        if(!$hash{$_}{$ary})
        {
            print "0\t"; 
        }
        else
        {
            print "$hash{$_}{$ary}\t";
        }
    }
    print "\n";
}

Answer 1

There is no need for two hashes. The hash of hashes alone is enough. I only made small changes to your code; see below.

use strict;
use warnings;
my %hash;
<DATA>;
while (<DATA>)
{
    chomp;
    my($first,$second,$thrid) = split("\t");
    $hash{$first}{$thrid}++;    # count how often each N value occurs per ID
}
my @ar = qw(p1  p2  p3  p4  p5);
print join("\t", "ID", "M", @ar), "\n";
foreach (keys %hash)
{
    # Total rows for this ID: sum the per-N counts instead of keeping a second hash.
    my $cnt = 0;
    $cnt += $_ for values %{ $hash{$_} };
    print "$_\t$cnt\t";
    foreach my $ary(@ar)
    {
        if(!$hash{$_}{$ary})
        {
            print "0\t"; 
        }
        else
        {
            print "$hash{$_}{$ary}\t";
        }
    }
    print "\n";
}

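You already have a hash of hashes holding the data: the first key is the ID and the second key is N. Summing the values stored under an ID gives the total you want.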
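As a small illustration of that idea, here is a minimal sketch that totals the inner counts with `sum` from the core List::Util module. The helper name `total_for` and the hard-coded sample hash are made up for this example; the hash mirrors what the read loop would build from the question's data.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: add up every per-N count stored under one ID.
# Returns 0 for an unknown ID (sum returns undef on an empty list).
sub total_for {
    my ($counts, $id) = @_;
    return sum( values %{ $counts->{$id} || {} } ) // 0;
}

# Sample counts, equivalent to what the read loop builds from the question's data.
my %hash = (
    cc1 => { p1 => 1, p2 => 2 },
    cc2 => { p1 => 1, p5 => 1 },
    cc3 => { p1 => 1, p4 => 1 },
);

print "$_\t", total_for(\%hash, $_), "\n" for sort keys %hash;
```

Because the total is derived from the inner hash on demand, nothing has to be kept in sync with a second structure.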

Answer 2

I would probably do it like this:

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my %count_of;

#read the header row 
chomp( my @header = split ' ', <DATA> );

while (<DATA>) {
   my ( $ID, $M, $N ) = split;
   $count_of{ $ID }{ $N }++;
}
#print Dumper \%count_of;

#setup the output headers. We could autodetect, but some of these (p3) are entirely empty. 
my @p_headers = qw ( p1 p2 p3 p4 p5 );
#if you did want to:
#my @p_headers = sort keys %{{map { $_ => 1 } map { keys %{$count_of{$_}} } keys %count_of }};
#will give p1 p2 p4 p5. 

print join( "\t", qw ( ID M ), @p_headers ), "\n";
foreach my $ID ( sort keys %count_of ) {
   my $total = 0;
   $total += $_ for values %{ $count_of{$ID} };
   # Join only the fields, then append the newline, so no trailing tab is printed.
   print join( "\t",
                   $ID,
                   $total,
                   ( map { $count_of{$ID}{$_} // 0 } @p_headers ) ),
         "\n";
}

__DATA__
ID  M   N 
cc1 1   p1
cc1 10  p2
cc1 10  p2
cc2 1   p1
cc2 2   p5
cc3 2   p1
cc3 2   p4
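
The commented-out one-liner for autodetecting the column headers is fairly dense, so here is a longer sketch of the same idea. The helper name `detected_headers` and the hard-coded sample hash are assumptions made for this example only.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every distinct N value seen under any ID.
# Call it in list context; it returns the names sorted as strings.
sub detected_headers {
    my ($count_of) = @_;
    my %seen;
    for my $per_id ( values %$count_of ) {
        $seen{$_} = 1 for keys %$per_id;
    }
    return sort keys %seen;
}

# Sample counts, equivalent to what the read loop builds from the data above.
my %count_of = (
    cc1 => { p1 => 1, p2 => 2 },
    cc2 => { p1 => 1, p5 => 1 },
    cc3 => { p1 => 1, p4 => 1 },
);

print join( "\t", detected_headers( \%count_of ) ), "\n";
```

With the sample data this yields p1, p2, p4 and p5; p3 never occurs in the input, which is why the script hard-codes the header list instead. The sort is a plain string sort, so a name like p10 would come before p2.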

Is it possible to count duplicates in two columns using a single hash?
