Perl: working with two hash references

Time: 2011-09-13 10:26:27

Tags: perl perl-data-structures

I want to compare the values of two hash references. The Data::Dumper output of my first hash is:

$VAR1 = {
          '42-MG-BA' => [
                          {
                            'chromosome' => '19',
                            'position' => '35770059',
                            'genotype' => 'TC'
                          },
                          {
                            'chromosome' => '2',
                            'position' => '68019584',
                            'genotype' => 'G'
                          },
                          {
                            'chromosome' => '16',
                            'position' => '9561557',
                            'genotype' => 'G'
                          },

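The second hash is similar, but its array holds more hashes. I want to compare the genotypes from the first and second hash wherever the position and chromosome match.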

map {print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n"}sort keys %$cave_snp_list;
map {print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n"}sort keys %$geno_seq_list;

I can do this for the first hash in each array. Can you help me work out how to do it for every element of the arrays?

Here is my actual code:

#!/software/bin/perl

use strict;

use warnings;
use Getopt::Long;
use Benchmark;
use Config::Config qw(Sequenom.ini);
use Database::Conn;
use Data::Dumper;

GetOptions("sam=s" => \my $sample);

my $geno_seq_list = getseqgenotypes($sample);
my $cave_snp_list = getcavemansnpfile($sample);
#print Dumper($geno_seq_list);
print scalar %$geno_seq_list, "\n";

foreach my $sam (keys %{$geno_seq_list}) {

    my $seq_used  = $geno_seq_list->{$sam};
    my $cave_used = $cave_snp_list->{$sam};
    print scalar(@$seq_used), "\n";
    print scalar(@$cave_used), "\n";
    #foreach my $seq2com (@ {$seq_used } ){
    #    foreach my $cave2com( @ {$cave_used} ){
    #       print $seq2com->{chromosome},":" ,$cave2com->{chromosome},"\n";
    #    }
    #}

    map { print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n" } sort keys %$cave_snp_list;
    map { print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n" } sort keys %$geno_seq_list;
}

sub getseqgenotypes {

    my $snpconn;
    my $gen_list = {};
    $snpconn = Database::Conn->new('live');
    $snpconn->addConnection(DBI->connect('dbi:Oracle:pssd.world', 'sn', 'ss', { RaiseError => 1, AutoCommit => 0 }),
        'pssd');

#my $conn2 =Database::Conn->new('live');
#$conn2->addConnection(DBI->connect('dbi:Oracle:COSI.world','nst_owner','nst_owner', {RaiseError =>1 , AutoCommit=>0}),'nst');
    my $id_ind = $snpconn->execute('snp::Sequenom::getIdIndforExomeSample', $sample);
    my $genotype = $snpconn->executeArrRef('snp::Sequenom::getGenotypeCallsPosition', $id_ind);
    foreach my $geno (@{$genotype}) {

        push @{ $gen_list->{ $geno->[1] } }, {

            chromosome => $geno->[2],
            position   => $geno->[3],
            genotype   => $geno->[4],
        };

    }

    return ($gen_list);
}    #end of sub getseqgenotypes

sub getcavemansnpfile {

    my $nstconn;
    my $caveman_list = {};
    $nstconn = Database::Conn->new('live');
    $nstconn->addConnection(
        DBI->connect('dbi:Oracle:CANP.world', 'nst_owner', 'NST_OWNER', { RaiseError => 1, AutoCommit => 0 }), 'nst');

    my $id_sample = $nstconn->execute('nst::Caveman::getSampleid', $sample);
    #print "IDSample: $id_sample\n";
    my $file_location = $nstconn->execute('nst::Caveman::getCaveManSNPSFile', $id_sample);

    open(SNPFILE, "<$file_location") || die "Error: Cannot open the file $file_location:$!\n";

    while (<SNPFILE>) {

        chomp;
        next if /^>/;
        my @data = split;
        my ($nor_geno, $tumor_geno) = split /\//, $data[5];
        # array of hash
        push @{ $caveman_list->{$sample} }, {

            chromosome => $data[0],
            position   => $data[1],
            genotype   => $nor_geno,

        };

    }    #end of while loop
    close(SNPFILE);
    return ($caveman_list);
}

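1 Answer:

Answer 0 (score: 0)

The problem I see is that you are building a tree for general-purpose data storage when what you want is a task-specific graph. While you build the records, you can also build the part that groups the data together. The following is just an example.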

my %genotype_for;
my $record
    = { chromosome => $data[0]
      , position   => $data[1]
      , genotype   => $nor_geno
    };
push @{ $caveman_list->{$sample} }, $record;

# $genotype_for{ position }{ chromosome }{ name of array } = genotype code
$genotype_for{ $data[1] }{ $data[0] }{ $sample } = $nor_geno;

...
return ( $caveman_list, \%genotype_for );

In the main program, you receive both return values:

my ( $cave_snp_list, $geno_lookup ) = getcavemansnpfile( $sample );

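This approach at least lets you find records with matching position and chromosome values. If you are going to do a lot with this data, I might suggest an OO approach.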
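As a rough, self-contained illustration of that lookup, here is a minimal sketch. The helpers `build_lookup` and `shared_sites`, the labels `seq` and `cave`, and the sample records are all made up for the example and are not part of the original code. It indexes records by position, chromosome and label, then lists the sites that both labels share:

```perl
use strict;
use warnings;

# Hypothetical helper: index records by position, chromosome and a label
# naming the list they came from.
sub build_lookup {
    my ($lookup, $label, $records) = @_;
    for my $rec (@$records) {
        $lookup->{ $rec->{position} }{ $rec->{chromosome} }{$label} = $rec->{genotype};
    }
    return $lookup;
}

# Hypothetical helper: return [position, chromosome, genotype_a, genotype_b]
# for every site that has an entry under both labels.
sub shared_sites {
    my ($lookup, $label_a, $label_b) = @_;
    my @shared;
    for my $pos (sort { $a <=> $b } keys %$lookup) {
        for my $chrom (sort keys %{ $lookup->{$pos} }) {
            my $by_label = $lookup->{$pos}{$chrom};
            next unless exists $by_label->{$label_a}
                     && exists $by_label->{$label_b};
            push @shared, [ $pos, $chrom, $by_label->{$label_a}, $by_label->{$label_b} ];
        }
    }
    return @shared;
}

# Made-up sample data in the same shape as the question's records.
my %genotype_for;
build_lookup(\%genotype_for, 'seq', [
    { chromosome => '19', position => '35770059', genotype => 'TC' },
    { chromosome => '2',  position => '68019584', genotype => 'G'  },
]);
build_lookup(\%genotype_for, 'cave', [
    { chromosome => '19', position => '35770059', genotype => 'TT' },
    { chromosome => '16', position => '9561557',  genotype => 'G'  },
]);

for my $site (shared_sites(\%genotype_for, 'seq', 'cave')) {
    my ($pos, $chrom, $seq_geno, $cave_geno) = @$site;
    print "chr$chrom:$pos seq=$seq_geno cave=$cave_geno\n";
}
```

With this sample data only chromosome 19, position 35770059 appears under both labels, so that is the only site reported.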


Update

Assuming you don't need to store the label, we can simplify the lookup to

$genotype_for{ $data[1] }{ $data[0] } = $nor_geno;

The comparison can then be written as:

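# _HASH is assumed to be the one exported by Params::Util: it returns its
# argument only when that is a reference to a non-empty hash.
use Params::Util qw(_HASH);

foreach my $pos ( keys %$small_lookup ) {
    # skip positions that are not present in both lookups
    next unless _HASH( my $sh = $small_lookup->{ $pos } )
            and _HASH( my $lh = $large_lookup->{ $pos } )
            ;
    foreach my $chrom ( keys %$sh ) {
        next unless my $sc = $sh->{ $chrom }
               and  my $lc = $lh->{ $chrom }
               ;
        # small-list genotype : large-list genotype
        print "$sc:$lc\n";
    }
}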
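For anyone who wants to try the comparison in isolation, here is a minimal sketch that uses only core Perl. The helper `compare_lookups` and the sample data are hypothetical, and plain `ref()` checks stand in for `_HASH`. It assumes both lookups have the position => chromosome => genotype shape described above and also records whether the two genotypes agree:

```perl
use strict;
use warnings;

# Hypothetical helper: compare two position => chromosome => genotype lookups
# and return one hash ref per site that is present in both.
sub compare_lookups {
    my ($small_lookup, $large_lookup) = @_;
    my @results;
    for my $pos (sort { $a <=> $b } keys %$small_lookup) {
        my $sh = $small_lookup->{$pos};
        my $lh = $large_lookup->{$pos};
        # Plain ref() checks are used here in place of _HASH.
        next unless ref($sh) eq 'HASH' && ref($lh) eq 'HASH';
        for my $chrom (sort keys %$sh) {
            my $sc = $sh->{$chrom};
            my $lc = $lh->{$chrom};
            next unless defined($sc) && defined($lc);
            push @results, {
                position   => $pos,
                chromosome => $chrom,
                small      => $sc,
                large      => $lc,
                match      => ($sc eq $lc ? 1 : 0),
            };
        }
    }
    return @results;
}

# Made-up sample data.
my %small = (
    '35770059' => { '19' => 'TC' },
    '68019584' => { '2'  => 'G'  },
);
my %large = (
    '35770059' => { '19' => 'TT' },
    '68019584' => { '2'  => 'G'  },
    '9561557'  => { '16' => 'G'  },
);

for my $r (compare_lookups(\%small, \%large)) {
    printf "chr%s:%s %s:%s %s\n",
        @{$r}{qw(chromosome position small large)},
        ($r->{match} ? 'same' : 'differs');
}
```

Iterating over the smaller lookup keeps the loop short, and each check against the larger lookup is a constant-time hash access rather than a nested scan of both arrays.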

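However, if you only need the larger list for this comparison, you can build the lookup for the specific cases you care about and pass it in as a filter when the longer list is created.

So, in whichever loop creates the longer list, you could write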

...
next unless $sample{ $position }{ $chromosome };
my $record
    = { chromosome => $chromosome
      , position   => $position
      , genotype   => $genotype
    };
...
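A minimal sketch of that filtering idea, assuming the shorter list has already been turned into a position => chromosome => genotype lookup. The helper `filter_records`, the row layout and the sample data are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: build the longer list, keeping only rows whose
# position and chromosome already appear in the filter lookup.
sub filter_records {
    my ($filter, $rows) = @_;
    my @kept;
    for my $row (@$rows) {
        my ($chromosome, $position, $genotype) = @$row;
        # Test the outer key first so the lookup does not autovivify
        # new, empty entries in the filter.
        next unless exists $filter->{$position}
                 && exists $filter->{$position}{$chromosome};
        my $record
            = { chromosome => $chromosome
              , position   => $position
              , genotype   => $genotype
            };
        push @kept, $record;
    }
    return \@kept;
}

# Made-up filter (from the shorter list) and rows (for the longer list).
my %filter = ( '35770059' => { '19' => 'TC' } );
my @rows   = (
    [ '19', '35770059', 'TT' ],
    [ '2',  '68019584', 'G'  ],
    [ '16', '9561557',  'G'  ],
);

my $kept = filter_records(\%filter, \@rows);
print scalar(@$kept), " record(s) kept\n";
```

Filtering at build time means the longer list never holds records that cannot match, which keeps memory use down when the file is large.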