Building a HoHoA dynamically

Time: 2013-09-11 12:10:02

Tags: arrays perl data-structures hash

I'm trying to organize a large amount of data into a Hash of Hashes of Arrays (HoHoA). When I declare the values by hand, the following works fine:

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;


my %experiment = (
    'gene1' =>  {
                       'condition1' => ['XLOC_000157', '90', '0.001'],
                       'condition2' => ['XLOC_000347','80', '0.5'],
                       'condition3' => ['XLOC_000100', '50', '0.2']
                   },
    'gene2'   =>  {
                       'condition1' => ['XLOC_025437', '100', '0.018'],
                       'condition2' => ['XLOC_000322', '77', '0.22'],
                       'condition3' => ['XLOC_001000', '43', '0.002']

                   }
);

Then I print out the keys and values:

for my $gene (sort keys %experiment) {
    for my $condition (sort keys %{ $experiment{$gene} }) {
        print "$gene\t$condition\t";
        for my $values (@{ $experiment{$gene}{$condition} }) {
            print "[$values]\t";
        }
        print "\n";
    }
}

Output:

gene1   condition1  [XLOC_000157]   [90]    [0.001] 
gene1   condition2  [XLOC_000347]   [80]    [0.5]   
gene1   condition3  [XLOC_000100]   [50]    [0.2]   
gene2   condition1  [XLOC_025437]   [100]   [0.018] 
gene2   condition2  [XLOC_000322]   [77]    [0.22]  
gene2   condition3  [XLOC_001000]   [43]    [0.002] 

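However, the actual data I'm working with is far too large to declare by hand, so I'd like to get the same result as above, but starting from arrays that each hold one field, for example: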

Example input:

condition1    XLOC_000157    1.04564    0.999592      99.66   gene1
condition1    XLOC_000159    0.890436    0.999592    99.47   gene2
condition2    XLOC_000561    -1.05905    0.999592      91.57   gene1
condition2    XLOC_00076    -0.755473    0.999592      99.04   gene2

Splitting the input into arrays:

my (@gene, @condition, @percent_id, @Xloc, @change, @q_value, @split, %experiment);
while (<$list>) {
    chomp;
    @split = split /\t/;          # fields are tab-separated
    push @condition,  $split[0];
    push @Xloc,       $split[1];
    push @change,     $split[2];
    push @q_value,    $split[3];
    push @percent_id, $split[4];
    push @gene,       $split[5];
}

So far I've been building HoAs to store this:

push @{ $results{$gene[$_]} }, [ $Xloc[$_], $change[$_], $q_value[$_], $percent_id[$_] ] for 0 .. $#gene;

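But I'm now trying to fold the 'condition' information into each HoA, and so build a HoHoA. Ideally I'd like to do this inside the while loop (hence "dynamically"), in a similar way to the above, to end up with the following data structure: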

$VAR1 = {
          'gene1' => {
                       'condition1' => [
                                         'XLOC_000157',
                                         '1.04564',
                                         '0.999592',
                                         '99.66'
                                       ],
                       'condition2' => [
                                         'XLOC_000561',
                                         '-1.05905',
                                         '0.999592',
                                         '91.57'
                                       ]
                     },
          'gene2' => {
                       'condition1' => [
                                         'XLOC_000159',
                                         '0.890436',
                                         '0.999592',
                                         '99.47'
                                       ],
                       'condition2' => [
                                         'XLOC_00076',
                                         '-0.755473',
                                         '0.999592',
                                         '99.04'
                                       ]
                     }
        };

1 answer:

Answer 0: (score: 1)

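my %experiment;
while (<$list>) {
    chomp;
    # Six tab-separated fields per line, in the order shown in the example input
    my ($condition, $xloc, $change, $q_value, $percent_id, $gene) = split /\t/;
    # Autovivification creates the inner hash for $gene on first use
    $experiment{$gene}{$condition} = [ $xloc, $change, $q_value, $percent_id ];
}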
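
To try this without an input file, here is a minimal self-contained sketch. It assumes the six-column, tab-separated layout from the example input (condition, XLOC id, change, q-value, percent identity, gene); the `@lines` array is a hypothetical stand-in for the `$list` filehandle and is not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;

# Hypothetical stand-in for the $list filehandle: the four sample rows
# from the question, written with explicit tab separators.
my @lines = (
    "condition1\tXLOC_000157\t1.04564\t0.999592\t99.66\tgene1",
    "condition1\tXLOC_000159\t0.890436\t0.999592\t99.47\tgene2",
    "condition2\tXLOC_000561\t-1.05905\t0.999592\t91.57\tgene1",
    "condition2\tXLOC_00076\t-0.755473\t0.999592\t99.04\tgene2",
);

my %experiment;
for my $line (@lines) {
    # Assumed column order: condition, XLOC id, change, q-value, percent id, gene
    my ($condition, $xloc, $change, $q_value, $percent_id, $gene)
        = split /\t/, $line;

    # No need to create $experiment{$gene} first: Perl autovivifies
    # the inner hash when it is first dereferenced.
    $experiment{$gene}{$condition} = [ $xloc, $change, $q_value, $percent_id ];
}

print Dumper(\%experiment);
```

Note that a plain assignment keeps only the last line seen for a given gene/condition pair. If a pair can appear more than once in the input, `push @{ $experiment{$gene}{$condition} }, [ $xloc, $change, $q_value, $percent_id ];` would keep every row, at the cost of one more level of nesting.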