How can I create combinations of several lists without hardcoding loops?

Asked: 2009-09-18 07:06:36

Tags: perl algorithm nested-loops

I have data that looks like this:

    my @homopol = (
                   ["T","C","CC","G"],  # part1
                   ["T","TT","C","G","A"], #part2
                   ["C","CCC","G"], #part3 ...upto part K=~50
                  );


    my @prob = ([1.00,0.63,0.002,1.00,0.83],
                [0.72,0.03,1.00, 0.85,1.00],
                [1.00,0.97,0.02]);


    # Note also that the dimension of @homopol is always exactly the same as that of @prob,
    # although the number of elements can differ from 'part' to 'part'.

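What I want to do is:

  1. Generate all combinations of elements from part1 through partK.
  2. Find the product of the corresponding elements in @prob.

So in the end we hope to get this output: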

    T-T-C  1 x 0.72 x 1 = 0.720
    T-T-CCC     1 x 0.72 x 0.97 = 0.698
    T-T-G  1 x 0.72 x 0.02 = 0.014
    ...
    G-G-G  1 x 0.85 x 0.02 = 0.017
    G-A-C  1 x 1 x 1 = 1.000
    G-A-CCC     1 x 1 x 0.97 = 0.970
    G-A-G  1 x 1 x 0.02 = 0.020
    

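The problem is that my code below does this by hardcoding the loops. Since the number of parts in @homopol can vary and can be large (e.g. K ~ 50), we need a flexible and compact way to get the same result. Is there one? I was thinking of using Algorithm::Loops, but I am not sure how to achieve this with it.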

    use strict;
    use Data::Dumper;
    use Carp;
    
    
    my @homopol = (["T","C","CC","G"],
                   ["T","TT","C","G","A"],
                   ["C","CCC","G"]);
    
    
    my @prob = ([1.00,0.63,0.002,1.00,0.83],
                [0.72,0.03,1.00, 0.85,1.00],
                [1.00,0.97,0.02]);
    
    
    
    my $i_of_part1 = -1;
    foreach my $base_part1 ( @{ $homopol[0] } ) {
        $i_of_part1++;
        my $probpart1 = $prob[0]->[$i_of_part1];
    
        my $i_of_part2 =-1;
        foreach my $base_part2 ( @{ $homopol[1] } ) {
            $i_of_part2++;
            my $probpart2 = $prob[1]->[$i_of_part2];
    
            my $i_of_part3 = -1;
            foreach my $base_part3 ( @{ $homopol[2] } ) {
                $i_of_part3++;
                my $probpart3 = $prob[2]->[$i_of_part3];
    
                my $nstr = $base_part1."".$base_part2."".$base_part3;
                my $prob_prod = sprintf("%.3f",$probpart1 * $probpart2 *$probpart3);
    
                print "$base_part1-$base_part2-$base_part3 \t";
                print "$probpart1 x $probpart2 x $probpart3 = $prob_prod\n";
    
            }
        }
    }
    

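5 Answers:

Answer 0 (score: 4)

I would recommend Set::CrossProduct, which creates an iterator that yields the cross product of all your sets. Because it uses an iterator, it does not need to generate every combination in advance; instead, it yields each one on demand.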

use strict;
use warnings;
use Set::CrossProduct;

my @homopol = (
    [qw(T C CC G)],
    [qw(T TT C G A)],
    [qw(C CCC G)], 
);

my @prob = (
    [1.00,0.63,0.002,1.00],
    [0.72,0.03,1.00, 0.85,1.00],
    [1.00,0.97,0.02],
);

# Prepare by storing the data in a list of lists of pairs.
my @combined;
for my $i (0 .. $#homopol){
    push @combined, [];
    push @{$combined[-1]}, [$homopol[$i][$_], $prob[$i][$_]]
        for 0 .. @{$homopol[$i]} - 1;
};

my $iterator = Set::CrossProduct->new([ @combined ]);
while( my $tuple = $iterator->get ){
    my @h = map { $_->[0] } @$tuple;
    my @p = map { $_->[1] } @$tuple;
    my $product = 1;
    $product *= $_ for @p;
    print join('-', @h), ' ', join(' x ', @p), ' = ', $product, "\n";
}

Answer 1 (score: 2)

A solution that uses Algorithm::Loops without changing the input data would look something like this:

use Algorithm::Loops qw(NestedLoops);

# Turns ([a, b, c], [d, e], ...) into ([0, 1, 2], [0, 1], ...)
my @lists_of_indices = map { [ 0 .. $#$_ ] } @homopol;

NestedLoops( [ @lists_of_indices ], sub {
  my @indices = @_;
  my $prob_prod = 1; # Multiplicative identity
  my @base_string;
  my @prob_string;
  for my $n (0 .. $#indices) {
    push @base_string, $homopol[$n][ $indices[$n] ];
    push @prob_string, sprintf("%.3f", $prob[$n][ $indices[$n] ]);
    $prob_prod *= $prob[$n][ $indices[$n] ];
  }
  print join "-", @base_string; print "\t";
  print join " x ", @prob_string; print " = ";
  printf "%.3f\n", $prob_prod;
});

But I think you could actually make the code clearer by changing the structure to something more like this:

[ 
  { T => 1.00, C => 0.63, CC => 0.002, G => 1.00 },
  { T => 0.72, TT => 0.03, ... },
  ...
]

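because without the parallel data structures you can simply iterate over the available base sequences, instead of iterating over indices and then looking those indices up in two different places.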
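As a rough illustration of that restructuring, here is a minimal sketch using only core Perl. The names `@parts`, `@rows`, and `@next` are made up for this example, and it assumes the base sequences within each part are unique (a hash would silently merge duplicates). It expands the combinations one part at a time rather than using NestedLoops, so it holds every combination in memory.

```perl
use strict;
use warnings;

my @homopol = ([qw(T C CC G)], [qw(T TT C G A)], [qw(C CCC G)]);
my @prob    = ([1.00, 0.63, 0.002, 1.00],
               [0.72, 0.03, 1.00, 0.85, 1.00],
               [1.00, 0.97, 0.02]);

# One hash per part, mapping base sequence => probability.
# Assumption: sequences within a part are unique.
my @parts = map {
    my %h;
    @h{ @{ $homopol[$_] } } = @{ $prob[$_] };
    \%h;
} 0 .. $#homopol;

# Grow the list of partial combinations one part at a time.
# Each entry is [ \@bases_so_far, $product_so_far ].
my @rows = ([ [], 1 ]);
for my $part (@parts) {
    my @next;
    for my $row (@rows) {
        my ($bases, $product) = @$row;
        for my $base (sort keys %$part) {   # sorted only to get a stable order
            push @next, [ [ @$bases, $base ], $product * $part->{$base} ];
        }
    }
    @rows = @next;
}

printf "%s\t%.3f\n", join('-', @{ $_->[0] }), $_->[1] for @rows;
```

Hashes do not keep insertion order, so the rows come out in sorted-key order rather than the order of the original lists; keep an array of keys alongside each hash if the order matters.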

Answer 2 (score: 0)

Why not use recursion? Pass the depth as a parameter, and have the function call itself with depth + 1 inside the loop.
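A minimal sketch of that idea, assuming the same @homopol and @prob layout as in the question (with @prob trimmed so each part matches in length). The sub name `combine` and its argument order are invented for this example; results are collected in an array so they can be printed or inspected afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: each recursion level handles one part.
# $depth is the index of the part being expanded; @$bases and @$probs
# hold the choices already made at the shallower levels.
sub combine {
    my ($homopol, $prob, $depth, $bases, $probs, $out) = @_;

    if ($depth == @$homopol) {      # every part has a choice: emit one row
        my $product = 1;
        $product *= $_ for @$probs;
        push @$out, sprintf("%s\t%s = %.3f",
            join('-', @$bases), join(' x ', @$probs), $product);
        return;
    }

    for my $i (0 .. $#{ $homopol->[$depth] }) {
        combine($homopol, $prob, $depth + 1,
                [ @$bases, $homopol->[$depth][$i] ],
                [ @$probs, $prob->[$depth][$i] ],
                $out);
    }
}

my @homopol = ([qw(T C CC G)], [qw(T TT C G A)], [qw(C CCC G)]);
my @prob    = ([1.00, 0.63, 0.002, 1.00],
               [0.72, 0.03, 1.00, 0.85, 1.00],
               [1.00, 0.97, 0.02]);

my @rows;
combine(\@homopol, \@prob, 0, [], [], \@rows);
print "$_\n" for @rows;
```

The recursion depth equals the number of parts, so nothing in the code depends on K being 3.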

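Answer 3 (score: -1)

You can do this by creating an array of indices with the same length as the @homopol array (say N) to keep track of which combination you are looking at. In effect this array is like a number in base N, with its elements as the digits. Iterate in the same way you would write down consecutive numbers in base N, e.g. (0 0 0 ... 0), (0 0 0 ... 1), ..., (0 0 0 ... N-1), (0 0 0 ... 1 0), ....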
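Here is one way the counter could be sketched. Because the parts in the question differ in length, this version gives each position its own base (the length of that part) rather than a single base N, which is an adaptation of the description above. `make_odometer` is a made-up name, not a library function.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an iterator over index tuples, where
# position k counts from 0 to $sizes[k] - 1 like one digit of an odometer.
sub make_odometer {
    my @sizes = @_;
    my @idx   = (0) x @sizes;
    my $done  = grep { $_ == 0 } @sizes;   # an empty part means no combinations
    return sub {
        return if $done;
        my @current = @idx;
        # Increment the rightmost digit, carrying to the left on overflow.
        my $pos = $#idx;
        while ($pos >= 0) {
            last if ++$idx[$pos] < $sizes[$pos];
            $idx[$pos] = 0;
            $pos--;
        }
        $done = 1 if $pos < 0;             # carried past the leftmost digit
        return \@current;
    };
}

my @homopol = ([qw(T C CC G)], [qw(T TT C G A)], [qw(C CCC G)]);
my @prob    = ([1.00, 0.63, 0.002, 1.00],
               [0.72, 0.03, 1.00, 0.85, 1.00],
               [1.00, 0.97, 0.02]);

my $next = make_odometer(map { scalar @$_ } @homopol);
my @rows;
while (my $idx = $next->()) {
    my @bases = map { $homopol[$_][ $idx->[$_] ] } 0 .. $#homopol;
    my @probs = map { $prob[$_][ $idx->[$_] ] } 0 .. $#homopol;
    my $product = 1;
    $product *= $_ for @probs;
    push @rows, sprintf("%s\t%s = %.3f",
        join('-', @bases), join(' x ', @probs), $product);
}
print "$_\n" for @rows;
```

Only the current index tuple is kept between calls, so memory use does not grow with the number of combinations.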

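Answer 4 (score: -2)

Method 1: compute from an index

Calculate the product of the lengths in homopol (length1 * length2 * ... * lengthN). Then iterate i from zero up to, but not including, that product. The indices you want are then i % length1, (i / length1) % length2, (i / length1 / length2) % length3, ... (using integer division).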
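A small sketch of that calculation, assuming the data from the question. `decode_index` is a hypothetical helper name; it performs the repeated modulo and integer division described above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a flat counter $i into one index per part.
sub decode_index {
    my ($i, @sizes) = @_;
    my @idx;
    for my $size (@sizes) {
        push @idx, $i % $size;      # index within this part
        $i = int($i / $size);       # integer division moves on to the next part
    }
    return @idx;
}

my @homopol = ([qw(T C CC G)], [qw(T TT C G A)], [qw(C CCC G)]);
my @prob    = ([1.00, 0.63, 0.002, 1.00],
               [0.72, 0.03, 1.00, 0.85, 1.00],
               [1.00, 0.97, 0.02]);

my @sizes = map { scalar @$_ } @homopol;
my $total = 1;
$total *= $_ for @sizes;

my @rows;
for my $i (0 .. $total - 1) {
    my @idx   = decode_index($i, @sizes);
    my @bases = map { $homopol[$_][ $idx[$_] ] } 0 .. $#homopol;
    my @probs = map { $prob[$_][ $idx[$_] ] } 0 .. $#homopol;
    my $product = 1;
    $product *= $_ for @probs;
    push @rows, sprintf("%s\t%s = %.3f",
        join('-', @bases), join(' x ', @probs), $product);
}
print "$_\n" for @rows;
```

With this formula the first part varies fastest, so the rows appear in a different order from the nested loops in the question. The total can also exceed native integer range when there are many parts, so this is best suited to modest sizes.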

Method 2: recursion

I was beaten to it; see nikie's answer. :-)