Finding all possible feature combinations (columns) in tab-delimited data

Time: 2010-10-01 16:29:06

Tags: linux perl unix

My data looks like this:

1 1:-0.394668 2:-0.794872 3:-1 4:-0.871341 5:0.9365 6:0.75597 
1 1:-0.463641 2:-0.897436 3:-1 4:-0.871341 5:0.44378 6:0.121824 
1 1:-0.469432 2:-0.897436 3:-1 4:-0.871341 5:0.32668 6:0.302529 
-1 1:-0.241547 2:-0.538462 3:-1 4:-0.871341 5:0.9994 6:0.987166 
1 1:-0.757233 2:-0.948718 3:-1 4:-0.871341 5:-0.33904 6:0.915401 
1 1:-0.167147 2:-0.589744 3:-1 4:-0.871341 5:0.95078 6:0.991566 

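The first column is the class and the next 6 columns are features. I am trying to generate all possible combinations of features (2 features, 3 features, ... 5 features).

E.g.: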

feat1 - feat2
feat1 - feat3
...
feat5 - feat6
...
feat1 - feat2 - feat3 - feat4 - feat5
feat1 - feat2 - feat3 - feat4 - feat6
..etc..
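
To make the target concrete, here is a minimal sketch that lists every feature subset of size 2 to 5 in the notation above. It uses a plain bitmask enumeration with no CPAN modules; the helper name `feature_subsets` is hypothetical, and the string-based ordering assumes single-digit feature numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Enumerate every subset of features 1..$n_features whose size lies between
# $min_size and $max_size. Bit ($f - 1) of $mask set means "feature $f is in".
# (Hypothetical helper, not part of any library.)
sub feature_subsets {
    my ($n_features, $min_size, $max_size) = @_;
    my @subsets;
    for my $mask (1 .. 2**$n_features - 1) {
        my @members = grep { $mask & (1 << ($_ - 1)) } 1 .. $n_features;
        push @subsets, \@members
            if @members >= $min_size && @members <= $max_size;
    }
    # Order by subset size, then by feature numbers
    # (the string comparison assumes single-digit feature numbers).
    return sort { @$a <=> @$b or "@$a" cmp "@$b" } @subsets;
}

my @subsets = feature_subsets(6, 2, 5);
print join(' - ', map { "feat$_" } @$_), "\n" for @subsets;
print scalar(@subsets), " subsets\n";
```

For 6 features this gives C(6,2) + C(6,3) + C(6,4) + C(6,5) = 15 + 20 + 15 + 6 = 56 subsets, so 56 output files would be needed.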

One of the output files, feat12.txt, would then contain:

1 1:-0.394668 2:-0.794872
1 1:-0.463641 2:-0.897436
1 1:-0.469432 2:-0.897436
-1 1:-0.241547 2:-0.538462
1 1:-0.757233 2:-0.948718
1 1:-0.167147 2:-0.589744

Is there an existing implementation of this in Perl?

1 answer:

Answer 0 (score: 4)

Sure, there are Algorithm::Combinatorics and/or Set::CrossProduct, but it is hard to tell from your problem description which one is more appropriate.

Maybe you can use something like this as a starting point:

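#!/usr/bin/perl

use strict;
use warnings;
use Algorithm::Combinatorics qw( combinations );

while ( my $line = <DATA> ) {
    last unless $line =~ /\S/;    # stop at the first blank line
    # Collect the "index:value" feature fields; the leading class label is skipped
    my $row = [ $line =~ /([1-6]:\S+)/g ];
    for my $i (2 .. 6) {          # subset sizes; use 2 .. 5 to leave out the full row
        my $it = combinations($row, $i);
        while ( my $x = $it->next ) {
            print "@$x\n";
        }
    }
}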

__DATA__
1 1:-0.394668 2:-0.794872 3:-1 4:-0.871341 5:0.9365 6:0.75597
1 1:-0.463641 2:-0.897436 3:-1 4:-0.871341 5:0.44378 6:0.121824
1 1:-0.469432 2:-0.897436 3:-1 4:-0.871341 5:0.32668 6:0.302529
-1 1:-0.241547 2:-0.538462 3:-1 4:-0.871341 5:0.9994 6:0.987166
1 1:-0.757233 2:-0.948718 3:-1 4:-0.871341 5:-0.33904 6:0.915401
1 1:-0.167147 2:-0.589744 3:-1 4:-0.871341 5:0.95078 6:0.991566

Output:

C:\Temp> c
1:-0.167147 2:-0.589744 3:-1
1:-0.167147 2:-0.589744 4:-0.871341
1:-0.167147 2:-0.589744 5:0.95078
…
2:-0.589744 3:-1 5:0.95078 6:0.991566
2:-0.589744 4:-0.871341 5:0.95078 6:0.991566
3:-1 4:-0.871341 5:0.95078 6:0.991566
1:-0.167147 2:-0.589744 3:-1 4:-0.871341 5:0.95078
1:-0.167147 2:-0.589744 3:-1 4:-0.871341 6:0.991566
1:-0.167147 2:-0.589744 3:-1 5:0.95078 6:0.991566
1:-0.167147 2:-0.589744 4:-0.871341 5:0.95078 6:0.991566
1:-0.167147 3:-1 4:-0.871341 5:0.95078 6:0.991566
2:-0.589744 3:-1 4:-0.871341 5:0.95078 6:0.991566
1:-0.167147 2:-0.589744 3:-1 4:-0.871341 5:0.95078 6:0.991566
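
The output above drops the class label and prints everything to one stream, while the question asks for one file per feature combination (such as feat12.txt) with the class kept in the first column. The following is a minimal sketch of that step, not a definitive implementation. It uses a small recursive helper instead of Algorithm::Combinatorics so that it runs without CPAN modules. The names `k_subsets`, `project_rows` and `write_files` are hypothetical, and the file naming (feature numbers concatenated) is only unambiguous for fewer than 10 features.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return all $k-element subsets of @items, preserving the original order.
sub k_subsets {
    my ($k, @items) = @_;
    return ([]) if $k == 0;
    return ()   if @items < $k;
    my ($first, @rest) = @items;
    return (
        (map { [ $first, @$_ ] } k_subsets($k - 1, @rest)),    # subsets with $first
        k_subsets($k, @rest),                                   # subsets without it
    );
}

# Map each output file name (e.g. feat12.txt) to its projected rows.
sub project_rows {
    my ($min_size, $max_size, @lines) = @_;
    my %rows_for;
    for my $line (@lines) {
        next unless $line =~ /\S/;                  # skip blank lines
        my ($class, @pairs) = split ' ', $line;     # splits on spaces or tabs
        my %pair_for;                               # feature index => "index:value"
        for my $pair (@pairs) {
            my ($index) = $pair =~ /^(\d+):/ or next;
            $pair_for{$index} = $pair;
        }
        my @features = sort { $a <=> $b } keys %pair_for;
        for my $k ($min_size .. $max_size) {
            for my $subset (k_subsets($k, @features)) {
                my $file = 'feat' . join('', @$subset) . '.txt';
                push @{ $rows_for{$file} }, join ' ', $class, @pair_for{@$subset};
            }
        }
    }
    return \%rows_for;
}

# Write every projected file into directory $dir.
sub write_files {
    my ($rows_for, $dir) = @_;
    for my $file (sort keys %$rows_for) {
        open my $out, '>', "$dir/$file" or die "Cannot write $dir/$file: $!";
        print {$out} "$_\n" for @{ $rows_for->{$file} };
        close $out or die "Cannot close $dir/$file: $!";
    }
}

# Two rows taken from the question's data.
my @sample = (
    "1 1:-0.394668 2:-0.794872 3:-1 4:-0.871341 5:0.9365 6:0.75597",
    "-1 1:-0.241547 2:-0.538462 3:-1 4:-0.871341 5:0.9994 6:0.987166",
);
my $rows_for = project_rows(2, 5, @sample);
print "$_\n" for @{ $rows_for->{'feat12.txt'} };
printf "%d files\n", scalar keys %$rows_for;

# Files are only written when an output directory is given on the command line.
write_files($rows_for, $ARGV[0]) if @ARGV;
```

Run without arguments, the sketch only prints the rows destined for feat12.txt and the number of files. Passing a directory as the first argument writes all of the files there. To process a real input file, the `@sample` list could be replaced with lines read from a filehandle.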