合并列表列表

时间:2014-06-06 10:24:06

标签: perl

我有一个列表列表,如下所示:

[
    [ 1, 2, 3 ],
    [ 20, 30, 40, 50 ],
    [ 11, 15, 17 ],
    [ 20, 22, 25, 27 ],
    [ 1, 5, 10 ],
    [ 1, 100 ]
]

我想合并内部列表,其中一个列表中的任何元素与另一个列表中的任何元素匹配。这也需要能够处理多个重叠(因此在上面的示例中,列表中的3个将合并为一个)。所以在这种情况下,结果如下:

[
    [ 1, 2, 3, 5, 10, 100 ],     # 3 lists have been merged into one
    [ 11, 15, 17 ],              # Untouched due to no overlap
    [ 20, 22, 25, 30, 40, 50 ],  # 2 lists merged
]

是否有明显的算法或Perl模块可供使用?

4 个答案:

答案 0 :(得分:4)

以下是单通道解决方案,它使用数组引用有点神奇。

对于this algorithm jaredor和其他解决方案,请查看perlmonks:how to find combine common elements of an array?

use strict;
use warnings;

use List::MoreUtils qw(uniq);

my @data = map {[split]} <DATA>;

my %group = ();
for my $array (@data) {
    my @values = map {@$_} uniq map {$group{$_} || [$_]} @$array;
    @group{@values} = (\@values) x @values;
}
@data = uniq values %group;

# Resort to make things pretty
@$_ = sort {$a <=> $b} @$_ for @data;

use Data::Dump;
dd @data;

__DATA__
1 2 3
20 30 40 50
11 15 17
20 22 25 27
1 5 10
1 100

输出:

(
  [20, 22, 25, 27, 30, 40, 50],
  [11, 15, 17],
  [1, 2, 3, 5, 10, 100],
)

答案 1 :(得分:1)

只有子阵列的顺序不同,

use strict;
use warnings;

sub merge {
  my ($arr) = @_;

  my $i = 0;
  while ($i < $#$arr) {
    my $current = $arr->[$i];
    my %h;
    @h{@$current} = ();
    my @ovlap = grep { 
      grep exists $h{$_}, @{$arr->[$_]} 

    } ($i+1) .. $#$arr;

    my %seen;
    @$current = 
      sort {$a <=> $b}
      grep !$seen{$_}++,
      (@$current, map @$_, @$arr[@ovlap]);

    @$arr[@ovlap] = ();
    @$arr = grep defined, @$arr;

    $i++;
  }
  return $arr;
}

my $arr = [
    [ 1, 2, 3 ],
    [ 20, 30, 40, 50 ],
    [ 11, 15, 17 ],
    [ 20, 22, 25, 27 ],
    [ 1, 5, 10 ],
    [ 1, 100 ],
];
merge(merge($arr));
use Data::Dumper; print Dumper $arr;

输出

$VAR1 = [
      [
        1,
        2,
        3,
        5,
        10,
        100
      ],
      [
        20,
        22,
        25,
        27,
        30,
        40,
        50
      ],
      [
        11,
        15,
        17
      ]
    ];

答案 2 :(得分:1)

我用哈希来找到解决方案。如果您的列表可能包含重复的成员,则此解决方案会将其数量减少到1。

%shared表示列表共享的号码。然后,虽然有共享号码,但您合并列表(即您更改%shared中的信息)。一旦没有共享号码,您就可以从哈希中创建列表。

#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;

my @lists = (
             [ 1, 5, 10 ],
             [ 10, 15, 17 ],
             [ 20, 22, 25, 27 ],
             [ 20, 30, 40, 50 ],
             [ 1, 2, 3 ],
             [ 1, 100 ],
            );

my %shared;
for my $i (0 .. $#lists) {
    undef $shared{$_}{$i} for @{ $lists[$i] };
}

while (my ($num) = grep 1 < keys %{ $shared{$_} }, keys %shared) {
    my @to_merge = keys %{ $shared{$num} };
    my $list = shift @to_merge;
    for my $merge (@to_merge) {
        print "Merging list $merge to $list\n";
        for my $h (values %shared) {
            if (exists $h->{$merge}) {
                delete $h->{$merge};
                undef $h->{$list};
            }
        }
    }
}

my %left;
undef $left{ (keys %{ $shared{$_} })[0] }{$_} for keys %shared;
my @result = map [ keys %$_ ], values %left;

print Dumper \@result;

答案 3 :(得分:0)

我的两便士值得。使用简单的循环和缓存来存储结果列表中每个元素的位置,以便每次都保存搜索。

#! /usr/bin/perl

use warnings;
use strict;
use Data::Dumper;

my $lol = [
    [ 1, 2, 3 ],
    [ 20, 30, 40, 50 ],
    [ 11, 15, 17 ],
    [ 20, 22, 25, 27 ],
    [ 1, 5, 10 ],
    [ 1, 100 ]
    ];

my @results = ();
my %resultCache;

sub elementInResults {
    my ($element) = $_[0];

    # return value for cache, or search if not in cache
    if (!defined $resultCache{$element} ) {
        # search for target in destination arrays
        for (my $destIndex = 0; $destIndex < @results;  $destIndex++) {
            if (grep (/$element/, @{$results[$destIndex]}) > 0 ) {
                $resultCache{$element} = $destIndex;
                last;
            }
        }
    }

    return $resultCache{$element};
}

my $srcCount=0;
# loop through  source arrays
for my $srcList (@$lol) {
    my $destIndex ;
    # loop through elements of array
    for (my $srcElementIndex=0; $srcElementIndex < @$srcList; $srcElementIndex++) {
        $destIndex = elementInResults($srcList->[$srcElementIndex]);
        if (defined $destIndex ) {
            # element exists in an existing result array so merge
            print "Merging source array $srcCount into result array $destIndex, match on:" . $srcList->[$srcElementIndex] . "\n";
            # remove the duplicate element from src list first
            splice(@$srcList,$srcElementIndex,1);
            # then merge into dest list
            push (@{$results[$destIndex]}, @$srcList);
            last;
        }
        $srcElementIndex++;
    }

    # if no elements in list found in existing results add list as new one to results
    push (@results, $srcList) if (!defined $destIndex ) ;

    $srcCount++;
}

map {@$_ = sort ({$a <=> $b} @$_)} @results;

print Dumper \@results;