Finding duplicate arrays and the intersection of arrays in a hash of arrays using Perl

Time: 2015-09-16 05:59:53

Tags: arrays perl hash

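I want to find duplicate arrays in a hash that contains arrays. The point is that I am trying to build sets and store them in a Perl hash. After that, I need to extract: 1. the arrays that are exact duplicates (all values the same); 2. the intersection of the arrays.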

The source code is given below:


use strict;
use warnings;

my @test1= ("Bob", "Flip", "David");
my @test2= ("Bob", "Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Bob",  "Grook", "Franky");
my @test5= ();
my @test6=();

my %arrayHash= ( "ppl1" => [@test1],
             "ppl2"=> [@test2], 
             "ppl3" => [@test3],
             "ppl4"=> [@test4], 
             "ppl5"=> [@test5],
             "ppl6"=> [@test6],  

            );


Required output: ppl1 and ppl3 have duplicate lists
Intersection of arrays= Bob

Note that empty arrays should not be reported as duplicates!

3 answers:

Answer 0 (score: 1)

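So there are a couple of steps here:

  • Compare your arrays with each other. This is harder because you are dealing with multi-element arrays: you can't test them for equivalence directly, because you need to compare their members.

  • Filter one out from the other.

To start with:

(Edit: added handling of empty arrays)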

#!/usr/bin/env perl

use strict;
use warnings;

my @test1 = ( "Bob",   "Flip",  "David" );
my @test2 = ( "Kevin", "John",  "Michel" );
my @test3 = ( "Bob",   "Flip",  "David" );
my @test4 = ( "Haidi", "Grook", "Franky" );
my @test5 = ();
my @test6 = ();

my %arrayHash = (
    "ppl1" => [@test1],
    "ppl2" => [@test2],
    "ppl3" => [@test3],
    "ppl4" => [@test4],
    "ppl5" => [@test5],
    "ppl6" => [@test6],

);

my %seen;

#cycle through the hash
foreach my $key ( sort keys %arrayHash ) {

    #skip empty:
    next unless @{ $arrayHash{$key} };

    #turn your array into a string - ':' separated
    my $value_str = join( ":", sort @{ $arrayHash{$key} } );

    #check if that 'value string' has already been seen
    if ( $seen{$value_str} ) {
        print "$key is a duplicate of $seen{$value_str}\n";
    }
    $seen{$value_str} = $key;
}

Note that this is cheating a bit: it glues your array together with ':', which won't work in every scenario.

For example, ("Bob:Flip", "David") and ("Bob", "Flip:David") both end up as the string "Bob:Flip:David".
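One way around the separator problem is to prefix every element with its length before concatenating, so that element boundaries can no longer be confused. The following is only a sketch of that idea, not part of the original answer; `list_key` is a hypothetical helper name, and the two sample lists are chosen so that their plain ':'-joined strings would be identical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): builds a key in which
# every element is written as "<length>:<element>", so the boundaries between
# elements stay unambiguous even if the data itself contains ':'.
sub list_key {
    my @items = @_;
    return join '', map { length($_) . ':' . $_ } sort @items;
}

# Both lists would become "Bob:Flip:David" with a plain join(":", sort ...).
my $key_a = list_key( "Bob:Flip", "David" );
my $key_b = list_key( "Bob", "Flip:David" );

print "key_a = $key_a\n";
print "key_b = $key_b\n";
print $key_a eq $key_b ? "collision\n" : "distinct\n";
```

Such a key could be used in place of `$value_str` in the loop above; the rest of the logic would stay the same.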

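If you have more than one duplicate, each one is also only reported against the most recently seen copy.

You can solve this by pushing multiple values into the %seen hash.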
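Pushing the keys into `%seen` could look roughly like this. It is a sketch under a few assumptions: `duplicate_groups` is a hypothetical helper name, the sample data is altered so that `ppl4` holds the same names as `ppl1` in a different order (to produce a group of three), and the key is still built with ':' as in the code above, so the same caveat about ':' in the data applies.

```perl
use strict;
use warnings;

# Hypothetical helper: groups hash keys by the (sorted) content of their arrays
# and returns one array ref per group of two or more identical arrays.
sub duplicate_groups {
    my ($hash_of_arrays) = @_;

    my %seen;
    foreach my $key ( sort keys %$hash_of_arrays ) {
        my @members = @{ $hash_of_arrays->{$key} };
        next unless @members;    # empty arrays are never reported

        # every key with the same content lands in the same bucket
        my $value_str = join( ":", sort @members );
        push @{ $seen{$value_str} }, $key;
    }

    my @groups;
    foreach my $value_str ( sort keys %seen ) {
        push @groups, $seen{$value_str} if @{ $seen{$value_str} } > 1;
    }
    return @groups;
}

# Assumed sample data: ppl4 differs from the original answer's data.
my %arrayHash = (
    ppl1 => [ "Bob",   "Flip", "David" ],
    ppl2 => [ "Kevin", "John", "Michel" ],
    ppl3 => [ "Bob",   "Flip", "David" ],
    ppl4 => [ "David", "Bob",  "Flip" ],
    ppl5 => [],
    ppl6 => [],
);

foreach my $group ( duplicate_groups( \%arrayHash ) ) {
    print join( " and ", @$group ), " have duplicate lists\n";
}
```

Because the members are sorted before joining, arrays with the same names in a different order count as duplicates, which matches the set-like use described in the question.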

Answer 1 (score: 0)

use strict;
use warnings;

my @test1= ("Bob", "Flip", "David");
my @test2= ("Kevin", "John", "Michel");
my @test3= ("Bob", "Flip", "David");
my @test4= ("Haidi", "Grook", "Franky");

my %arrayHash= ( "1" => \@test1,
             "2"=> \@test2,
             "3" => \@test3,
             "4"=> \@test4,

            );

sub arrayCmp {
        my @array1 = @{$_[0]};
        my @array2 = @{$_[1]};

        return 0 if ($#array1 != $#array2);

        @array1 = sort(@array1);
        @array2 = sort(@array2);

        for (my $ii = 0; $ii <= $#array1; $ii++) {
                if ($array1[$ii] ne $array2[$ii]) {
                        #print "$array1[$ii] != $array2[$ii]\n";
                        return 0;
                }
        }

        return 1;
}


my @keyArr = sort(keys(%arrayHash));
for(my $i = 0; $i <= $#keyArr - 1; $i++) {

        my @arr1 = @{$arrayHash{$keyArr[$i]}};

        for(my $j = $i + 1; $j <= $#keyArr; $j++) {    # start after $i so each pair is compared only once
                my @arr2 = @{$arrayHash{$keyArr[$j]}};
                if ($keyArr[$i] ne $keyArr[$j] && arrayCmp(\@arr1, \@arr2) == 1) {
                        print "$keyArr[$i] and $keyArr[$j] are duplicates\n";
                }
        }
}

This outputs:

1 and 3 are duplicates

Answer 2 (score: 0)

You need to check whether the arrays stored under two hash keys are equal. An array-comparison routine can be used for this.

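Next, you can use the smart match operator to filter out the values that are not duplicated, and use a hash to keep track of the values that have already been checked. The following Python snippet shows a related set technique: it repeatedly merges any sets that share an element until no two sets intersect.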

col_combi = [('a','b'), ('b','c'), ('d','e'), ('l','j'), ('c','g'), 
             ('e','m'), ('m','z'), ('z','p'), ('t','k'), ('k', 'n'), 
             ('j','k')]

from itertools import combinations

sets = [set(x) for x in col_combi]

stable = False
while not stable:                        # loop until no further reduction is found
    stable = True
    # iterate over pairs of distinct sets
    for s,t in combinations(sets, 2):
        if s & t:                        # do the sets intersect ?
            s |= t                       # move items from t to s 
            t ^= t                       # empty t
            stable = False

    # remove empty sets
    sets = list(filter(None, sets)) # added list() for python 3

print(sets)  # parentheses make this valid in both Python 2 and Python 3

Output (Python 2 formatting):

[set(['a', 'c', 'b', 'g']), set(['p', 'e', 'd', 'z', 'm']), set(['t', 'k', 'j', 'l', 'n'])]
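Returning to the Perl data in the question, the idea of using a hash to track values can also produce the requested intersection, without the smart match operator. This is a minimal sketch rather than a definitive solution: `intersection` is a hypothetical helper name, and it assumes that empty arrays should simply be ignored, since the required output in the question is `Bob` even though `ppl5` and `ppl6` are empty.

```perl
use strict;
use warnings;

# Hypothetical helper: takes array refs and returns the values that occur
# in every non-empty array, sorted alphabetically.
sub intersection {
    my @lists = grep { @$_ } @_;    # assumption: empty arrays are ignored
    return () unless @lists;

    my %count;
    foreach my $list (@lists) {
        # de-duplicate first so a value repeated inside one array counts once
        my %in_this_list = map { $_ => 1 } @$list;
        $count{$_}++ for keys %in_this_list;
    }

    # a value is common to all arrays if it was counted once per array
    my @common = grep { $count{$_} == @lists } keys %count;
    my @sorted = sort @common;
    return @sorted;
}

my %arrayHash = (
    ppl1 => [ "Bob",   "Flip",  "David" ],
    ppl2 => [ "Bob",   "Kevin", "John", "Michel" ],
    ppl3 => [ "Bob",   "Flip",  "David" ],
    ppl4 => [ "Haidi", "Bob",   "Grook", "Franky" ],
    ppl5 => [],
    ppl6 => [],
);

print "Intersection of arrays= ", join( ", ", intersection( values %arrayHash ) ), "\n";
```

With the question's data this prints `Intersection of arrays= Bob`. If an empty array should instead force an empty intersection, the `grep { @$_ }` filter can be dropped.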