Question

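I have two hashes in Perl, each with roughly 250,000 elements. I have to compare every element of one hash against every element of the other, and perform another operation on the elements that are equal. The code below makes about 60 billion comparisons, so it takes a very long time to finish: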

foreach $key1 (keys %large_hash_1)
    {
    foreach $key2 (keys %large_hash_2)
        {
        if($some_other_var{$key1} == $some_other_var{$key2}) # so actually I compare another hash variable, using the keys from %large_hash_1 and %large_hash_2
             {
             # I print some stuff here to an output file using the $key1 and $key2 variables
             }
        }
    }

Is there a way to do this faster?

Answer 1

Possibly. It looks like you can rephrase the problem as

Find all pairs of keys K1 and K2 such that:


$some_other_hash{K1} == $some_other_hash{K2}

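K1 exists in %hash1 and K2 exists in %hash2

So let's try an approach where you first find the solutions to the first condition and then check whether they satisfy the second. Iterating over all pairs of keys is O(n²), but we already have a strategy for quickly finding keys that map to the same value: use another hash!

Let's build an "inverse hash" of %some_other_hash, so that $hash7{VAL} yields the list of all keys in %some_other_hash such that $some_other_hash{KEY} == VAL: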

push @{ $hash7{ $some_other_hash{$_} } }, $_ for keys %some_other_hash;

That is an O(n) operation. Next, we need to find the values that are mapped to by more than one key.

foreach my $v (keys %hash7) {
    my @k = @{ $hash7{$v} };
    next if @k < 2;
    ...
}
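
To make the inverse hash concrete, here is a minimal sketch on a tiny data set. The contents of %some_other_hash are hypothetical sample values, not data from the question; the keys are sorted only so that the output order is stable.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for %some_other_hash.
my %some_other_hash = (a => 1, b => 2, c => 1, d => 3, e => 2);

# Invert it: each value maps to an array ref holding the keys that carry it.
my %hash7;
push @{ $hash7{ $some_other_hash{$_} } }, $_ for sort keys %some_other_hash;

# Show every value with its keys, flagging the ones the loop above would skip.
for my $v (sort { $a <=> $b } keys %hash7) {
    my @k = @{ $hash7{$v} };
    my $note = @k < 2 ? " (single key, skipped)" : "";
    print "$v => @k$note\n";
}
```

With this sample data, values 1 and 2 each collect two keys (a c and b e), while value 3 has only the key d and would be skipped by the `next if @k < 2` test.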

If you find such a value, check whether some of its keys are in %hash1 and some are in %hash2.

foreach my $v (keys %hash7) {
    my @k = @{ $hash7{$v} };
    next if @k < 2;
    my @k1 = grep { exists $hash1{$_} } @k;   # keys that are also in %hash1
    my @k2 = grep { exists $hash2{$_} } @k;   # keys that are also in %hash2
    if (@k1 && @k2) {
        foreach my $k1 (@k1) {
            foreach my $k2 (@k2) {
                print "$k1 from %hash1 and $k2 from %hash2 ",
                      "have the same value $v in %some_other_hash\n";
                ...
            }
        }
    }
}

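In the worst case, where values in %some_other_hash are commonly mapped to by more than one key, this loop is O(mn). Depending on your data, this search could be much faster than iterating over all pairs of keys in %hash1 and %hash2.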
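
A possible variation on the same idea, sketched here as a self-contained script, is to bucket only the keys of %hash2 by value and then probe those buckets with each key of %hash1. This is not the code from the answer above. The helper name matching_pairs and all sample data are hypothetical. It also assumes that values which compare equal with == stringify identically (hash keys are strings), and it skips keys with no defined value in %some_other_hash.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real hashes would hold ~250,000 entries each.
my %hash1 = (a => 1, b => 1, c => 1);
my %hash2 = (x => 1, y => 1, z => 1);
my %some_other_hash = (a => 10, b => 20, c => 30, x => 20, y => 10, z => 40);

# matching_pairs(\%h1, \%h2, \%values) is an illustrative helper.
# It returns [key1, key2, value] for every pair whose looked-up values match.
sub matching_pairs {
    my ($h1, $h2, $values) = @_;

    # Bucket the keys of the second hash by their looked-up value: O(n).
    my %bucket;
    for my $k2 (sort keys %$h2) {
        next unless defined($values->{$k2});
        push @{ $bucket{ $values->{$k2} } }, $k2;
    }

    # Probe the buckets with each key of the first hash: O(m) plus output size.
    my @pairs;
    for my $k1 (sort keys %$h1) {
        my $v = $values->{$k1};
        next unless defined($v) && exists $bucket{$v};
        push @pairs, [ $k1, $_, $v ] for @{ $bucket{$v} };
    }
    return @pairs;
}

for my $pair (matching_pairs(\%hash1, \%hash2, \%some_other_hash)) {
    my ($k1, $k2, $v) = @$pair;
    print "$k1 from %hash1 and $k2 from %hash2 share value $v\n";
}
```

The total work is proportional to the sizes of the two hashes plus the number of matching pairs reported, so it only approaches O(mn) when most keys share the same value.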

Comparing all elements of two hashes more efficiently
