Question

I have a hash with ~15000 keys:

say %hash =(key1=>[x,yz],key2=>[o,p,e] ,key3......till keys 15000)

I also have a file (100 MB).

In that file, the hash keys are repeated across many lines, so the file may contain

key1 there is adog
key2 there is cattt
key1 there is man
key3 there is elephant
key2 etc...............

Now what I want is

foreach my $key (keys %hash)
{
    open (IN,$file) or die ;
    while ($input=<IN>){
        ($animal)=$input=~/$key.*?there is (.*?)/i;
        #I want to match the last occurrence of the pattern, i.e. key1 there is **man**
    }
    push @array,$animal;
}

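As you can see, this works, but the script opens and rereads the file once per key (15000 times), so it takes a long time.

How can I optimize the code so that it takes relatively less time?

I have also tried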

my $stg=`grep -w $key /path/to/file |tail -1`;

but that still runs the grep command 15000 times, which also takes a lot of time.

My question is how to do this faster.

Answer 1

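As you read each line, simply overwrite the value stored for that line's key with the value from the current line. This takes only a single pass over your file.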

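my %refs;
open my $IN, '<', $file or die "Cannot open $file: $!";
while (my $input = <$IN>)
{
    # Capture the key (first non-whitespace token) and the text after "there is";
    # skip lines that do not match.
    my ($key, $animal) = $input =~ /^(\S+).*?there is (.*)/i or next;
    $refs{$key} = $animal;    # later lines overwrite earlier ones
}
close $IN;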

Now %refs contains, for each key, the animal name from the last line on which that key appeared:

foreach my $key (keys %refs)
{
    print "$key = $refs{$key}\n";
}
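
As a fuller sketch of the same single-pass idea, the version below also ignores keys that are not in %hash and builds the @array the question asks for. It assumes the key is always the first whitespace-delimited token on a line. The sample data, the in-memory filehandle, and the names $data and %last are illustrative only and do not come from the original post.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the 15000-key hash and the 100 MB file.
my %hash = (key1 => ['x', 'yz'], key2 => ['o', 'p', 'e'], key3 => ['q']);
my $data = <<'END';
key1 there is adog
key2 there is cattt
key1 there is man
key3 there is elephant
key9 there is zebra
END

# Read from an in-memory string here; for a real file use: open my $in, '<', $file
open my $in, '<', \$data or die "Cannot open input: $!";

my %last;    # key => animal from the most recent matching line
while (my $line = <$in>) {
    # Assumption: the key is the first whitespace-delimited token on the line.
    my ($key, $animal) = $line =~ /^(\S+).*?there is (.*)/i or next;
    next unless exists $hash{$key};    # ignore keys that are not in %hash
    $last{$key} = $animal;             # later lines overwrite earlier ones
}
close $in;

# Collect the results in sorted key order, skipping keys never seen in the file.
my @array = map { $last{$_} } grep { exists $last{$_} } sort keys %hash;
print join(',', @array), "\n";    # prints: man,cattt,elephant
```

Because each line costs one regex match and one hash lookup, the work grows with the size of the file rather than with file size times the number of keys.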
