Question

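I have two hashes:

1) %redundant_text uses the starting position of each key phrase in the document as the hash key, and the length of that key phrase as the value.

2) %itter_w uses the ordinal number of each word in the document as its key (1, 2, 3, etc.), and the corresponding word as the value.

I want to create an array of key phrases from the document by extracting all the values (words) in %itter_w whose keys fall between the start and end positions determined by the elements of %redundant_text. The code below does this successfully, but it is very, very slow. Any ideas on how to structure this code to maximize the speed of generating the output?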

my @redundant_text ;
my @all_redundant_text ;

foreach my $key (keys %redundant_text) {
    my $start_position = $key  ;
    my $end_position = $key + $redundant_text{$key}+10 ;
    foreach my $word (sort {$a<=>$b} keys %itter_w) {
        next if (($word < $start_position)||($word>$end_position)) ;
        push (@redundant_text, $itter_w{$word})
    }
    ### Blanking out the redundant text array. ###
    my $redundant_sequence = join(' ', @redundant_text) ;
    @redundant_text = () ;
    push (@all_redundant_text, $redundant_sequence) ;
}

Answer 1

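This will be much faster. Here are the main changes:

Sort the list once, rather than once per phrase. [mob's second solution does this too]
Look up only the required number of words, rather than iterating over all the words. [A better version of mob's first solution]
Copy the words from the hash into an array for faster lookup.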

use strict;
use warnings;
use feature qw( say );

# Formerly known as %itter_w.
my %words_by_index = (
   0 => "I",        5 => "array",     10 => "the",          15 => "the",   
   1 => "want",     6 => "of",        11 => "document",     16 => "values",
   2 => "to",       7 => "key",       12 => "by",           17 => "words",
   3 => "create",   8 => "phrases",   13 => "extracting",   18 => "from",
   4 => "an",       9 => "from",      14 => "all",          19 => "itter_w",
);
# Formerly known as %redundant_text.
my %phrase_lengths_by_offset = (2=>3, 10=>4);

# Sort before the loop, and convert to a more-efficient array.
my @words = map { $words_by_index{$_} } sort { $a <=> $b } keys(%words_by_index);

my @phrases;
for my $offset( sort { $a <=> $b } keys(%phrase_lengths_by_offset)) {
   my $length = $phrase_lengths_by_offset{$offset};
   push @phrases, join(' ', @words[$offset .. $offset+$length-1]);
}

say for @phrases;

Output:

to create an
the document by extracting
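
As a variation on the same idea, a hash slice can pull each phrase straight out of the hash without sorting or copying the words first. This is only a minimal sketch, not part of the answer above: it reuses the names %words_by_index and %phrase_lengths_by_offset with a smaller, made-up data set, and it assumes every index in the requested range exists as a key.

```perl
use strict;
use warnings;

# Sample data (a subset of the words used above).
my %words_by_index = (
   0 => "I",        3 => "create",
   1 => "want",     4 => "an",
   2 => "to",       5 => "array",
);
my %phrase_lengths_by_offset = (2 => 3);

my @phrases;
for my $offset (sort { $a <=> $b } keys %phrase_lengths_by_offset) {
   my $last = $offset + $phrase_lengths_by_offset{$offset} - 1;
   # A hash slice returns the values for the listed keys, in the order given.
   # Assumption: every index from $offset to $last exists in the hash.
   push @phrases, join(' ', @words_by_index{$offset .. $last});
}

print "$_\n" for @phrases;   # prints "to create an"
```

If some indices might be missing, the slice would contain undef entries, so the array-based version above is the safer choice in that case.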

Answer 2

First, since %itter_w never changes inside the loop, you only need to sort it once, outside the loop, rather than on every iteration.

my @words = sort {$a<=>$b} keys %itter_w;
foreach my $key (...) {
   ...
   foreach my $word (@words) {
      ...
   }
   ...
}
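
The skeleton above could be filled in roughly as follows. This is a sketch under assumptions: the sample data is invented, and the end position is computed as start + length - 1 rather than with the "+10" padding from the question's code.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real document.
my %itter_w = (1 => 'the', 2 => 'quick', 3 => 'brown', 4 => 'fox', 5 => 'jumps');
my %redundant_text = (2 => 2);   # phrase starts at word 2 and is 2 words long

# Sort the word positions once, before the outer loop.
my @words = sort { $a <=> $b } keys %itter_w;

my @all_redundant_text;
foreach my $key (sort { $a <=> $b } keys %redundant_text) {
    my $start_position = $key;
    # Assumption: exact phrase length, without the question's +10 padding.
    my $end_position = $key + $redundant_text{$key} - 1;

    my @redundant_text;
    foreach my $word (@words) {
        next if $word < $start_position || $word > $end_position;
        push @redundant_text, $itter_w{$word};
    }
    push @all_redundant_text, join(' ', @redundant_text);
}

print "$_\n" for @all_redundant_text;   # prints "quick brown"
```

The inner loop still visits every word for every phrase, so this removes only the repeated sorting cost.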

Answer 3

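Second, since $word only gets larger as you iterate through @words, you should be able to short-circuit the loop. This will save a lot of time when $end_position is small but $word gets very large: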

foreach my $word (sort {$a<=>$b} keys %itter_w) {
    next if $word < $start_position;
    last if $word > $end_position;
    push (@redundant_text, $itter_w{$word})
}
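
To see how much work the early exit saves, here is a small self-contained sketch. The data (1000 words named w1 to w1000), the $visited counter, and the @sorted_positions name are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: 1000 words named w1 .. w1000.
my %itter_w = map { $_ => "w$_" } 1 .. 1000;
my ($start_position, $end_position) = (3, 5);

# Sorted once, outside any per-phrase loop.
my @sorted_positions = sort { $a <=> $b } keys %itter_w;

my @redundant_text;
my $visited = 0;   # counts how many positions the loop actually examines
foreach my $word (@sorted_positions) {
    $visited++;
    next if $word < $start_position;   # not yet inside the phrase
    last if $word > $end_position;     # past the phrase: stop scanning
    push @redundant_text, $itter_w{$word};
}

print join(' ', @redundant_text), "\n";   # prints "w3 w4 w5"
print "visited $visited of ", scalar(@sorted_positions), " positions\n";   # visited 6 of 1000
```

The loop still has to step over every word before $start_position, so phrases near the end of a long document benefit less from this change.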

Pulling out a sequence of hash values in Perl
