I have two hashes:
1) %redundant_text uses the starting position of each key phrase in the document as its hash key, with the length of that key phrase as the value.
2) %itter_w uses the ordinal position of each word in the document as its key (1, 2, 3, etc.), with the corresponding word as the value.
I want to build an array of key phrases from the document by extracting, from %itter_w, all the values (words) whose keys fall between the start and end positions determined by each element of %redundant_text. The code below does the job, but it is very, very slow. Any ideas on how to restructure this code to speed up generating the output?
my @redundant_text ;
my @all_redundant_text ;

foreach my $key (keys %redundant_text) {
    my $start_position = $key ;
    my $end_position   = $key + $redundant_text{$key} + 10 ;

    foreach my $word (sort { $a <=> $b } keys %itter_w) {
        next if (($word < $start_position) || ($word > $end_position)) ;
        push (@redundant_text, $itter_w{$word}) ;
    }

    ### Blanking out the redundant text array. ###
    my $redundant_sequence = join(' ', @redundant_text) ;
    @redundant_text = () ;
    push (@all_redundant_text, $redundant_sequence) ;
}
Answer 0 (score: 3)
This should be much faster. The main changes: the hashes are renamed for clarity, the word indexes are sorted once before the loop and converted to a plain array, and each phrase is then pulled out with a single array slice instead of rescanning every word.
use strict;
use warnings;
use feature qw( say );

# Formerly known as %itter_w.
my %words_by_index = (
    0 => "I",      5 => "array",   10 => "the",        15 => "the",
    1 => "want",   6 => "of",      11 => "document",   16 => "values",
    2 => "to",     7 => "key",     12 => "by",         17 => "words",
    3 => "create", 8 => "phrases", 13 => "extracting", 18 => "from",
    4 => "an",     9 => "from",    14 => "all",        19 => "itter_w",
);

# Formerly known as %redundant_text.
my %phrase_lengths_by_offset = (2 => 3, 10 => 4);

# Sort before the loop, and convert to a more-efficient array.
my @words = map { $words_by_index{$_} } sort { $a <=> $b } keys(%words_by_index);

my @phrases;
for my $offset (sort { $a <=> $b } keys(%phrase_lengths_by_offset)) {
    my $length = $phrase_lengths_by_offset{$offset};
    push @phrases, join(' ', @words[$offset .. $offset + $length - 1]);
}

say for @phrases;
Output:
to create an
the document by extracting
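One caveat: this version treats the stored length as the exact number of words in the phrase, while the loop in the question also pads the end position by 10 ($key + $redundant_text{$key} + 10). If that padding is deliberate rather than a leftover, the slice range would presumably need to be @words[$offset .. $offset + $length + 10] instead.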
Answer 1 (score: 2)
First, since %itter_w never changes inside the loop, you only need to sort it once, outside the loop, rather than on every iteration.
my @words = sort { $a <=> $b } keys %itter_w;

foreach my $key (...) {
    ...
    foreach my $word (@words) {
        ...
    }
    ...
}
Answer 2 (score: 2)
Second, since $word only gets bigger as you iterate over @words, you should be able to short-circuit the loop. This will save a lot of time when $end_position is small but $word grows very large:
foreach my $word (sort { $a <=> $b } keys %itter_w) {
    next if $word < $start_position;
    last if $word > $end_position;
    push @redundant_text, $itter_w{$word} ;
}