Question

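I am struggling to complete a Perl assignment given to us in our Natural Language Processing course.

The problem we have been asked to solve in Perl is as follows:

Input: The program receives two inputs from stdin, in the following form and type; perl program.pl
Processing and output:

Part 1: The program tokenizes the words in filename.txt and stores these words, together with their frequencies of occurrence, in a hash.

Part 2: The program uses the input word to look up the hash. If the word cannot be found in the hash (and thus in the text), it prints zero as the frequency of the word. If the word can indeed be found in the hash, it prints the corresponding frequency value stored in the hash.

From experience, I am confident that my script can already do "Part 1" above.

Part 2 needs to be done with a Perl sub (subroutine) that takes the hash by reference, along with the word to look up. This is the part I am having serious trouble with.

First version, before the major changes Stefan Becker suggested: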

#!/usr/bin/perl                                                                           

use warnings;
use strict;

sub hash_4Frequency
{
    my ($hashWord, $ref2_Hash) = @_;                       
    print $ref2_Hash -> {$hashWord}, "\n";  # thank you Stefan Becker, for sobriety
}

my %f = ();  # hash that will contain words and their frequencies                              
my $wc = 0;  # word-count                                       

my ($stdin, $word_2Hash) = @ARGV;  # corrected, thanks to Silvar

while ($stdin)
{
    while ("/\w+/")
    {
        my $w = $&;
        $_ = $";
        $f{lc $w} += 1;
        $wc++;
    }
}

my @args = ($word_2Hash, %f);
hash_4Frequency(@args);

Second version, after some changes:

#!/usr/bin/perl

use warnings;
use strict;

sub hash_4Frequency
{
    my $ref2_Hash = %_;
    my $hashWord = $_;

    print $ref2_Hash -> {$hashWord}, "\n";
}

my %f = ();  # hash that will contain words and their frequencies
my $wc = 0;  # word-count

while (<STDIN>) 
{
    while (/\w+/)
    {
        chomp;
        my $w = $&;
        $_ = $";

        $f{$_}++ foreach keys %f;
        $wc++;
    }
}

hash_4Frequency($_, \%f);

When I run ./script.pl in the terminal, Perl complains about the first version:

 Use of uninitialized value $hashWord in hash element at   
 ./word_counter2.pl line 35.

 Use of uninitialized value in print at ./word_counter2.pl line 35.

Perl complains about the second version:

 Can't use string ("0") as a HASH ref while "strict refs" in use at ./word_counter2.pl line 13, <STDIN> line 8390.

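At least now I know the script runs successfully up to the very last point, and the problem seems to be semantic rather than syntactic.

Any further suggestions on this last part? They would be greatly appreciated.

P.S.: Sorry, pilgrims, I am just a novice in Perl.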

Answer 1

A quick test on the command line with this example shows a correct syntax for passing a word and a hash reference to a function:

use strict;
use warnings;
use v5.18;
sub foo {
    my $word = $_[0];
    shift;
    my $hsh = $_[0];
    say $word; say $hsh->{$word};
};
foo("x", {"x" => 4});
# prints x and 4

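This treats the argument list as an array: it reads the first element, shifts it off, and then reads the new first element. Instead, I would actually recommend taking both arguments at once: my ($word, $hsh) = @_;

Your syntax for accessing the hash ref element may well be correct, but I find it easier to remember the syntax shared between C++ and Perl: an arrow means dereference. Also, when you use the arrow syntax you know you will never accidentally copy the data structure.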
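To make the difference concrete, here is a minimal sketch of what a sub receives when a hash is passed directly (as in `my @args = ($word_2Hash, %f)` from the question's first version) versus by reference. The names `%freq`, `count_args`, and `lookup_ref` are made up for illustration and do not come from the question.

```perl
use strict;
use warnings;

my %freq = (lorem => 1, in => 2);

# Passing %freq directly flattens it into a key/value list,
# so the sub sees the word plus four more scalars.
sub count_args { return scalar @_ }

# Passing \%freq hands over a single scalar (the reference);
# the arrow dereferences it without copying the hash.
sub lookup_ref {
    my ($word, $hsh) = @_;
    return $hsh->{$word} // 0;   # 0 when the word is absent
}

print count_args('in', %freq), "\n";     # 5
print count_args('in', \%freq), "\n";    # 2
print lookup_ref('in', \%freq), "\n";    # 2
print lookup_ref('test', \%freq), "\n";  # 0
```

With the flattened call, the second argument is just one of the hash's keys or values, which is why treating it as a hash reference fails under "strict refs".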

Answer 2

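Your fixed version is no better than your first one. Although it passes the syntax check, it still contains several semantic errors. Here is a version with the minimum number of fixes that runs correctly.

Note: this is not how you would write it in idiomatic Perl.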

#!/usr/bin/perl
use warnings;
use strict;

sub hash_4Frequency($$) {
    my($ref2_Hash, $hashWord) = @_;

    print $ref2_Hash -> {$hashWord}, "\n";
}

my %f = ();  # hash that will contain words and their frequencies
my $wc = 0;  # word-count

while (<STDIN>)
{
    chomp;
    while (/(\w+)/g)
    {
        $f{$1}++;
        $wc++;
    }
}

hash_4Frequency(\%f, $ARGV[0]);

Testing the output with "Lorem ipsum" as the input text:

$ cat dummy.txt 
Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor
incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat.
Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa
qui officia deserunt mollit anim id est laborum.

$ perl <dummy.txt dummy.pl Lorem
1
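The key change in the loop above is the /g modifier. In scalar context, a /g match resumes where the previous match ended, so the while loop visits each word once and then stops; without /g, the match would start again from the beginning of the string on every iteration. A small sketch follows, where the `tokenize` name and the sample string are invented for illustration:

```perl
use strict;
use warnings;

# Return the words of a line, in order, using a scalar-context /g match.
sub tokenize {
    my ($line) = @_;
    my @words;
    while ($line =~ /(\w+)/g) {   # each iteration continues after the last match
        push @words, $1;
    }
    return @words;
}

my @words = tokenize("Lorem ipsum, dolor sit amet.");
print scalar(@words), " words: @words\n";   # 5 words: Lorem ipsum dolor sit amet
```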

Bonus code: this would be my first stab at the given problem. Your first version lowercased all words, which makes sense, so I kept that:

#!/usr/bin/perl
use warnings;
use strict;

sub word_frequency($$) {
    my($hash_ref, $word) = @_;

    print "The word '${word}' appears ", $hash_ref->{$word} // 0, " time(s) in the input text.\n";
}

my %words;  # hash that will contain words and their frequencies
my $wc = 0; # word-count

while (<STDIN>) {
    # lower case all words
    $wc += map { $words{lc($_)}++ } /(\w+)/g
}

print "Input text has ${wc} words in total, of which ",
      scalar(keys %words),
      " are unique.\n";

# return frequency in input text for every word on the command line
foreach my $word (@ARGV) {
    word_frequency(\%words, lc($word));
}

exit 0;

Test run:

$ perl <dummy.txt dummy.pl Lorem ipsum dolor in test
Input text has 66 words in total, of which 61 are unique.
The word 'lorem' appears 1 time(s) in the input text.
The word 'ipsum' appears 1 time(s) in the input text.
The word 'dolor' appears 1 time(s) in the input text.
The word 'in' appears 2 time(s) in the input text.
The word 'test' appears 0 time(s) in the input text.
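Two idioms carry most of the weight in the bonus code. A /g match in list context returns all captured words at once, and map evaluated in scalar context (forced here by +=) yields the number of elements it produced. The defined-or operator // then supplies 0 for words that are missing from the hash. Here is a short sketch with a made-up input line; the variable `$line` is an assumption for illustration, since the original reads from STDIN:

```perl
use strict;
use warnings;

my %words;
my $wc   = 0;
my $line = "Ut enim ad minim veniam, ut labore";

# List-context /g returns every captured word; the map block counts
# each one in %words, and += puts map in scalar context, so it
# returns how many words were processed.
$wc += map { $words{lc($_)}++ } $line =~ /(\w+)/g;

my $ut   = $words{ut}   // 0;   # present: stored count
my $test = $words{test} // 0;   # absent: falls back to 0

print "$wc\n";     # 7
print "$ut\n";     # 2
print "$test\n";   # 0
```

Note that // needs Perl 5.10 or later; on older interpreters, `exists $words{test} ? $words{test} : 0` is one possible substitute.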

Fixing Perl errors when I try to pass a hash (by reference) and a variable to a sub in order to print the corresponding value in the hash
