Question

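I'm trying to write a program in Perl that returns the frequency of every word in a file and the length of each word (not the total number of characters!), so that I can generate a Zipf curve from a Spanish text (if you don't know what a Zipf curve is, that's not a big deal). My problem is this: I can do the first part and get the frequency of every word, but I don't know how to get the length of each word! :( I know the line $word_length = length($words), but after trying to change the code I really don't know where to put it or how to compute the length of each word.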

This is what my code looks like so far:

#!/usr/bin/perl
use strict;
use warnings;

my %count_of;
while (my $line = <>) { #read from file or STDIN
  foreach my $word (split ' ', $line) {  # ' ' splits on whitespace and skips leading blanks
     $count_of{$word}++;
  }
}
print "All words and their counts: \n";
for my $word (sort keys %count_of) {
  print "$word: $count_of{$word}\n";
}
__END__

I hope someone has some suggestions!

Answer 1

If you want to store the length of each word as well, you can use a hash of hashes.

while (my $line = <>) {
    foreach my $word (split ' ', $line) {  # ' ' also skips leading whitespace
        $count_of{$word}{word_count}++;
        $count_of{$word}{word_length} = length($word);
    }
}

print "All words and their counts and length: \n";
for my $word (sort keys %count_of) {
    print "$word: $count_of{$word}{word_count} ";
    print "Length of the word: $count_of{$word}{word_length}\n";
}
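Below is a self-contained sketch of this hash-of-hashes approach, assuming the Spanish input is UTF-8 encoded. The text must be decoded into characters, otherwise length() counts bytes and a word like "año" is reported with length 4 instead of 3. The helper name word_stats and the sample lines are hypothetical and only for illustration; a real script would pass in the lines read from <>.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                            # the sample strings below contain non-ASCII letters
use open qw(:std :encoding(UTF-8));  # decode STDIN / encode STDOUT as UTF-8

# Hypothetical helper: builds word => { word_count => N, word_length => L }
sub word_stats {
    my @lines = @_;
    my %count_of;
    for my $line (@lines) {
        for my $word (split ' ', $line) {    # ' ' splits on runs of whitespace
            $count_of{$word}{word_count}++;
            # length() counts characters here because the strings are decoded
            $count_of{$word}{word_length} = length $word;
        }
    }
    return \%count_of;
}

# Sample input; a real script would use the lines read from <>.
my $stats = word_stats("el niño y el año\n", "el año\n");

print "All words and their counts and length: \n";
for my $word (sort keys %$stats) {
    print "$word: $stats->{$word}{word_count} ";
    print "Length of the word: $stats->{$word}{word_length}\n";
}
```

For text piped through STDIN, the :std part of the open pragma takes care of decoding the input.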

Answer 2

This prints the length next to the count:

  print "$word: $count_of{$word} ", length($word), "\n";
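Since a Zipf curve plots word frequency against frequency rank, a possible next step is to sort the words by descending count and print rank, count and length together, computing the length on the fly as in the line above. This is a sketch only: the sample lines are hypothetical, and breaking ties alphabetically is an arbitrary choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count word frequencies as in the question (sample lines are hypothetical).
my %count_of;
for my $line ("la casa y la mesa\n", "la casa\n") {
    $count_of{$_}++ for split ' ', $line;
}

# Rank by descending frequency; words with equal counts are ordered alphabetically.
my @ranked = sort { $count_of{$b} <=> $count_of{$a} or $a cmp $b } keys %count_of;

# Tab-separated output: rank, frequency, word length, word.
my $rank = 0;
for my $word (@ranked) {
    $rank++;
    print join("\t", $rank, $count_of{$word}, length($word), $word), "\n";
}
```

The tab-separated columns can be loaded into a plotting tool; Zipf plots are usually drawn on log-log axes.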

Answer 3

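Just for reference, another possibility instead of

length($word)

could be:

$word =~ s/(\w)/$1/g

It is not as clear as the built-in length, but it offers another way to look at the problem (TIMTOWTDI, "there's more than one way to do it" :)).

A short explanation: \w together with the g modifier matches every letter in $word; substituting each match with $1 (the same character) keeps s/// from altering the original $word; and s/// returns the number of substitutions made, i.e. the number of characters in $word that match \w.

Because it counts only word characters (letters, digits and underscore), the result differs from length($word) when the word contains punctuation, and accented letters such as á or ñ are counted reliably only if the text has been decoded into characters.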
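A small sketch comparing the s/// trick with length() on a hypothetical token that carries trailing punctuation, assuming the string is decoded (here via use utf8, since it is a literal). It also shows a non-destructive way to count the same \w matches: a list assignment in scalar context evaluates to the number of elements on its right-hand side.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                               # the literal below is decoded into characters
use open qw(:std :encoding(UTF-8));    # print characters as UTF-8

my $word = "año,";                      # hypothetical token with trailing punctuation

# Answer 3's trick: s///g returns the number of substitutions made.
my $word_chars = ($word =~ s/(\w)/$1/g);   # 3: 'a', 'ñ', 'o' (the comma is not \w)

# Non-destructive count of the same matches ("countof" idiom).
my $word_chars2 = () = $word =~ /\w/g;     # 3

# length() counts every character, punctuation included.
my $total = length $word;                  # 4

print "$word: \\w characters = $word_chars, length = $total\n";
```

If a word contains no \w character at all, s/// returns false rather than a count, so a numeric context is needed if you want to treat that result as 0.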

Counting the letters of each word in a text with Perl
