Finding word frequency (Perl)

Time: 2014-11-06 03:31:50

Tags: perl word-frequency

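I'm trying to find the frequency of words in a user's file. I have it counting the words, lines and characters, but I'm a bit stuck on finding how often each word occurs.

Here is what I have. I know I need to create a hash and store the words in it accordingly, but I'm stuck on getting the output right: right now it prints out a bunch of gibberish.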

#!/usr/bin/perl

use warnings;
use strict;


open(FILE, "<test.txt") or die "Could not open file: $!";

my ($lines, $words, $chars) = (0,0,0);
my %count;

while (<FILE>) {


    $lines++;
    $chars += length($_);
    $words += scalar(split(/\s+/, $_));
    $count{$words}++;


}

print("Number of characters: $chars\n");
print("Number of words: $words\n");
print("Number of lines: $lines\n");

foreach $words (sort keys %count) {
        print("$words, $count{$words}\n");
}

Any help is greatly appreciated!
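The hash approach described in the question can be sketched in isolation: every word becomes a key, and its value is incremented each time the word is seen. This is a minimal sketch that counts words in a string rather than reading `test.txt`; the sub name `count_words` and the sample text are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference mapping each word to its count.
sub count_words {
    my ($text) = @_;
    my %count;
    # split ' ' splits on runs of whitespace and ignores leading whitespace,
    # so no empty-string "word" ends up in the hash
    for my $word (split ' ', $text) {
        $count{$word}++;    # the word itself is the key, not the running total
    }
    return \%count;
}

my $count = count_words("the cat saw the dog\nthe end\n");
for my $word (sort keys %$count) {
    print "$word, $count->{$word}\n";
}
```

With the sample text this prints one line per distinct word in alphabetical order, ending with `the, 3`.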

1 answer:

Answer 0 (score: 1)

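I put this code together quickly.

I believe it does what you want, although it is untested. It should at least give you some pointers.

I added some comments pointing out where I think your code went wrong.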

#!/usr/bin/perl

use warnings;
use strict;

# Three-argument open with a lexical filehandle is safer than open(FILE, "<test.txt").
open(my $fh, '<', 'test.txt') or die "Could not open file: $!";

my ($lines, $words, $chars) = (0, 0, 0);
my %count;

while (my $line = <$fh>) {
    $lines++;

    # $chars += length($_); problem: also counts whitespace and the newline. Probably not intended.
    my $nr_of_chars = () = $line =~ /[a-z]/gi;    # counts letters only (no digits, punctuation or spaces)
    $chars += $nr_of_chars;

    # split ' ' (a single space) skips leading whitespace, so no empty "word" is stored
    my @words = split ' ', $line;

    for my $word (@words) {
        $count{$word}++;    # the word itself is the hash key
    }

    # $words += scalar(split(/\s+/, $_)); <- counts the words on the line and adds them to $words
    # $count{$words}++ <- $words is the running total, so this increments e.g. $count{7} once the total is 7.
    # That is certainly not helpful: you never actually store the words anywhere.

    $words += scalar(@words);
}
close($fh);

print("Number of characters: $chars\n");
print("Number of words: $words\n");
print("Number of lines: $lines\n");

# "my" is required here because of "use strict"; sort lists the words alphabetically
foreach my $word (sort keys %count) {
    print("$word, $count{$word}\n");    # hash elements interpolate fine in double-quoted strings
}
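The loop above treats "The" and "the" as different words and keeps punctuation attached ("dog." is not "dog"). If that is not wanted, one option is to match words with a regex, lowercase them, and list them by descending frequency. This is a sketch under an assumption the question does not state: a "word" here is a run of ASCII letters, digits and apostrophes, so non-ASCII letters are not handled. The sub names `word_frequencies` and `by_frequency` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: counts words case-insensitively. A "word" is assumed to be
# a run of ASCII letters, digits and apostrophes; everything else is a separator.
sub word_frequencies {
    my ($text) = @_;
    my %count;
    $count{ lc $1 }++ while $text =~ /([A-Za-z0-9']+)/g;
    return \%count;
}

# Hypothetical helper: words ordered by descending count, ties broken alphabetically.
sub by_frequency {
    my ($count) = @_;
    return sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
}

my $count = word_frequencies("The cat saw the dog. THE end!");
printf "%-5s %d\n", $_, $count->{$_} for by_frequency($count);
```

Here "The", "the" and "THE" all count toward `the`, which is listed first with a count of 3.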