How can I use Perl to find the 100 most frequently used strings (words) in a .txt file? So far I have the following:
use 5.012;
use warnings;

open(my $file, "<", "file.txt") or die "Cannot open file.txt: $!";

my %word_count;
while (my $line = <$file>) {
    foreach my $word (split ' ', $line) {
        $word_count{$word}++;
    }
}

for my $word (sort keys %word_count) {
    print "'$word': $word_count{$word}\n";
}
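But this only counts each word and lists the results in alphabetical order. I want the 100 most frequently used words in the file, sorted by number of occurrences. Any ideas?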
Answer 0 (score: 8)
Reading the fine perlfaq4(1) manpage, you can learn how to sort hashes by value. So try this. It is rather more idiomatically "perlian" than your approach.
#!/usr/bin/env perl
use v5.12;
use strict;
use warnings;
use warnings FATAL => "utf8";
use open qw(:utf8 :std);

my %seen;
while (<>) {
    $seen{$_}++ for split /\W+/;    # or just split;
}

my $count = 0;
for (sort {
        $seen{$b} <=> $seen{$a}
                  ||
        lc($a) cmp lc($b)           # XXX: should be v5.16's fc() instead
                  ||
        $a cmp $b
     } keys %seen)
{
    next unless /\w/;
    printf "%-20s %5d\n", $_, $seen{$_};
    last if ++$count >= 100;
}
When run against itself, the first 10 lines of output are:
seen                     6
use                      5
_                        3
a                        3
b                        3
cmp                      2
count                    2
for                      2
lc                       2
my                       2
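If you want to reuse this elsewhere, the same sort-by-value idea can be wrapped in a subroutine. What follows is only a minimal sketch: the helper name top_words is hypothetical, and folding everything to lowercase with lc() is an assumption of mine (the script above counts "The" and "the" separately). It matches runs of word characters rather than splitting, so no empty strings end up in the hash.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): returns up to $limit
# [word, count] pairs, most frequent first, ties broken alphabetically.
sub top_words {
    my ($text, $limit) = @_;
    my %seen;

    # Count every run of word characters; lc() folds case (an assumption).
    $seen{ lc($1) }++ while $text =~ /(\w+)/g;

    # Sort keys by descending count, then by the word itself.
    my @sorted = sort { $seen{$b} <=> $seen{$a} || $a cmp $b } keys %seen;

    # Keep only the first $limit words.
    splice @sorted, $limit if @sorted > $limit;

    return map { [ $_, $seen{$_} ] } @sorted;
}

# Usage: report the two most frequent words, "the" (3) and then "and" (2).
my $text = "the cat and the dog and the bird";
printf "%-20s %5d\n", @$_ for top_words($text, 2);
```

To process a whole file, you could slurp it into a string first and pass 100 as the limit; for very large files, counting line by line as in the script above is kinder to memory.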