Perl split function basics: reading each word from an input file

Time: 2014-10-06 19:15:24

Tags: perl

I can't understand why this code doesn't output anything:

#!/usr/bin/perl -w
use strict;
my %allwords = (); #Create an empty hash list.
my $running_total = 0;
while (<>) {
  print "In the loop 1";
  chomp;
  print "Got here";
  my @words = split(/\W+/,$_);
}
foreach my $val (my @words) {
    print "$val\n";
}

I run it from the terminal with the following command:

perl wordfinder.pl < exampletext.txt

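I expect the code above to output every word in the input file, but it prints nothing except "In the loop 1" and "Got here". I am trying to split the input file into individual words using the split pattern I specified.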

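Update 1: Here I have declared the variables in the proper scope, which was my main problem. Now all the words from the input file are printed to the terminal: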

my %allwords = (); #Create an empty hash list.
my $running_total = 0;
my @words = ();
my $val;
while (<>) {
  print "Inputting words into an array! \n";
  chomp;
  push @words, split(/\W+/,$_); # append, so words from earlier lines are kept
}
print("Words have been input successfully, performing analysis: \n");
foreach $val (@words) {
    print "$val\n";
}
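
The scoping problem mentioned in Update 1 can be sketched as follows. This is only an illustration: the @lines array is a hypothetical stand-in for the lines read from <>, and the comments describe how lexical (my) variables behave rather than the exact code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input standing in for the lines read from <>.
my @lines = ("the quick brown fox", "jumps over the lazy dog");

my @words;    # declared outside the loop, so it is still visible afterwards
for my $line (@lines) {
    # A "my @words" here would create a new array that vanishes at the
    # closing brace; pushing onto the outer array keeps every line's words.
    push @words, split /\W+/, $line;
}

# Note: "foreach my $val (my @words)" would declare yet another, empty array,
# which is why the loop in the first version never printed anything.
print "$_\n" for @words;
```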

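Update 2: I have made progress. Now all the words from any input files are put into a hash, and then each unique key of the hash is printed (that is, each unique word found across all input files).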

#!/usr/bin/perl -w
use strict;
# Description: We want to take ALL text files from the command line input and calculate
# the frequencies of the words contained therein.

# Step 1: Loop over all words in all input files, and put each new unique word in a    
# hash (check to see if contained in hash, if not, put the word in; if the word already    
# exists in the hash, then increase its "total" by 1). Also, keep a running total of    
# all words.
print("Welcome to word frequency finder. \n");
my $running_total = 0;
my %words;
my $val;
while (<>) {
  chomp;
  foreach my $str (split(/\W+/,$_)) {
    $words{$str}++;
    $running_total++;
  }
}
print("Words have been input successfully, performing analysis: \n");

# Step 2: Loop over all entries in the hash and look for the word (key) with the
# maximum amount, and then remove this from the hash and put in a separate list.    
# Do this until the size of the separate list is 10, since we want the top 10 words.
foreach $val (keys %words) {
    print "$val\n";
}
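
One detail worth checking in the counting loop: when a line begins with a non-word character, split /\W+/ returns an empty string as its first field, and that empty string would be counted as a word. The sketch below, using a hypothetical sample line, filters such fields out with grep before counting; it is one possible approach, not necessarily what the code above needs for every input.

```perl
use strict;
use warnings;

# Hypothetical sample line that begins with punctuation.
my $line = '"Hello," she said. "Hello again."';

my %count;
my $total = 0;

# split /\W+/ yields an empty first field when the line starts with a
# non-word character; grep { length $_ } drops it before counting.
for my $word (grep { length $_ } split /\W+/, $line) {
    $count{$word}++;
    $total++;
}

print "$_: $count{$_}\n" for sort keys %count;
print "total: $total\n";
```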

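1 Answer:

Answer 0: (score: 0)

Now that you have completed Step 1, the next task is to get the ten most frequently used words. Rather than looping over the hash to find the most frequent entry ourselves, we can let Perl do the work by sorting the hash by its values.

To sort the %words hash by its keys, we can use the expression sort keys %words; to sort the hash by its values while still having access to its keys, we need a more complex expression: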

sort { $words{$a} <=> $words{$b} } keys %words

Breaking this down: to sort numerically, we use the expression

sort { $a <=> $b } @array

(For more information on the special variables $a and $b used in sort, see the perl sort documentation.) Applied to the hash keys,

sort { $a <=> $b } keys %words

would sort the hash keys themselves numerically, so to sort by their values instead, we write

sort { $words{$a} <=> $words{$b} } keys %words

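Note that the output is still a list of the keys of the %words hash.

We actually want to sort from high to low, so we swap $a and $b to reverse the sort direction: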

sort { $words{$b} <=> $words{$a} } keys %words
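
As a quick illustration of the descending sort, here is a minimal sketch with a small hypothetical %words hash standing in for the one built in Step 1:

```perl
use strict;
use warnings;

# Hypothetical counts, standing in for the %words hash built in Step 1.
my %words = (the => 5, fox => 2, dog => 3);

# Compare the values, with $b on the left so larger counts come first.
my @by_count = sort { $words{$b} <=> $words{$a} } keys %words;

print "$_: $words{$_}\n" for @by_count;
```

The result is a list of keys ordered by count; the counts themselves are still looked up through the hash.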

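Since we are compiling a top-ten list, we only want the first ten entries from the sorted keys. There are other ways to do this, such as taking a slice, but the simplest is to use a counter to track how many entries we have added to the top ten: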

my %top_ten;
my $i = 0;

for (sort { $words{$b} <=> $words{$a} } keys %words) {
    # $_ is the current hash key
    $top_ten{$_} = $words{$_};
    $i++;
    last if $i == 10;
}
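
One caveat: a Perl hash has no defined ordering, so %top_ten holds the right ten words but not their ranking. If the order matters, a variation is to keep the sorted keys in an array and slice it. The sketch below assumes a hypothetical %words hash with made-up counts, and the alphabetical tie-break is an addition for deterministic output, not part of the approach above.

```perl
use strict;
use warnings;

# Hypothetical counts, standing in for the %words hash built in Step 1.
my %words = (a => 7, b => 1, c => 4, d => 9, e => 2, f => 6,
             g => 3, h => 8, i => 5, j => 10, k => 12, l => 11);

# Sort by count (highest first); ties fall back to alphabetical order.
my @sorted = sort { $words{$b} <=> $words{$a} or $a cmp $b } keys %words;

# Keep at most ten entries, guarding against hashes with fewer than ten keys.
my $last = $#sorted < 9 ? $#sorted : 9;
my @top_ten = @sorted[0 .. $last];

# Print rank, word, and count in ranked order.
printf "%2d. %s (%d)\n", $_ + 1, $top_ten[$_], $words{ $top_ten[$_] }
    for 0 .. $#top_ten;
```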

And we're done!