Question

I'm new to Perl, and for one of my assignments I came up with this solution:

#wordcount.pl FILE 
    # 

    #if no filename is given, print help and exit 
    if (length($ARGV[0]) < 1) 
    { 
           print "Usage is : words.pl word filename\n"; 
           exit; 
    } 

   my $file = $ARGV[0];          #filename given in commandline 

   open(FILE, $file);            #open the mentioned filename 
   while(<FILE>)                 #continue reading until the file ends 
    { 
           chomp; 
           tr/A-Z/a-z/;          #convert all upper case words to lower case 
           tr/.,:;!?"(){}//d;            #remove some common punctuation symbols 
           #We are creating a hash with the word as the key.  
           #Each time a word is encountered, its hash is incremented by 1. 
           #If the count for a word is 1, it is a new distinct word. 
           #We keep track of the number of words parsed so far. 
           #We also keep track of the no. of words of a particular length.  

          foreach $wd (split) 
          { 
                $count{$wd}++; 
                if ($count{$wd} == 1) 
                 { 
                       $dcount++; 
                 } 
                $wcount++; 
                $lcount{length($wd)}++; 
          } 
   } 

   #To print the distinct words and their frequency,  
   #we iterate over the hash containing the words and their count. 
   print "\nThe words and their frequency in the text is:\n"; 
   foreach $w (sort keys%count) 
   { 
         print "$w : $count{$w}\n"; 
   } 

   #For the word length and frequency we use the word length hash 
   print "The word length and frequency in the given text is:\n"; 
   foreach $w (sort keys%lcount) 
   { 
         print "$w : $lcount{$w}\n"; 
   } 

   print "There are $wcount words in the file.\n"; 
   print "There are $dcount distinct words in the file.\n"; 

   $ttratio = ($dcount/$wcount)*100;       #Calculating the type-token ratio. 

   print "The type-token ratio of the file is $ttratio.\n";

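I have included comments in the code. Essentially, I have to find the word count of a given text file. The output of the program above looks like this: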

The words and their frequency in the text is: 
1949 : 1
a : 1
adopt : 1
all : 2
among : 1
and : 8
assembly : 1
assuring : 1
belief : 1
citizens : 1
constituent : 1
constitute : 1
.
.
.
The word length and frequency in the given text is:
1 : 1
10 : 5
11 : 2
12 : 2
2 : 15
3 : 18
There are 85 words in the file. 
There are 61 distinct words in the file. 
The type-token ratio of the file is 71.7647058823529.

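With Google's help I was able to find a solution to my assignment. But I think that code using the real power of Perl would be smaller and more concise. Can anyone give me a Perl solution in fewer lines of code?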

Answer 1

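Here are a few suggestions:

Include use strict and use warnings in your Perl scripts.
Your argument validation does not test what it should be testing: (1) whether there is exactly 1 item in @ARGV, and (2) whether that item is a valid file name.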
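Although every rule has exceptions, it is generally better to assign the return value of <> to a named variable than to rely on $_. This is especially true if the code inside the loop might need one of Perl's constructs that also relies on $_ (for example, map, grep, or a postfix for loop).
```
while (my $line = <>){ ... }
```
Perl provides a built-in function for lower-casing strings (lc).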
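You are performing unnecessary calculations inside the line-reading loop. If you simply build a tally of the words, you will have all the information you need. Also note that Perl offers a one-line (postfix) form for most of its control structures (for, while, if, etc.), as shown below.
```
while (my $line = <>){
    ...
    $words{$_} ++ for split /\s+/, $line;
}
```
You can then use the word tallies to compute the other information you need. For example, the number of unique words is just the number of keys in the hash, and the total number of words is the sum of the hash values.

The distribution of word lengths can be computed from the same tallies: for each distinct word, add its count to a hash entry keyed by the word's length.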
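
Putting these suggestions together, a minimal sketch might look like the following. It is only one possible arrangement, not the answerer's own code: the sub names (check_args, tally_words, summarize), the %lengths hash, and the sample text are assumptions made for illustration, and the punctuation stripping is copied from the question. When no file name is given, an in-memory filehandle stands in for a real file so the sketch runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: insist on exactly one argument naming a readable file.
sub check_args {
    my @args = @_;
    die "Usage: $0 FILE\n" unless @args == 1;
    die "Not a readable file: $args[0]\n" unless -f $args[0] && -r $args[0];
    return $args[0];
}

# Build the word tally, reading each line into a named variable.
sub tally_words {
    my ($fh) = @_;
    my %words;
    while (my $line = <$fh>) {
        $line = lc $line;                    # built-in lower-casing
        $line =~ tr/.,:;!?"(){}//d;          # same punctuation set as the question
        $words{$_}++ for split ' ', $line;   # ' ' also skips leading whitespace
    }
    return \%words;
}

# Derive every other statistic from the tally alone.
sub summarize {
    my ($words) = @_;
    my $distinct = keys %$words;             # number of keys = distinct words
    my $total = 0;
    $total += $_ for values %$words;         # sum of values = total words
    my %lengths;
    $lengths{length $_} += $words->{$_} for keys %$words;
    return ($distinct, $total, \%lengths);
}

# Sample text used only when no file name is supplied (an assumption for the demo).
my $text = "The cat and a dog.\nThe end of it!\n";

my $fh;
if (@ARGV) {
    my $file = check_args(@ARGV);
    open $fh, '<', $file or die "Cannot open $file: $!";
}
else {
    open $fh, '<', \$text or die "Cannot open in-memory text: $!";
}
my $words = tally_words($fh);
close $fh;

my ($distinct, $total, $lengths) = summarize($words);

print "$_ : $words->{$_}\n" for sort keys %$words;
print "$_ : $lengths->{$_}\n" for sort { $a <=> $b } keys %$lengths;
print "There are $total words in the file.\n";
print "There are $distinct distinct words in the file.\n";
printf "The type-token ratio of the file is %.2f.\n", 100 * $distinct / $total
    if $total;
```

Sorting the length keys with `<=>` lists them numerically, which avoids the 1, 10, 11, 12, 2, 3 ordering visible in the question's output.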

Answer 2

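Using a hash the way you did is a good approach. A more Perl-like way to parse the file is to use a regular expression with the /g flag to pull the words out of each line. \w+ matches one or more word characters (letters, digits, and the underscore).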

while( <FILE> )
{
    while( /(\w+)/g )
    {
        my $wd = lc( $1 );
        ...
    }
}
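
As a rough illustration of the /g approach, here is a self-contained sketch that counts words in a string. The sub name count_words and the sample sentence are made up for the example. Because \w+ only matches word characters, punctuation is skipped without a separate tr step, but a word such as "don't" would be counted as two words ("don" and "t").

```perl
use strict;
use warnings;

# Count words in a block of text, treating each run of \w characters as a word.
sub count_words {
    my ($text) = @_;
    my %count;
    for my $line (split /\n/, $text) {
        # In scalar context, /g resumes where the previous match ended,
        # so the loop visits every word on the line exactly once.
        while ($line =~ /(\w+)/g) {
            $count{ lc $1 }++;
        }
    }
    return %count;
}

my %count = count_words("Hello, world! Hello (again).\n");
print "$_ : $count{$_}\n" for sort keys %count;
```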

How can this be done in a more Perl-like way?
