Using a Perl hash to store each word's line numbers and occurrence count

Date: 2013-10-28 17:10:00

Tags: perl hash key

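I am reading a file word by word (the file contains one word per line) and storing each word in a hash. I want to store the number of occurrences as well as the lines on which each word was found. (Note: I will sort the hash by the word itself, as shown in the code.)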

What I have (not working; assume the @words array holds the words correctly, with no special characters and all lowercase):

my %wordlist;
my $line = 0;

foreach my $word (@words) {
  $line++;

  if (exists $wordlist{$word}) {
      $wordlist{$word} += 1;
      $wordlist{$line} = $wordlist{$line} . ", $line";
  }
  else {
      $wordlist{$word} = 1;
      $wordlist{$line} = "$line";
  }  
}

Later, I try to print $wordlist{$line} as a string, inside a loop that contains:

printf "%${length}s: %4d times, on lines %s\n", $key, $wordlist{$key}, $wordlist{$line};

When I run it, I get the error:

Use of uninitialized value in printf at ./wc.pl line 105, <FILE> line 20.
someWord:    2 time(s), line(s) 

where line 20 is the exit statement.

2 answers:

Answer 0 (score: 0)

$wordlist{$line}   # Line data for each line

should be

$wordline{$word}   # Line data for each word
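
As a minimal sketch of that fix, the following keeps the question's string-building approach but stores the line data in a second hash keyed by word, so the counts and the line lists never share keys. The helper name index_words and the hash names %count and %lines_for are made up for illustration and do not appear in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: index words in two separate hashes, both keyed by word.
sub index_words {
    my @words = @_;
    my (%count, %lines_for);
    my $line = 0;
    for my $word (@words) {
        $line++;
        $count{$word}++;
        # Append to the existing line list, or start a new one.
        $lines_for{$word} = defined $lines_for{$word}
            ? "$lines_for{$word}, $line"
            : "$line";
    }
    return (\%count, \%lines_for);
}

my ($count, $lines_for) = index_words(qw(cat house cat));
print "cat: $count->{cat} times, on lines $lines_for->{cat}\n";
```

Because both hashes are looked up with the word, the print loop only ever needs $key, never $line.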

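Formatting output before you are ready to output it is generally bad practice, and this is no exception. Store the line numbers in an array and join them only when printing.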

if (exists $wordlist{$word}) {
    ++$wordlist{$word};
    push @{ $wordline{$word} }, $line;
}
else {
    ++$wordlist{$word};
    push @{ $wordline{$word} }, $line;
}

which, since both branches are identical, of course simplifies to

++$wordlist{$word};
push @{ $wordline{$word} }, $line;

In the printf, you can use

join(', ', @{ $wordline{$word} })

But $wordlist{$word} is just the number of elements in @{ $wordline{$word} }, so it is not needed at all. Just use

0+@{ $wordline{$word} }

instead of

$wordlist{$word}
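
To illustrate, here is a small sketch of deriving the count from the array of line numbers. The helper count_for and the sample data are assumptions for demonstration only; the `//` operator requires Perl 5.10 or later and is used here so that a word that was never seen yields 0 instead of an error under strict.

```perl
use strict;
use warnings;

# Sample data (made up): word => array of line numbers.
my %wordline = ( cat => [1, 8], hat => [9] );

# Hypothetical helper: the number of occurrences is the array's length.
sub count_for {
    my ($index, $word) = @_;
    # An array in numeric context evaluates to its element count.
    return 0 + @{ $index->{$word} // [] };
}

print "cat: ", count_for(\%wordline, 'cat'), "\n";
print "dog: ", count_for(\%wordline, 'dog'), "\n";
```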

So you end up with

use strict;
use warnings;

use List::Util qw( max );

my %wordlines;
while (<>) {
   chomp;
   push @{ $wordlines{$_} }, $.;
}

my $max_len_p1 = 1 + max map length, keys %wordlines;
my $max_count_len = max map length(0+@$_), values %wordlines;
my $format = "%-${max_len_p1}s %${max_count_len}d times, on lines %s\n";

for my $word (
   sort { @{ $wordlines{$b} } <=> @{ $wordlines{$a} } || $a cmp $b }
      keys %wordlines
) {
   printf($format,
      "$word:",
      0+@{ $wordlines{$word} },
      join(', ', @{ $wordlines{$word} }),
   );
}

Input:

cat
house
stair
chari
stair
mouse
stool
cat
hat

Output:

cat:   2 times, on lines 1, 8
stair: 2 times, on lines 3, 5
chari: 1 times, on lines 4
hat:   1 times, on lines 9
house: 1 times, on lines 2
mouse: 1 times, on lines 6
stool: 1 times, on lines 7

Answer 1 (score: 0)

You can try the following example; it should give you a good base to start from and modify.

use strict;
use warnings;

my @words = <>;
my %wordlist;
my $line = 0;

foreach my $word (@words) {
        chomp($word);
        push (@{$wordlist{$word}}, ++$line);
}

foreach my $word (keys %wordlist){
        my $count = @{$wordlist{$word}};
        my $lines = join (', ',@{$wordlist{$word}});
        printf ("%-10s: %4d times, on lines %s\n", $word, $count, $lines);
}

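This example uses Perl's autovivification to create the data structure on the fly if it is not already defined. Essentially, for each word it reads, it pushes the line number onto the array stored under that word's key in the hash. If the word has never been seen before, autovivification creates the key in the hash and, in the same way, creates the array as that key's value.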
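
A tiny sketch of autovivification on its own, using made-up values: the key 'cat' does not exist before the first push, and afterwards it holds an array reference.

```perl
use strict;
use warnings;

my %wordlist;

# Record whether the key exists before anything is pushed.
my $before = exists $wordlist{cat} ? 1 : 0;

# Dereferencing the missing value as an array creates it automatically.
push @{ $wordlist{cat} }, 1;
push @{ $wordlist{cat} }, 8;

# ref() reports what kind of reference the value now holds.
my $kind = ref $wordlist{cat};

print "before=$before kind=$kind lines=@{ $wordlist{cat} }\n";
```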

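For the output, the word is simply the hash key, the number of occurrences is the number of line numbers in that key's array, and join turns the line numbers into a single string.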

We can then print these values with printf.

A word list of
cat
house
stair
chari
stair
mouse
stool
cat
hat

will produce the output
mouse     :    1 times, on lines 6
cat       :    2 times, on lines 1, 8
hat       :    1 times, on lines 9
stool     :    1 times, on lines 7
chari     :    1 times, on lines 4
stair     :    2 times, on lines 3, 5
house     :    1 times, on lines 2
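
Note that Perl does not guarantee any particular order for keys %hash, so the order of the lines above can differ between runs. If a stable order is wanted, one option is to sort the keys before printing. The sketch below assumes the same word => array-of-line-numbers structure; the helper name report_lines and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns formatted report lines, sorted by word.
sub report_lines {
    my ($wordlist) = @_;
    return map {
        my $lines = $wordlist->{$_};
        sprintf "%-10s: %4d times, on lines %s",
            $_, scalar @$lines, join(', ', @$lines);
    } sort keys %$wordlist;
}

# Sample data (made up for the sketch).
my %wordlist = ( stair => [3, 5], cat => [1, 8], hat => [9] );
print "$_\n" for report_lines(\%wordlist);
```

Sorting by count first, as the other answer does, would only require a different sort block.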