Capture all the words in a line and count their occurrences using a Perl regex

Date: 2017-10-05 14:57:41

Tags: regex perl

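I want to know how many words a paragraph contains, and then find how many times each word occurs. I can do this already, but is there another way to do it using a regex?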

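my $string = "John is a good boy. John goes to school with his brother Johnny. When John is hungry, he eats his tiffin.";
my @list = ();
# Collect every word; with /g in scalar context, each pass resumes after the previous match.
while ($string =~ /(\b\w+\b)/gi)
{
    push(@list, $1);
}

# Count how many times each word occurs.
my %counts;
for (@list) {
    $counts{$_}++;
}
# Note: $#list is the last index of @list (word count minus one), not the number of words.
print "$#list \n";
foreach my $keys (keys %counts) {
    print "$keys = $counts{$keys}\n";
}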

The output should be:

20
brother = 1
a = 1
goes = 1
is = 2
good = 1
to = 1
tiffin = 1
When = 1
boy = 1
his = 2
school = 1
Johnny = 1
he = 1
eats = 1
John = 3
with = 1
hungry = 1

1 answer:

Answer 0 (score: 2)

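I can't see a way to do this purely with a regex, and if such a way does exist, it would be a very complicated regex that is hard to maintain. However, you can simplify what you have by using the hash directly and dropping the list: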

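use strict;
use warnings;

my $string = "John is a good boy. John goes to school with his brother Johnny. When John is hungry, he eats his tiffin.";
my %counts;
my $word_count = 0;
# Each pass of the loop matches the next word; tally it and bump the running total.
while ($string =~ /\b(\w+)\b/g)
{
    $counts{$1}++;
    $word_count++;
}

# Total number of words, followed by one line per distinct word (hash order is not fixed).
print "$word_count\n";
foreach my $keys (keys %counts)
{
    print "$keys = $counts{$keys}\n";
}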

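Note: I have tweaked the regex slightly, because you don't need the "\b" inside the capture group, and case-insensitive matching isn't necessary since you are not matching a specific string. I also added "use strict;" and "use warnings;", which you should always put at the top of any Perl script so that problems are reported.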
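
As a possible further simplification (not part of the original answer), the explicit while loop can be dropped: in list context, a match with /g returns all matches at once. The sketch below assumes the same sample string; the names @words and $total, and the sort order used for printing, are illustrative choices rather than anything from the question.

```perl
use strict;
use warnings;

my $string = "John is a good boy. John goes to school with his brother Johnny. When John is hungry, he eats his tiffin.";

# In list context, a /g match with no capture groups returns every whole match.
my @words = $string =~ /\b\w+\b/g;

# Tally each word; the hash entry is created on its first increment.
my %counts;
$counts{$_}++ for @words;

# scalar(@words) is the number of elements, whereas $#words would be the last index.
my $total = scalar @words;
print "$total\n";

# Hash key order is not fixed, so sort for repeatable output:
# highest count first, ties broken by string comparison.
for my $word (sort { $counts{$b} <=> $counts{$a} || $a cmp $b } keys %counts) {
    print "$word = $counts{$word}\n";
}
```

For the sample string this reports a total of 21 words, which matches the sum of the per-word counts. Sorting the keys is optional; it only makes the output order stable between runs.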