Extract a string from each line of a text file and count how often that string occurs in the file

Time: 2015-08-17 03:16:37

Tags: string perl

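I have a log file with sample output like the lines below. Using Perl, I want to extract the value of the string that starts with k= on each line, and then count how often that string occurs in the entire log file.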

ThreadDistributor Dispatch k='/678605358297;type=F', i=2
ThreadDistributor Dispatch k='/678605358297;type=W', i=0
ThreadDistributor Dispatch k='/678605358297;type=W', i=1

Expected result:

k='/678605358297;type=F' occurs 1 times
k='/678605358297;type=W' occurs 2 times

Here is what I have tried so far:

use Test::More qw(no_plan);

use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys=1;
my @key;
my @keystrings;

open (INFILE, "1.txt") or die "ERROR:cannot open test result file $!";
foreach my $line (<INFILE>) {

    @key = split(' ',$line);
    push @keystrings, $key[2]

}

print "$key[2]\n";


my %counts;
$counts{$_}++ for @keystrings;
print Dumper(\%counts);

close INFILE;
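The attempt above is close. split ' ' breaks each line on runs of whitespace, so $key[2] is indeed the k= field, but it still carries the trailing comma (k='/678605358297;type=F',), so the hash keys do not match the expected output. A minimal sketch of the same split-based idea with the comma stripped, reading one line at a time through a three-argument open and a lexical filehandle instead of the bareword INFILE. The sub name count_keys is hypothetical, and the sketch assumes k= is always the third whitespace-separated field:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: count the third whitespace-separated field
# (the k='...' part) of each line read from $fh, minus its trailing comma.
sub count_keys {
    my ($fh) = @_;
    my %counts;
    while (my $line = <$fh>) {              # read one line at a time
        my @fields = split ' ', $line;      # ' ' splits on runs of whitespace
        next unless defined $fields[2] && $fields[2] =~ /^k=/;
        (my $key = $fields[2]) =~ s/,\z//;  # "k='...'," becomes "k='...'"
        $counts{$key}++;
    }
    return \%counts;
}

# Only runs when a file name (e.g. the question's 1.txt) is given.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in, '<', $file
        or die "ERROR: cannot open test result file $file: $!";
    my $counts = count_keys($in);
    close $in;
    print "$_ occurs $counts->{$_} times\n" for sort keys %$counts;
}
```

Run it as perl count_keys.pl 1.txt (the script name is also hypothetical). Because it relies on field position, it silently breaks if the log format gains or loses a word before k=; the regex in the answer is stricter about that.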

1 answer:

Answer 0 (score: 2)

Use a regular expression to capture the string you care about, and a hash to count its occurrences. Save the following as count.pl:

#!/usr/bin/env perl
use strict;
use warnings;

my $leader = 'ThreadDistributor Dispatch';
my %dispatch_types;
while (<>) {
    chomp;
    # \Q...\E treats $leader as literal text, not as a regex
    next unless m|^\Q$leader\E|;    # Ignore anything else
    # Capture k='...' up to the first closing quote followed by a comma
    my ($type) = m|^\Q$leader\E (k='.+?'),|;
    defined $type
        or die "Invalid row: '$_'";
    $dispatch_types{$type}++;       # Count this key
}

# Print each distinct key and how many times it was seen
for my $type ( sort keys %dispatch_types ) {
    print "$type occurs $dispatch_types{$type} times\n";
}

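Then run it as follows (the <> operator reads the files named on the command line, or standard input if none are given):

perl count.pl my_log_file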

k='/678605358297;type=F' occurs 1 times
k='/678605358297;type=W' occurs 2 times
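If lines with other prefixes can also carry a k='...' field, one option is to match the quoted value directly instead of anchoring on the ThreadDistributor Dispatch leader. A minimal sketch of that variant, assuming the quoted value never contains a single quote; unlike count.pl it skips lines without a k= field rather than dying on them. The sub name count_k_values is hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: count every k='...' field on the given lines,
# whatever precedes it. Assumes the value contains no single quote.
sub count_k_values {
    my %counts;
    for my $line (@_) {
        # In list context, m//g returns every capture on the line,
        # so a line with several k='...' fields counts each one.
        $counts{$_}++ for $line =~ /\b(k='[^']*')/g;
    }
    return \%counts;
}

# Read all lines from the files named on the command line.
if (@ARGV) {
    my $counts = count_k_values(<>);
    printf "%s occurs %d times\n", $_, $counts->{$_} for sort keys %$counts;
}
```

Run it as perl count_k.pl my_log_file (hypothetical script name). To list the most frequent keys first, sort with { $counts->{$b} <=> $counts->{$a} } instead of the default alphabetical order.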