Perl: Counting regex matches

Asked: 2015-06-16 14:44:13

Tags: regex perl sorting

I have run into a problem in a Perl script. The output the script generates includes the following:

...
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2
...

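The second half of my script has to read all of these lines and build a table showing how many successful logins each user had. My solution looks like this (header, including strict and warnings, removed):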

my %SuccessLogins;
my @LoginAttemptsSuccess;
while (my $array = <$fh>) {
    if ($array =~ /Accepted\s+password\s+for\s+(\S+)/) {
      my $counter = () = $array =~ /Accepted\s+password\s+for\s+(\S+)/gi;
      %SuccessLogins = (
        "User"  => $1,
        "Successful"    => $counter
      );
      push (@LoginAttemptsSuccess, \%SuccessLogins);
    }
}

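The problem is that the AoH (array of hashes) the script creates consists of a single element, in which I only get one line. The result should be a table in which every user is listed with the corresponding number of successful logins: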

User = testuser1
Successful = 6

Username = testuser2
Successful = 2

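I have read a lot of regex examples on SO, but I still haven't grasped the logic of counting matches with a regex and storing the results.

3 Answers:

Answer 0: (score: 4)

I would do something like this: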

use feature 'say';
use Data::Dumper;

my %SuccessLogins;
while (my $array = <DATA>) {
    # Count one successful login for the captured user name.
    if ($array =~ /Accepted\s+password\s+for\s+(\S+)/) {
      $SuccessLogins{$1}++;
    }
}
say Dumper\%SuccessLogins;


__DATA__
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2

Output:

$VAR1 = {
  'testuser4' => 1,
  'testuser2' => 1,
  'testuser1' => 6
};
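
If the goal is the table from the question rather than a Dumper dump, the counts can be printed straight from the hash. This is only a minimal sketch: the literal counts are copied from the output above, and the helper name format_table is made up for illustration.

```perl
use strict;
use warnings;

# Counts copied from the Dumper output above.
my %SuccessLogins = ( testuser1 => 6, testuser2 => 1, testuser4 => 1 );

# format_table is a hypothetical helper: it returns one "User/Successful"
# block per user, sorted by user name so the order is stable.
sub format_table {
    my ($counts) = @_;
    return join "\n",
        map { "User = $_\nSuccessful = $counts->{$_}\n" } sort keys %$counts;
}

print format_table( \%SuccessLogins );
```

Sorting the keys matters only for readability; a Perl hash has no inherent order, so without the sort the users may come out in a different order on each run.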

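Answer 1: (score: 0)

The "trick" with regexes is that a capturing regex matched with /g in list context returns a list of everything it captured, which you can store in an array.

You can then evaluate that array in scalar context to find out how many "hits" there were.

So: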

my $string = "fish fish fish fish fish";

my @array = $string =~ m/(fish)/g;

print "@array\n";

print scalar @array;

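That's all it does. It also works on multi-line input.

The reason this doesn't work in your script is that your while loop processes one line at a time. The pattern can only match once per line, so your count will only ever be one. Also, your counter counts any match of the pattern, whoever the user is, so it is not counting logins per user as you expect.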
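
To make the "count will only ever be one" point concrete, here is a small sketch that applies the counting idiom from the question to a single log line. The line is taken from the sample data, with the leading line-number prefix dropped.

```perl
use strict;
use warnings;

# One line from the log in the question.
my $line = 'Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2';

# "= () =" forces list context on the match; the outer scalar assignment
# then yields the number of elements that list contained.
my $counter = () = $line =~ /Accepted\s+password\s+for\s+(\S+)/gi;

print "matches on this line: $counter\n";   # prints 1
```

Because each line holds exactly one "Accepted password for" phrase, the idiom cannot report more than 1 while the input is read line by line.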

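There are two ways around this:

  • Keep working one line at a time and adapt the code accordingly.
  • Treat the contents of the filehandle as one "chunk".

(The latter is a bad idea for really large files.) Here is the first approach: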

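use Data::Dumper;

my %count_of;
while ( <DATA> ) {
   # Skip any line that is not a successful login, so that an
   # undefined $login is never counted.
   my ( $login ) = m/Accepted password for (\w+)/ or next;
   print "$login\n";
   $count_of{$login}++;
}

print Dumper \%count_of;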


__DATA__
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
79:Jun  9 19:57:26 localhost sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2

And the second:

# With $/ undefined, <DATA> returns the whole input as one string.
local $/;
my @logins = <DATA> =~ m/Accepted password for (\w+)/g;
print "@logins\n";

print scalar(@logins), "\n";

__DATA__
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
10:Jun  9 16:31:33 localhost sshd[3176]: Accepted password for testuser1 from 192.168.0.105 port 56136 ssh2
16:Jun  9 16:32:06 localhost sshd[3244]: Accepted password for testuser1 from 192.168.0.105 port 56137 ssh2
24:Jun  9 16:35:26 localhost sshd[3355]: Accepted password for testuser1 from 192.168.0.105 port 56138 ssh2
67:Jun  9 19:46:07 localhost sshd[4982]: Accepted password for testuser1 from 192.168.0.105 port 58182 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2

You would then reduce @logins to per-user counts, as in the first example.
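
The reduction step could look roughly like the sketch below. It is not part of the original answer: the heredoc stands in for the slurped file contents and holds three lines from the sample log.

```perl
use strict;
use warnings;

# Stand-in for the slurped file contents (three lines from the sample log).
my $log = <<'END_LOG';
2:Jun  9 16:17:14 localhost sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2
73:Jun  9 19:47:02 localhost sshd[5047]: Accepted password for testuser4 from 192.168.0.105 port 58183 ssh2
86:Jun  9 19:58:34 localhost sshd[5231]: Accepted password for testuser1 from 192.168.0.105 port 58187 ssh2
END_LOG

# List context plus /g returns every captured user name, in order.
my @logins = $log =~ m/Accepted password for (\w+)/g;

# Collapse the list into per-user counts.
my %count_of;
$count_of{$_}++ for @logins;

print "$_: $count_of{$_}\n" for sort keys %count_of;
```

Here scalar @logins gives the total number of successful logins (3), while %count_of holds the per-user breakdown the question asks for.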

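Either way, you can "count" the elements of an array by evaluating it in scalar context, which is why this is useful.

When the pattern matches, you can also pull the captured values out of $1, $2 and so on. Again, this can be used to extract specific users from the list, but I prefer the more direct assignment.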
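
For comparison, a sketch of the $1 style. The key point is to read $1 only after checking that the match succeeded. The "Failed password" line is made up for illustration and does not appear in the original data.

```perl
use strict;
use warnings;

my @lines = (
    'sshd[3042]: Accepted password for testuser1 from 192.168.0.105 port 56067 ssh2',
    'sshd[3050]: Failed password for testuser2 from 192.168.0.105 port 56070 ssh2',    # made-up line
    'sshd[5160]: Accepted password for testuser2 from 192.168.0.105 port 58186 ssh2',
);

my %count_of;
for my $line (@lines) {
    # Only read $1 when the match succeeded: a failed match does not
    # reset it, so it may still hold a stale value from an earlier match.
    if ( $line =~ /Accepted password for (\w+)/ ) {
        $count_of{$1}++;
    }
}

print "$_: $count_of{$_}\n" for sort keys %count_of;
```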

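Answer 2: (score: 0)

Your script assumes that the regex will extract multiple values for the "testuser" string at once. It won't.

The assignment to %SuccessLogins overwrites the contents of the hash on every pass through the while loop, which I believe is not what you want.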
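
A small sketch of how a single hash declared outside the loop behaves when references to it are pushed onto an array, next to the anonymous-hash alternative. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my %shared;
my ( @same_ref, @fresh );

for my $user (qw(testuser1 testuser4)) {
    # Reassigning %shared replaces its contents in place, so every
    # pushed reference points at the same hash.
    %shared = ( User => $user );
    push @same_ref, \%shared;

    # An anonymous hash is a new hash each time round the loop.
    push @fresh, { User => $user };
}

print join( ',', map { $_->{User} } @same_ref ), "\n";   # testuser4,testuser4
print join( ',', map { $_->{User} } @fresh ),    "\n";   # testuser1,testuser4
```

Even with anonymous hashes the result would be one record per log line rather than one per user, which is why a plain hash keyed by user name is the simpler structure for counting.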

I put your test data in a file called td1 and then used this:

perl -ne '@r=/Accepted password for (\w+)/gi; for $item (@r) {$total{$item}++;  } END{  use Data::Dumper; print Dumper(\%total);}' < td1

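Then I realised that, since my test case reads one line at a time, I might as well do this (counting only when the line actually matches, so that $1 is never read after a failed match):

perl -ne '$total{$1}++ if /Accepted password for (\w+)/i;  END{  use Data::Dumper; print Dumper(\%total);}' < td1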