Question

I have this log file:

 New connection: 141.8.83.213:64400 (172.17.0.6:2222) [session: e696835c]
    2016-04-29 21:13:59+0000 [SSHService ssh-userauth on HoneyPotTransport,3,141.8.83.213] login attempt [user1/test123] failed
    2016-04-29 21:14:10+0000 [SSHService ssh-userauth on HoneyPotTransport,3,141.8.83.213] login attempt [user1/test1234] failed
    2016-04-29 21:14:13+0000 [SSHService ssh-userauth on HoneyPotTransport,3,141.8.83.213] login attempt [user1/test123] failed

I want to output a result like this:

Port,Status,Occurrences
64400,failed,2
64400,failed,1

The "Occurrences" variable represents the number of times a given login [username and password] combination has been recorded in the file. User1 test123 can be seen to be recorded twice from the same IP. How can I do this? At the moment I have two while loops, with a call to a subroutine inside the first while loop, like so:

Subroutine

sub counter(){

        $result = 0;
        #open(FILE2, $cowrie) or die "Can't open '$cowrie': $!";
        while(my $otherlines = <LOG2>){

                if($otherlines =~ /login attempt/){
                        ($user, $password) = (split /[\s:\[\]\/]+/, $otherlines)[-3,-2];
                        if($_[1] =~ /$user/ && $_[2] =~ /$password/){
                                $result++;
                        }#if ip matches i think i have to do this with split

                        #print "TEST\n";
                }
        #print "Combo $_[0] and $_[1]\n";

        }
        #print "$result";
        return $result;
}

Main method

sub cowrieExtractor(){
        open(FILE2, $cowrie) or die "Can't open '$cowrie': $!";
        open(LOG2, $path2) or die "Can't open '$path2': $!";
        $seperator = chr(42);
        #To output user and password of login attempt, set $ip variable to the contents of array at that x position of new
        #connection to match the ip of the login attempt
        print FILE2 "SourcePort"."$seperator". "Status"."$seperator"."Occurences"."$seperator"."Malicious"."\n";
        $ip = "";
        $port = "";
        $usr = "";
        $pass = "";
        $status = "";
        $frequency = 0;
        #Given this is a user/pass attempt honeypot logger, I will use a wide character to reduce the possibility of stopping
        #the WEKA CSV loader from functioning by using smileyface as seperators.
        while(my $lines = <LOG2>){
                if($lines =~ /New connection/){
                        ($ip, $port) = (split /[\[\]\s:()]+/, $lines)[7,8];
                }
                if($lines =~ /login attempt/){#and the ip of the new connection
                        if($lines =~ /$ip/){
                                ($usr, $pass, $status) = (split /[\s:\[\]\/]+/, $lines)[-3,-2,-1];
                                $frequency = counter($ip, $usr, $pass);
                                #print $frequency;
                                if($ip && $port && $usr && $pass && $status ne ""){
                                        print FILE2 join "$seperator",($port, $status, $frequency, $end);
                                        print FILE2 "\n";
                                }
                        }
                }
        }
}

Right now I get 0 under Occurrences in the output, which, when I tested it, seems to come from the value I initialize the variable $result to in the subroutine, i.e. 0. That means the if statement in the subroutine is not working properly. Any help?

Answer 1

Here is a basic approach that produces the expected output. Questions about the context (the purpose) remain.

use warnings;
use strict;

my $file = 'logfile.txt';
open my $fh_in, '<', $file or die "Can't open '$file': $!";

# Assemble results for required output in data structure:
# %rept = ( $port => { $usr => { $status => $freq } } );

my %rept;
my ($ip, $port);

while (my $line = <$fh_in>) 
{
    if ($line =~ /New connection/) {
        ($ip, $port) = $line =~ /New connection:\s+([^:]+):(\d+)/;
        next;
    }   

    my ($usr, $status) =  $line =~ m/login\ attempt \s+ \[ ( [^\]]+ ) \] \s+ (\w+)/x;
    if ($usr and $status) {
        $rept{$port}{$usr}{$status}++;
    }   
    else { warn "Line with an unexpected format:\n$line" }
}

# use Data::Dumper;
# print Dumper \%rept;

print "Port,Status,Occurences\n";
foreach my $port (sort keys %rept) {
    foreach my $usr (sort keys %{$rept{$port}}) {
        foreach my $stat ( sort keys %{$rept{$port}{$usr}} ) { 
            print "$port,$stat,$rept{$port}{$usr}{$stat}\n"; 
        }   
    }   

}

With your input copied into the file logfile.txt, this prints

Port,Status,Occurences
64400,failed,2
64400,failed,1

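I use the whole user1/test123 (etc.) string to identify a user. This can be changed in the regex as needed. Note that this does not let you query or organize the data in very different ways; it mainly extracts what is needed for the desired output. Let me know if explanations would help.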
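As an illustration of changing the regex, here is a minimal sketch that captures the user name and the password as separate fields instead of one user1/test123 string. It assumes that user names never contain a '/', which the sample log suggests but does not guarantee; the variable names are only for this example.

```perl
use strict;
use warnings;

# One login-attempt line taken from the sample log in the question.
my $line = '2016-04-29 21:13:59+0000 [SSHService ssh-userauth on HoneyPotTransport,3,141.8.83.213] login attempt [user1/test123] failed';

# Capture user, password and status separately.
# Assumption: the user name contains no '/'; the password may.
my ($user, $pass, $status) =
    $line =~ m{login\ attempt \s+ \[ ([^/\]]+) / ([^\]]*) \] \s+ (\w+)}x;

print "$user $pass $status\n";
```

With the fields split like this, the counting hash could be keyed on user and password individually, for example $rept{$port}{$user}{$pass}{$status}++.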

An introductory explanation of the nested hash used above

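First, I strongly recommend reading some of the available material. A good start is certainly the standard tutorial on Perl references (perlreftut), along with the cookbook of Perl data structures (perldsc).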

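The hash used to collect the data has port numbers for keys, and the value for each key is a hash reference (or, more precisely, an anonymous hash). Each of those hashes has users for keys, and their values are again hash references. The keys of these innermost hashes are the possible values of the status, so there can be two keys (failed and succeeded). Their values are the frequencies. This kind of 'nesting' is a complex data structure.

There is one more important thing. The first time the statement $rept{$port}{$usr}{$status}++ is executed, the whole hierarchy is created, so the key $port does not need to exist beforehand. Importantly, this autovivification of the intermediate levels also happens when a nested value is merely queried, unless those levels already exist.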
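The autovivification behaviour described above can be seen in a small, self-contained sketch. The ports 2222 and 9999 and the root/root pair are made up for this illustration and do not come from the log.

```perl
use strict;
use warnings;

my %rept;

# Incrementing a deeply nested element creates every level on the way down.
$rept{64400}{'user1/test123'}{failed}++;

# Merely reading a nested value creates the intermediate levels as well:
my $count   = $rept{2222}{'root/root'}{failed};    # undef, but ...
my $created = exists($rept{2222}) ? 1 : 0;         # ... port 2222 now exists

# Checking level by level stops at the first missing key and creates nothing.
my $found = (exists $rept{9999} and exists $rept{9999}{'root/root'}) ? 1 : 0;
my $untouched = exists($rept{9999}) ? 0 : 1;

print "created=$created found=$found untouched=$untouched\n";
```

So when a lookup should not modify the hash, testing each level with exists before going deeper is one way to keep it unchanged.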

After the first login-attempt line has been processed, the hash is

%rept = { '64400' => { 'user1/test123' => { 'failed' => 1 } } }

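On the second login-attempt line the same port is seen, but with a new user, so new data is added to the second-level anonymous hash. A key is created for the new user, and its value is a (new) anonymous hash, status => count. The whole hash is now: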

%rept = { 
    '64400' => { 
        'user1/test123'  => { 'failed' => 1 },
        'user1/test1234' => { 'failed' => 1 },
    } 
}

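On the next iteration the same port is seen, together with a user that already exists, and the status (failed) already exists for that user too. So the count for that status is incremented.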

The whole structure can easily be inspected with, for example, the Data::Dumper package. The commented-out lines in the code above produce

$VAR1 = {
    '64400' => {
        'user1/test123' => {
                                'failed' => 2
                           },
        'user1/test1234' => {
                                'failed' => 1
                            }
                }
        };
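Hash keys come back in no particular order, so a dump like this may list the users differently from run to run. A small sketch, assuming the structure has already been filled as in the sample, that uses Data::Dumper's Sortkeys and Indent settings to get a stable and more compact dump:

```perl
use strict;
use warnings;
use Data::Dumper;

# The structure as it stands after processing the sample log.
my %rept = (
    64400 => {
        'user1/test123'  => { failed => 2 },
        'user1/test1234' => { failed => 1 },
    },
);

$Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order
$Data::Dumper::Indent   = 1;   # fixed, shallow indentation

my $dump = Dumper(\%rept);
print $dump;
```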

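As we keep processing lines, new keys (ports, users, statuses) are added as needed, with the full hierarchy down to the count (1 the first time); if an existing entry is encountered, its count is incremented. The resulting data structure can then be traversed and used, for example as shown in the code. See the abundant documentation for details.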
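The same kind of traversal can answer other simple questions. As a hedged sketch, the following sums all attempts per port regardless of user and status; the %total hash is a hypothetical summary introduced only for this example, and %rept is written out by hand to match the sample data.

```perl
use strict;
use warnings;

# The structure as it stands after processing the sample log.
my %rept = (
    64400 => {
        'user1/test123'  => { failed => 2 },
        'user1/test1234' => { failed => 1 },
    },
);

# Hypothetical summary: port => number of attempts of any status.
my %total;
for my $port (keys %rept) {
    for my $usr (keys %{ $rept{$port} }) {
        $total{$port} += $_ for values %{ $rept{$port}{$usr} };
    }
}

print "$_,$total{$_}\n" for sort keys %total;
```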

Count the number of variable combinations in a log file using Perl
