Question

My log file contains entries like the following.

[2016-04-17 10:12:27:682011 GMT] tcp 115.239.248.245:1751 -> 192.168.0.17:8080 52976f9f34d5c286ecf70cac6fba4506 04159c6111bca4f83d7d606a617acc5d6a58328d3a631adf3795f66a5d6265f4d1ec99977a5ae8cb2f3133c9503e5086a5f2ac92be196bb0c9a9f653f9669495 (312 bytes)

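I want to write a script that splits this line into pieces, so that I can write some of those pieces to a .csv file for machine learning. So far, my script only searches for a hard-coded pattern and, if it finds it, writes that pattern out. That is not what I want. Here is my current script.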

#!/usr/bin/perl -w

$path1 = "/home/tsec/testwatch/attackerresult.log";
$attacker = ">>/home/tsec/testwatch/attacker.csv";
#$path2 =
#$path3 =
#$path4 =

#function definition #Pattern for attackerlog only
sub extractor(){
open(LOG, $path1) or die "Cant't open '$path1': $!";
open(FILE, $attacker) or die "Can't open '$attacker': $!";

$target = "tcp";

while(<LOG>){

        if(/$target/){
        print FILE $target . "\n";

        }
}
}
close(LOG);
close(FILE);

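I would like the output in the CSV file to look like this:

I can add the CSV header manually.

(Header) Protocol, Source IP Address, Source Port, File Size

(String result from the script) tcp,127.0.0.1,8080,312

The above is just an example.

Any ideas?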

Answer 1

If every line always has the same number of fields, you can use the following.

use warnings;
use strict;

open my $wfh, '>', 'out.csv' or die $!;

my $cols = "Protocol, Source IP Address, Source Port, File Size\n";
print $wfh $cols;

while (<DATA>){
    if (/
          (?:.*?\s){3}  # get rid of the time
          (.*?)         # capture the proto ($1)
          \s+           # skip the next whitespace    
          (.*?):(\d+)   # separate IP and port, capture both ($2, $3)
          .*?\(         # skip everything until an opening parens
          (\d+)         # capture bytes ($4)
        /x
       ){
        print $wfh "$1, $2, $3, $4\n";
    }
}


__DATA__
2016-04-17 10:12:27:682011 GMT tcp 115.239.248.245:1751 -> 192.168.0.17:8080 52976f9f34d5c286ecf70cac6fba4506 04159c6111bca4f83d7d606a617acc5d6a58328d3a631adf3795f66a5d6265f4d1ec99977a5ae8cb2f3133c9503e5086a5f2ac92be196bb0c9a9f653f9669495 (312 bytes)

Output file:

Protocol, Source IP Address, Source Port, File Size
tcp, 115.239.248.245, 1751, 312
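
The sample data in the answer omits the square brackets around the timestamp that appear in the original log entry. As a variation, here is a minimal sketch that anchors on the bracketed timestamp and works on arbitrary filehandles rather than `__DATA__`. The helper names `parse_entry` and `log_to_csv` are hypothetical (they are not part of the answer), and the regex assumes every entry follows the layout of the line shown in the question. The demo uses in-memory filehandles so that it runs without any files on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: in list context, returns
# (protocol, source IP, source port, size in bytes) for one log line,
# or an empty list if the line does not have the assumed layout.
sub parse_entry {
    my ($line) = @_;
    return $line =~ m{
        ^\[ [^\]]* \] \s+        # bracketed timestamp at the start of the line
        (\S+) \s+                # protocol, e.g. tcp
        ([\d.]+) : (\d+) \s+     # source IP and source port
        -> \s+ \S+ \s+           # arrow and destination (ignored)
        .* \( (\d+) \s+ bytes\)  # size inside the trailing parentheses
    }x;
}

# Hypothetical helper: copies every matching entry from the input handle
# to the output handle as one comma-separated row; other lines are skipped.
sub log_to_csv {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        my @fields = parse_entry($line) or next;
        print {$out} join(',', @fields), "\n";
    }
}

# Demo with the entry from the question, using in-memory filehandles.
my $log = '[2016-04-17 10:12:27:682011 GMT] tcp 115.239.248.245:1751 -> 192.168.0.17:8080 52976f9f34d5c286ecf70cac6fba4506 04159c6111bca4f83d7d606a617acc5d6a58328d3a631adf3795f66a5d6265f4d1ec99977a5ae8cb2f3133c9503e5086a5f2ac92be196bb0c9a9f653f9669495 (312 bytes)' . "\n";

open my $in, '<', \$log or die "Can't read in-memory log: $!";
my $csv = '';
open my $out, '>', \$csv or die "Can't write in-memory CSV: $!";

log_to_csv($in, $out);
close $out;
close $in;

print $csv;    # prints: tcp,115.239.248.245,1751,312
```

To use this with the files from the question, open `/home/tsec/testwatch/attackerresult.log` for reading and `/home/tsec/testwatch/attacker.csv` with mode `'>>'`, then pass both handles to `log_to_csv`.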

Split a log-entry string into variables and write selected parts to a text file (Perl)
