$str1 = "ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=TCP SPT=10159 DPT=4319 WINDOW=7300 RES=0x00 SYN URGP=0";
$str2 = "ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=ICMP WINDOW=7300 RES=0x00 URGP=0";
I need to capture:
for $str1 ==> ssh_2-4,accept,ETH2,eth33,192.168.200.30,192.168.200.224,TCP,10159,4319
for $str2 ==> ssh_2-4,accept,ETH2,eth33,192.168.200.30,192.168.200.224,ICMP
I am using the regexp below. It works well for $str1, but it does not match $str2:
(\w*)\^(\w*).*IN=(\S*).*OUT=(\S*).*SRC=(\S* ).*DST=(\S*).*PROTO=(\S*).*SPT=(\d*).*DPT=(\d*).*
What regular expression would work for both cases?
Answer 0 (score: 2)
Splitting seems more robust and cleaner to me. For example:
$str2=~ /^(.*?)\^(\w*)\s+(.*)$/;
my($version,$action,$args) = ($1,$2,$3);
my %argsmap = split(/[= ]/, $args);
print "proto=$argsmap{'PROTO'} \n";
Edit: I wrongly assumed that every "field" was a key=value pair. Corrected version:
my(@args) = split(/ /,$str2);
my($version,$action) = split(/\^/,shift @args);
my %argsmap = map { $_ =~ /(.*)=(.*)/ ? ($1,$2) : ($_,'') } @args;
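To tie the corrected version back to the output the question asks for, here is a minimal sketch. The helper names parse_line and to_record, the shortened sample line, and the choice to join with commas are assumptions made for illustration; they are not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one log line into a list of key/value pairs.
sub parse_line {
    my ($line) = @_;
    my @args = split / /, $line;
    # The first token has the form "version^action".
    my ($version, $action) = split /\^/, shift @args;
    # Bare flags such as DF or SYN contain no "=", so they get an empty value.
    my %argsmap = map { $_ =~ /(.*)=(.*)/ ? ($1, $2) : ($_, '') } @args;
    return (version => $version, action => $action, %argsmap);
}

# Hypothetical helper: build the comma-separated record from the question.
# Keys that are absent from the line (SPT and DPT for ICMP) are skipped.
sub to_record {
    my %f = parse_line(@_);
    my @keys = qw(version action IN OUT SRC DST PROTO SPT DPT);
    return join ',', map { $f{$_} } grep { exists $f{$_} } @keys;
}

# Shortened sample line (assumed for brevity; same layout as $str2).
print to_record('ssh_2-4^accept IN=ETH2 OUT=eth33 SRC=192.168.200.30 DST=192.168.200.224 DF PROTO=ICMP URGP=0'), "\n";
# prints: ssh_2-4,accept,ETH2,eth33,192.168.200.30,192.168.200.224,ICMP
```

Because the fields end up in a hash, the order of keys in the log line no longer matters, and a missing key simply drops out of the record.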
Answer 1 (score: 0)
$str1="ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=TCP SPT=10159 DPT=4319 WINDOW=7300 RES=0x00 SYN URGP=0";
$str2="ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=ICMP WINDOW=7300 RES=0x00 URGP=0";
foreach my $i ($str1, $str2) {
    if ($i =~ /^(.+)\^(\w+)\s+IN=(\S+)\s+OUT=(\S+).*?SRC=(\S+)\s+DST=(\S+).*?PROTO=(\S+)(?:.*?SPT=(\d+)\s+DPT=(\d+))?/) {
        print "/1=$1/2=$2/3=$3/4=$4/5=$5/6=$6/7=$7/8=$8/9=$9\n";
    }
}
This gives:
/1=ssh_2-4/2=accept/3=ETH2/4=eth33/5=192.168.200.30/6=192.168.200.224/7=TCP/8=10159/9=4319
/1=ssh_2-4/2=accept/3=ETH2/4=eth33/5=192.168.200.30/6=192.168.200.224/7=ICMP/8=/9=
The SPT and DPT parts are captured inside an optional non-capturing group: (?:.*?SPT=(\d+)\s+DPT=(\d+))?
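If the goal is the comma-separated form from the question rather than the numbered dump, the same pattern can be reused as sketched below. The helper name capture_csv and the shortened sample lines are assumptions for illustration. When the optional group does not take part in the match, its captures are undefined, so they are filtered out before joining.

```perl
use strict;
use warnings;

# Same pattern as in the answer, compiled once so it can be reused.
my $re = qr/^(.+)\^(\w+)\s+IN=(\S+)\s+OUT=(\S+).*?SRC=(\S+)\s+DST=(\S+).*?PROTO=(\S+)(?:.*?SPT=(\d+)\s+DPT=(\d+))?/;

# Hypothetical helper: return the captured fields as one comma-separated string,
# or nothing at all when the line does not match.
sub capture_csv {
    my ($line) = @_;
    my @fields = $line =~ $re or return;
    # Captures 8 and 9 are undef when the optional SPT/DPT group is skipped.
    return join ',', grep { defined } @fields;
}

# Shortened sample lines (assumed for brevity; same layout as $str1 and $str2).
print capture_csv('ssh_2-4^accept IN=ETH2 OUT=eth33 SRC=192.168.200.30 DST=192.168.200.224 PROTO=TCP SPT=10159 DPT=4319 SYN'), "\n";
print capture_csv('ssh_2-4^accept IN=ETH2 OUT=eth33 SRC=192.168.200.30 DST=192.168.200.224 PROTO=ICMP URGP=0'), "\n";
# prints:
# ssh_2-4,accept,ETH2,eth33,192.168.200.30,192.168.200.224,TCP,10159,4319
# ssh_2-4,accept,ETH2,eth33,192.168.200.30,192.168.200.224,ICMP
```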
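Answer 2 (score: 0)
A greedy quantifier means that every time the pattern reaches .* it first matches all the remaining characters on the line. Having consumed the input, it cannot find the next part of the expression and must backtrack until it can. That is highly inefficient.
Instead, you want the non-greedy form: .*?
Then, to make sure you match a whole word/key, you can use the word-boundary assertion \b, as follows: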
my $re
    = qr/
        ([\w-]*) \^ (\w*) .*?
        \bIN=(\S*) .*?
        \bOUT=(\S*) .*?
        \bSRC=(\S*) .*?
        \bDST=(\S*) .*?
        \bPROTO=(\S*)
        (?: .*?
            \bSPT=(\d*)
            .*?
            \bDPT=(\d*)
        )?
    /x;
Now, since not every line has the SPT and DPT fields, you want to make that part of the match optional: (?:...)?
That is all I needed to do:
while ( <$data> ) {
    my @flds = m/$re/;
    print join( ',', grep { defined and length } @flds ), "\n";
}
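The difference between the greedy and non-greedy forms, and the effect of \b, can be seen on a small example. The line below is made up for illustration (it is not real log output): the key IN appears twice, and PT= also occurs inside SPT=.

```perl
use strict;
use warnings;

# Made-up line: "IN=" appears twice and "PT=" hides inside "SPT=".
my $line = 'x^accept IN=ETH2 SPT=10159 PT=7 IN=ETH9';

# Greedy: .* first runs to the end of the line and then backtracks,
# so the LAST "IN=" is the one that gets captured.
my ($greedy) = $line =~ /.*IN=(\S*)/;

# Non-greedy: .*? extends one character at a time,
# so the FIRST "IN=" is the one that gets captured.
my ($lazy) = $line =~ /.*?IN=(\S*)/;

# Without \b, "PT=" is found inside "SPT="; with \b it must start a word.
my ($loose)   = $line =~ /PT=(\d*)/;
my ($bounded) = $line =~ /\bPT=(\d*)/;

print "greedy=$greedy lazy=$lazy loose=$loose bounded=$bounded\n";
# prints: greedy=ETH9 lazy=ETH2 loose=10159 bounded=7
```

So besides the backtracking cost, the two forms can capture different values whenever a key occurs more than once on a line.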
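Answer 3 (score: 0)
A more fleshed-out version of the split approach, based on leonbloy's answer. A straight split does not work because it yields an odd number of elements. So we split each field explicitly on = and allow undefined values, which keeps the hash key/value pairs intact.
Code: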
use strict;
use warnings;
my $str1="ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=TCP SPT=10159 DPT=4319 WINDOW=7300 RES=0x00 SYN URGP=0";
my $str2="ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=ICMP WINDOW=7300 RES=0x00 URGP=0";
my @data;
for my $str ($str1, $str2) {
    my %hash;
    # First we extract the "header"
    $str =~ s/^([^^]+)\^(\w+) // || die "Did not match header";
    $hash{'version'} = $1;
    $hash{'action'} = $2;
    # Now process the args
    for my $line (split ' ', $str) {
        my ($key, $val) = split /=/, $line;
        $hash{$key} = $val;
    }
    # Save the hash into an array
    push @data, \%hash;
}
for my $href (@data) {
    # Now output the selected elements from each hash
    my $out = join ", ",
        @$href{'version','action','IN','OUT','SRC','DST','PROTO'};
    if ($href->{'PROTO'} eq 'TCP') {
        $out = join ", ", $out, @$href{'SPT', 'DPT'};
    }
    print "$out\n";
}
Output:
ssh_2-4, accept, ETH2, eth33, 192.168.200.30, 192.168.200.224, TCP, 10159, 4319
ssh_2-4, accept, ETH2, eth33, 192.168.200.30, 192.168.200.224, ICMP