Regular expression (Perl-like)

Time: 2011-10-31 12:09:09

Tags: regex string perl


$str1="ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=TCP SPT=10159 DPT=4319 WINDOW=7300 RES=0x00 SYN URGP=0";

$str2="ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=ICMP WINDOW=7300 RES=0x00 URGP=0";

I need to capture:

for $str1 ==> ssh_2-4,accept,ETH2,eth33,192.168.200.30,192.168.200.224,TCP,10159,4319

for $str2 ==> ssh_2-4,accept,ETH2,eth33,192.168.200.30,192.168.200.224,ICMP

I'm using the following regexp. It works perfectly for $str1, but it doesn't work for $str2:

(\w*)\^(\w*).*IN=(\S*).*OUT=(\S*).*SRC=(\S* ).*DST=(\S*).*PROTO=(\S*).*SPT=(\d*).*DPT=(\d*).*
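The following is a quick way to see why the pattern fails. It is a sketch run against the two sample lines. Because SPT= and DPT= are mandatory in that pattern, the whole match fails on the ICMP line, which has neither field. The same check also shows that (\w*) cannot match the - in ssh_2-4. As a result, the first capture on the TCP line is only 4.

```perl
use strict;
use warnings;

# The two sample lines from the question.
my $str1 = "ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=TCP SPT=10159 DPT=4319 WINDOW=7300 RES=0x00 SYN URGP=0";
my $str2 = "ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=ICMP WINDOW=7300 RES=0x00 URGP=0";

# The question's pattern, copied unchanged.
my $re = qr/(\w*)\^(\w*).*IN=(\S*).*OUT=(\S*).*SRC=(\S* ).*DST=(\S*).*PROTO=(\S*).*SPT=(\d*).*DPT=(\d*).*/;

# In list context a successful match returns all nine captures; a failed one returns ().
my @caps1 = $str1 =~ $re;
my @caps2 = $str2 =~ $re;

print scalar(@caps1), " captures for str1\n";   # 9 captures for str1
print scalar(@caps2), " captures for str2\n";   # 0 captures for str2: there is no "SPT=" to match
print "first field: $caps1[0]\n";               # first field: 4 (\w cannot match "-")
```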

What regular expression would work for this purpose?

4 answers:

Answer 0 (score: 2)

Splitting seems more robust and cleaner to me. For example:

$str2=~  /^(.*?)\^(\w*)\s+(.*)$/;
my($version,$action,$args) = ($1,$2,$3);
my %argsmap =  split(/[= ]/, $args);
print "proto=$argsmap{'PROTO'} \n";

Edit: I wrongly assumed that every "field" was a key=value pair. Corrected version:

  my(@args) = split(/ /,$str2);
  my($version,$action) = split(/\^/,shift @args);
  my %argsmap = map { $_ =~ /(.*)=(.*)/ ? ($1,$2) : ($_,'') } @args;
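The corrected snippet assumes $str2 already exists. Below is a minimal runnable wrapper that uses the ICMP sample line and prints only the fields the question asks for. The print statements and the check for DF are illustrative additions, not part of the answer.

```perl
use strict;
use warnings;

my $str2 = "ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=ICMP WINDOW=7300 RES=0x00 URGP=0";

# Split into space-separated tokens; the first token is "version^action".
my @args = split / /, $str2;
my ($version, $action) = split /\^/, shift @args;

# key=value tokens become (key, value); bare flags such as DF become (flag, '').
my %argsmap = map { /(.*)=(.*)/ ? ($1, $2) : ($_, '') } @args;

print join(',', $version, $action, @argsmap{qw(IN OUT SRC DST PROTO)}), "\n";
# ssh_2-4,accept,ETH2,eth33,192.168.200.30,192.168.200.224,ICMP
print exists $argsmap{DF} ? "DF present\n" : "DF absent\n";   # DF present
```

Because the ICMP line has no SPT or DPT key, a caller can test `exists $argsmap{SPT}` to decide whether to print the ports.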

Answer 1 (score: 0)

$str1="ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=TCP SPT=10159 DPT=4319 WINDOW=7300 RES=0x00 SYN URGP=0";
$str2="ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=ICMP WINDOW=7300 RES=0x00 URGP=0";

foreach my $i ($str1, $str2) {
    if ($i =~ /^(.+)\^(\w+)\s+IN=(\S+)\s+OUT=(\S+).*?SRC=(\S+)\s+DST=(\S+).*?PROTO=(\S+)(?:.*?SPT=(\d+)\s+DPT=(\d+))?/) {
        print "/1=$1/2=$2/3=$3/4=$4/5=$5/6=$6/7=$7/8=$8/9=$9\n";
    }
}

This gives:

/1=ssh_2-4/2=accept/3=ETH2/4=eth33/5=192.168.200.30/6=192.168.200.224/7=TCP/8=10159/9=4319
/1=ssh_2-4/2=accept/3=ETH2/4=eth33/5=192.168.200.30/6=192.168.200.224/7=ICMP/8=/9=

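The SPT and DPT parts are captured inside an optional non-capturing group, (?:.*?SPT=(\d+)\s+DPT=(\d+))?, so the ICMP line still matches and $8 and $9 are simply left undefined.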
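The effect of the trailing ? can be seen in isolation. This minimal sketch uses shortened excerpts of the two sample lines and the same \S+ and \d+ groups as this answer's pattern:

```perl
use strict;
use warnings;

my @out;
for my $s ("PROTO=TCP SPT=10159 DPT=4319", "PROTO=ICMP WINDOW=7300") {
    # In list context a match returns every group; groups inside a skipped (?:...)? come back undef.
    my @caps = $s =~ /PROTO=(\S+)(?:.*?SPT=(\d+)\s+DPT=(\d+))?/;
    push @out, join('|', map { defined $_ ? $_ : 'undef' } @caps);
}
print "$_\n" for @out;
# TCP|10159|4319
# ICMP|undef|undef
```

The ? is itself greedy, so the engine tries the group first and only skips it when no SPT=/DPT= pair follows.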

Answer 2 (score: 0)

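A greedy quantifier means that every time .* is applied, it first matches all the remaining characters in the line. Each time, it therefore has to consume the rest of the input, fail to find the next sub-expression, and then backtrack until it does. This is highly inefficient.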
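Besides the backtracking cost, greediness also decides which occurrence is matched when a key appears more than once. The following is a small sketch on a made-up string (not from the question's data):

```perl
use strict;
use warnings;

my $line = "A=1 B=2 A=3";   # hypothetical input with a repeated key

# Greedy: .* first swallows the whole line, then backtracks to the LAST "A=".
my ($greedy) = $line =~ /.*A=(\d)/;
# Non-greedy: .*? grows one character at a time, so it stops at the FIRST "A=".
my ($lazy) = $line =~ /.*?A=(\d)/;

print "greedy=$greedy lazy=$lazy\n";   # greedy=3 lazy=1
```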

Instead, you want the non-greedy form: .*?. Then, to make sure you get the whole word/key, you can use the word-boundary assertion \b, like this:

my $re 
    = qr/
        ([\w-]*) \^ (\w*) .*? 
        \bIN=(\S*)  .*?
        \bOUT=(\S*) .*?
        \bSRC=(\S*) .*?
        \bDST=(\S*) .*?
        \bPROTO=(\S*)
        (?: .*? 
            \bSPT=(\d*) 
            .*?
            \bDPT=(\d*)
        )?
    /x;
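The \b matters when one key name ends with another. The question's data has no such key, so the sketch below uses a hypothetical LOGIN= field purely to show the pitfall:

```perl
use strict;
use warnings;

# Hypothetical line: "LOGIN=" is not in the question's data; it is invented to show the pitfall.
my $line = "LOGIN=alice IN=ETH2";

my ($plain)   = $line =~ /IN=(\S*)/;     # leftmost "IN=" is the tail of "LOGIN="
my ($bounded) = $line =~ /\bIN=(\S*)/;   # \b needs a non-word character (or line start) before "I"

print "plain=$plain bounded=$bounded\n";  # plain=alice bounded=ETH2
```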

Now, since not every line has the SPT and DPT fields, you want to make that part of the match optional: (?:...)?

Then this is all that's needed:

while ( <$data> ) {
    my @flds = m/$re/;
    print join( ',', grep { defined and length } @flds ), "\n"; 
}
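The loop reads from a filehandle $data that the answer doesn't define. The self-contained sketch below makes one assumption: the two sample lines are read from an in-memory string, opened with open's scalar-reference form, rather than from a real log file.

```perl
use strict;
use warnings;

# Answer 2's pattern, unchanged; /x makes the whitespace and line breaks layout only.
my $re = qr/
    ([\w-]*) \^ (\w*) .*?
    \bIN=(\S*)  .*?
    \bOUT=(\S*) .*?
    \bSRC=(\S*) .*?
    \bDST=(\S*) .*?
    \bPROTO=(\S*)
    (?: .*?
        \bSPT=(\d*)
        .*?
        \bDPT=(\d*)
    )?
/x;

# Assumption: the log lines come from an in-memory string instead of a real file.
my $text = join "\n",
    "ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=TCP SPT=10159 DPT=4319 WINDOW=7300 RES=0x00 SYN URGP=0",
    "ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=ICMP WINDOW=7300 RES=0x00 URGP=0";
open my $data, '<', \$text or die "Cannot open in-memory handle: $!";

my @rows;
while ( <$data> ) {
    my @flds = m/$re/;
    # SPT and DPT come back undef on lines without them; grep drops those.
    push @rows, join( ',', grep { defined and length } @flds );
}
close $data;
print "$_\n" for @rows;
```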

Answer 3 (score: 0)

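A more elaborate split version, based on leonbloy's answer. A direct split into a hash doesn't work because it produces an odd number of elements: bare flags such as DF and SYN have no value. So we explicitly split each token on = and allow an undefined value, which keeps the hash key/value pairs aligned.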
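The odd-count problem is easy to reproduce on a short excerpt of $str2. The sketch below contrasts one flat split with the per-token split used in this answer:

```perl
use strict;
use warnings;

# Excerpt of $str2 around the bare flag DF.
my $args = "TTL=128 ID=30546 DF PROTO=ICMP";

# One split on "=" or space flattens everything, giving an odd-length list because DF has no value.
my @flat = split /[= ]/, $args;
print scalar(@flat), " elements\n";   # 7 elements

# Assigning that list to a hash shifts every later pair by one.
my %bad;
{
    no warnings 'misc';   # silences "Odd number of elements in hash assignment"
    %bad = @flat;
}
print "DF => $bad{DF}\n";   # DF => PROTO

# Splitting each token on "=" separately keeps pairs aligned; DF simply gets an undef value.
my %hash;
for my $token (split ' ', $args) {
    my ($key, $val) = split /=/, $token;
    $hash{$key} = $val;
}
print join(',', map { "$_=" . (defined $hash{$_} ? $hash{$_} : 'undef') } sort keys %hash), "\n";
# DF=undef,ID=30546,PROTO=ICMP,TTL=128
```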

Code:

use strict;
use warnings;

my $str1="ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=TCP SPT=10159 DPT=4319 WINDOW=7300 RES=0x00 SYN URGP=0";
my $str2="ssh_2-4^accept IN=ETH2 OUT=eth33 MAC=00:d0:c9:96:62:c0:00:1c:f0:98:19:57:08:00 SRC=192.168.200.30 DST=192.168.200.224 LEN=48 TOS=0x00 PREC=0x00 TTL=128 ID=30546 DF PROTO=ICMP WINDOW=7300 RES=0x00 URGP=0";

my @data;
for my $str ($str1, $str2) {
    my %hash;
    # First we extract the "header"
    $str =~ s/^([^^]+)\^(\w+) // || die "Did not match header";
    $hash{'version'} = $1;
    $hash{'action'} = $2;

    # Now process the args
    for my $line (split ' ', $str) {
        my ($key, $val) = split /=/, $line;
        $hash{$key} = $val;
    }
    # Save the hash into an array
    push @data, \%hash;
}

for my $href (@data) {
    # Now output the selected elements from each hash
    my $out = join ", ",
        @$href{'version','action','IN','OUT','SRC','DST','PROTO'};
    if ($href->{'PROTO'} eq 'TCP') {
        $out = join ", ", $out, @$href{'SPT', 'DPT'};
    }
    print "$out\n";
}

Output:

ssh_2-4, accept, ETH2, eth33, 192.168.200.30, 192.168.200.224, TCP, 10159, 4319
ssh_2-4, accept, ETH2, eth33, 192.168.200.30, 192.168.200.224, ICMP