Question

Input

[security] [client 198.66.91.7] [domain testphp.example.com] [200] [/apache/20160503/20160503-0636/20160503-063628-Vyh-LH8AAAEAAE6zC@AAAAAD] (null)

Desired output

/apache/20160503/20160503-0636/20160503-063628-Vyh-LH8AAAEAAE6zC@AAAAAD

Here is what I have so far:

'.*?\[.*?\].*?\[.*?\].*?\[.*?\].*?\[.*?\].*?\[(.*?)\]'

My Perl code:

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

my $txt = '[modsecurity] [client 199.66.91.7] [domain testphp.vulnweb.com] [200] [/apache/20160503/20160503-0636/20160503-063628-Vyh-LH8AAAEAAE6zC@AAAAAD] (null)';

# The pattern must be delimited by a single pair of quotes.
my $re = '.*?\[.*?\].*?\[.*?\].*?\[.*?\].*?\[.*?\].*?\[(.*?)\]';

if ($txt =~ m/$re/is)
{
    my $sbraces1 = $1;    # contents of the fifth [...] field
    say $sbraces1;
}
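The five near-identical `.*?\[.*?\]` pieces in the pattern above can be collapsed into a single counted group. A minimal sketch, assuming the fields are separated only by whitespace and never contain a `]` themselves:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

my $txt = '[security] [client 198.66.91.7] [domain testphp.example.com] [200] [/apache/20160503/20160503-0636/20160503-063628-Vyh-LH8AAAEAAE6zC@AAAAAD] (null)';

# Skip exactly four "[...]" fields (each followed by optional whitespace),
# then capture the contents of the fifth one.
# [^\]]* means "any run of characters that are not ]".
my $re = qr/^(?:\[[^\]]*\]\s*){4}\[([^\]]*)\]/;

my $path;
if ($txt =~ $re) {
    $path = $1;
    say $path;
}
```

Changing `{4}` selects a different field, so the same pattern works for any position.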

Output

/apache/20160503/20160503-0636/20160503-063628-Vyh-LH8AAAEAAE6zC@AAAAAD

I think my regex is messy. Is there a cleaner way to do this?

Thanks

Answer 1

I would also use split... or a more conventional regex than the one you are using:

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;

my $data = '[security] [client 198.66.91.7] [domain testphp.example.com] [200] [/apache/20160503/20160503-0636/20160503-063628-Vyh-LH8AAAEAAE6zC@AAAAAD] (null)';

my @fields = $data =~ /(?:\[(.*?)\])+/g;

print Dumper(\@fields);

The output you get is:

$VAR1 = [
          'security',
          'client 198.66.91.7',
          'domain testphp.example.com',
          '200',
          '/apache/20160503/20160503-0636/20160503-063628-Vyh-LH8AAAEAAE6zC@AAAAAD'
        ];

So the fifth element of the returned array ($fields[4]) is the one you want.
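The answer mentions split but only shows the regex. A minimal split-based sketch, assuming no field ever contains a literal [ or ]. Splitting on the brackets leaves empty and whitespace-only pieces between the fields, which the grep step removes:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $data = '[security] [client 198.66.91.7] [domain testphp.example.com] [200] [/apache/20160503/20160503-0636/20160503-063628-Vyh-LH8AAAEAAE6zC@AAAAAD] (null)';

# Split on every [ or ], then drop the pieces that are empty or whitespace-only.
# This leaves the bracketed fields in order, followed by the trailing " (null)".
my @fields = grep { /\S/ } split /[\[\]]/, $data;

# Index 4 is the fifth field: the path.
my $path = $fields[4];
print "$path\n";
```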

Answer 2

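Use a negated character class. It typically performs better than a non-greedy quantifier, because [^\]]+ consumes each field in one run instead of testing for ] after every character.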

my $txt = '[security] [client 198.66.91.7] [domain testphp.example.com] [200] [/apache/20160503/20160503-0636/20160503-063628-Vyh-LH8AAAEAAE6zC@AAAAAD] (null)';

my @array = $txt =~ /\[([^\]]+)\]/g;

print "@array\n";

Here is a demo of the negated character class.

Here is a demo of the non-greedy quantifier.
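The performance claim is easy to check locally. A minimal sketch using Perl's core Benchmark module; the relative numbers depend on the perl version, the machine, and the input, so treat them as indicative only:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $txt = '[security] [client 198.66.91.7] [domain testphp.example.com] [200] [/apache/20160503/20160503-0636/20160503-063628-Vyh-LH8AAAEAAE6zC@AAAAAD] (null)';

# For this input both patterns extract exactly the same five fields.
my @negated    = $txt =~ /\[([^\]]+)\]/g;
my @non_greedy = $txt =~ /\[(.*?)\]/g;
print "same result: ", (join("\0", @negated) eq join("\0", @non_greedy) ? "yes" : "no"), "\n";

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    negated_class => sub { my @f = $txt =~ /\[([^\]]+)\]/g },
    non_greedy    => sub { my @f = $txt =~ /\[(.*?)\]/g },
});
```

The two patterns are not identical in general: on an empty field such as [], the non-greedy version captures an empty string, while [^\]]+ skips it.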

Answer 3

I created this regex demo:

\[\d{3}\]\s+\[(\S+)\]

My answer assumes that the URL you want to match always comes immediately after the HTTP status code.
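In Perl the pattern can be applied in list context to pull out the capture directly. A minimal sketch, assuming (as the answer does) that the path never contains whitespace:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $txt = '[security] [client 198.66.91.7] [domain testphp.example.com] [200] [/apache/20160503/20160503-0636/20160503-063628-Vyh-LH8AAAEAAE6zC@AAAAAD] (null)';

# Find a three-digit field such as [200], then capture the
# whitespace-free contents of the bracketed field right after it.
my ($path) = $txt =~ /\[\d{3}\]\s+\[(\S+)\]/;

print defined $path ? "$path\n" : "no match\n";
```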

Since it is an HTTP status code, we could also write (as in this SO post):

\[[1-5][0-9]{2}\]\s+\[(\S+)\]
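The stricter pattern only differs from \d{3} when a bracketed three-digit value is outside the 100-599 range. A minimal sketch comparing the two on hypothetical sample lines (the second line is made up to contain an invalid status):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample lines: the second has a three-digit field that is
# not a valid HTTP status code.
my @lines = (
    '[security] [client 198.66.91.7] [200] [/apache/ok] (null)',
    '[security] [client 198.66.91.7] [999] [/apache/bad] (null)',
);

my (@loose, @strict);
for my $line (@lines) {
    # \d{3} accepts any three digits.
    my ($l) = $line =~ /\[\d{3}\]\s+\[(\S+)\]/;
    # [1-5][0-9]{2} only accepts 100 to 599.
    my ($s) = $line =~ /\[[1-5][0-9]{2}\]\s+\[(\S+)\]/;
    push @loose,  $l // 'no match';
    push @strict, $s // 'no match';
    print "loose: $loose[-1]  strict: $strict[-1]\n";
}
```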

How to extract a Unix path in Perl
