Regex to capture IP addresses but not MIBs

Time: 2014-08-26 20:27:01

Tags: regex perl

My data is free-form text, and I am scanning it for IP addresses. I am using the following regular expression:

(   # first 3 ip (with optional [.] (.) \{.\} )
    (?: (?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9][0-9]? ) \(* \[* \{* \. \)* \]* \}* ){3}
    # last octet
    (?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9][0-9]? )
)

This works well until the text contains something like the following:

1.3.6.1.2.1.1.1.0: blah, blah, blah

Then I get the following matches:

1.
3.6.1.2
1.1.1.0

What modification do I need to make to the regex? This is a Perl RE, if it matters.

Sample data:

This is the IP I want 10.12.1.23, but when I did the snmp walk the 1.3.6.1.2.1.1.1.0 variable came back null.

Expected capture:

10.12.1.23

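5 Answers:

Answer 0 (score: 1)

This example ensures that the matched string is neither preceded nor followed by a dot (.) or a decimal digit.

I didn't understand what the brackets were for, so I haven't added code for them.

OK, I get the brackets now. I've added another regex for the separator and included it in the final regex. It seems to work fine.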

use strict;
use warnings;

my $s = <<END_TEXT;
This is the IP I want 10.12.1.23, but when I did the
snmp walk the 1.3.6.1.2.1.1.1.0 variable came back null.
END_TEXT

my $octet_re     = qr/(?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9]?[0-9] )/x;
my $separator_re = qr/(?: \. | \Q(.)\E | \Q[.]\E | \Q{.}\E )/x;
my $ip_re        = qr/(?: (?: $octet_re  $separator_re ){3} $octet_re )/x;

print $1, "\n" while $s =~ /(?<! [0-9.] ) ($ip_re) (?! [0-9.] )/xg;

Output:

10.12.1.23
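
If the bracketed separators are there because the addresses appear "defanged" in the text (an assumption about the data, not something the question states), the captured string may need normalizing afterwards. The sketch below reuses the three patterns from this answer; extract_ips is a hypothetical helper name and the sample line is made up for illustration.

```perl
use strict;
use warnings;

# Same building blocks as in the answer above.
my $octet_re     = qr/(?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9]?[0-9] )/x;
my $separator_re = qr/(?: \. | \Q(.)\E | \Q[.]\E | \Q{.}\E )/x;
my $ip_re        = qr/(?: (?: $octet_re $separator_re ){3} $octet_re )/x;

# Hypothetical helper: return every address found in $text, with any
# bracketed separator rewritten as a plain dot.
sub extract_ips {
    my ($text) = @_;
    my @ips;
    while ($text =~ /(?<! [0-9.] ) ($ip_re) (?! [0-9.] )/xg) {
        (my $ip = $1) =~ tr/()[]{}//d;    # delete brackets, keep digits and dots
        push @ips, $ip;
    }
    return @ips;
}

# Made-up sample: one defanged address, one plain address, one OID.
my $sample = 'Seen 10[.]12(.)1{.}23 and 192.168.0.1; OID 1.3.6.1.2.1.1.1.0 is ignored.';
print "$_\n" for extract_ips($sample);
```

The tr///d step only deletes bracket characters, so an address written with plain dots passes through unchanged.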

Answer 1 (score: 1)

Demo

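Regex: (?<!\.)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\.)

It captures 4 groups of digits (1 to 3 digits each) that are neither preceded by a dot nor followed by one.

Edit: this is meant to show the lookahead and lookbehind. Replace \d{1,3} with the original regex to match specific IPs, if you really need to.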
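
One thing worth checking with this pattern: because the lookarounds only exclude dots, the last \d{1,3} can backtrack to a shorter run of digits when a multi-digit component is followed by a dot. The sketch below compares the pattern as posted with a variant that also excludes digits on both sides; the variant, the helper name matches_of, and the sample strings are assumptions for illustration, not part of the answer.

```perl
use strict;
use warnings;

# The pattern as posted: only dots are excluded on either side.
my $loose_re  = qr/(?<!\.)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\.)/;

# Assumed variant: digits are excluded as well, so a match cannot
# start or stop in the middle of a number.
my $strict_re = qr/(?<![\d.])\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?![\d.])/;

# Hypothetical helper: all matches of $re in $text, as a list.
sub matches_of {
    my ($re, $text) = @_;
    my @found = $text =~ /($re)/g;
    return @found;
}

my $oid = 'snmp walk of 1.3.6.12.1 returned null';    # made-up five-part OID
my $ip  = 'the host is 10.12.1.23, as expected';

for my $text ($oid, $ip) {
    printf "%-40s loose=[%s] strict=[%s]\n", $text,
        join(',', matches_of($loose_re,  $text)),
        join(',', matches_of($strict_re, $text));
}
```

On the made-up OID the posted pattern settles for 1.3.6.1 (it gives back the trailing 2 of 12), while the variant reports nothing; both agree on the real address.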

Answer 2 (score: 0)

Use negative lookahead and lookbehind assertions to avoid matching anything that is surrounded by other periods:

while (<DATA>) {
    while (
        m/( 
            (?<!\.)\b
            (?:
                (?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\b\.?
            ){4}
            \b(?<!\.)(?!\.)
        )/xg
        )
    {
        print "$1\n";
    }
}

__DATA__
This is the IP I want 10.12.1.23, but when I did the
snmp walk the 1.3.6.1.2.1.1.1.0 variable came back null

Output:

10.12.1.23

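Answer 3 (score: 0)

This is the regex I use all the time for IP address extraction. It covers all the bases for matching a valid IP address:

/(?<![\d\.])((2[0-4][0-9]\.|25[0-5]\.|[01]\d\d\.|\d\d\.|\d\.){3}(2[0-4][0-9]|25[0-5]|[01]\d\d|\d\d|\d))(?![\d\.])/

It matches four distinct octets separated by periods. Each octet must be a number in the range 0-255, but it may also be prefixed with one or two zeros (01, 001, 011).

It will not match if the address is preceded or followed by a digit or a period.

Here is a script you can use to test it: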

#!/usr/bin/perl
while (<>) {
    print &checkIP ($_) . "\n";
}

# Bullet-proof IP Address extraction:
# This will look for IP addresses, each octet matching a range of 0-255, with
# contingencies for preceding zeroes (01, 001 ..These should be considered valid).
# Additionally, it ensures that it is not preceded or followed by a period
# or a digit (making the element 4 digits).
# The caveat is that this will only match the first IP address on a given line.
# This can be remedied by adding /g and capturing the matches in an array.
sub checkIP {
    my $l  = shift;
    my $out;
    return $1 if ($l =~ /(?<![\d\.])((2[0-4][0-9]\.|25[0-5]\.|[01]\d\d\.|\d\d\.|\d\.){3}(2[0-4][0-9]|25[0-5]|[01]\d\d|\d\d|\d))(?![\d\.])/);
    chomp ($l);
    return "INVALID: " . $l;
}
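
The comment in the script mentions that /g plus an array would pick up every address on a line. A minimal sketch of that idea follows. It is an adaptation, not the answer's code: the inner groups are made non-capturing, because a list-context /g match returns every capture group of every match, and the name find_ips and the sample line are made up.

```perl
use strict;
use warnings;

# One octet, 0-255, leading zeros allowed (same alternatives as above).
my $octet = qr/(?:2[0-4][0-9]|25[0-5]|[01]\d\d|\d\d|\d)/;

# Only the whole address is captured; the inner groups are (?:...) so
# that a list-context /g match yields one item per address.
my $ip_re = qr/(?<![\d.])((?:$octet\.){3}$octet)(?![\d.])/;

# Hypothetical helper: every address on the line, or an empty list.
sub find_ips {
    my ($line) = @_;
    my @ips = $line =~ /$ip_re/g;
    return @ips;
}

my $line = 'hosts 10.12.1.23 and 192.168.001.7, oid 1.3.6.1.2.1.1.1.0';
print "$_\n" for find_ips($line);
```

With the capturing groups from the original pattern left in place, the returned list would also contain the inner octet captures, which is why they are rewritten here.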

Here is a breakdown of the regex, in case anyone is interested:

# DO NOT MATCH ON supposed IP addresses preceded by a digit or a period.
/(?<![\d\.])
# FIRST THREE OCTETS MATCH:
# MATCH ON:
#   3-digit numbers beginning with 2, but only up to 249 preceding a period
#   OR 3-digit numbers beginning with 25, but only 250-255 preceding a period
#   OR 3-digit numbers beginning with 0 or 1 preceding a period
#   OR 2-digit numbers preceding a period
#   OR 1 digit numbers preceding a period
#   MATCH EXACTLY 3 TIMES.
((2[0-4][0-9]\.|25[0-5]\.|[01]\d\d\.|\d\d\.|\d\.){3}
# FINAL OCTET MATCH:
#   3-digit numbers beginning with 2, but only up to 249
#   OR 3-digit numbers beginning with 25, but only 250-255
#   OR 3-digit numbers beginning with 0 or 1
#   OR 2-digit numbers
#   OR 1 digit numbers
(2[0-4][0-9]|25[0-5]|[01]\d\d|\d\d|\d))
# DO NOT MATCH if the next character is a digit or a period:
(?![\d\.])/

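Answer 4 (score: -1)

If you want to exclude numbers that contain too many "octets" (##.##.##.##.##), you can use lookarounds to exclude those parts. You can also use word boundaries such as \b to prevent 99127.0.0.1 from being captured: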

(
  (?<!\d\.) # Ensure octet has no preceding '##.'
  \b        # Word boundary to exclude ##127.

  # first 3 ip (with optional [.] (.) \{.\} )
  (?:
    (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
    \(*\[*\{*\.\)*\]*\}*
  ){3}

  # last octet
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

  \b        # Word boundary to exclude .255##
  (?!\.\d)  # Ensure octet has no trailing '.##'
)

You can see an example here.

Throw some different text in there and check whether it gives you the results you expect.
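
For checking locally rather than in an online tester, a small harness along these lines may help. It is only a sketch: the pattern is the one from this answer compiled with /x, while the helper name first_ip and the test strings are made up for illustration.

```perl
use strict;
use warnings;

my $octet = qr/(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/;

# The pattern from this answer, compiled with /x so it can span lines.
my $ip_re = qr/
    (?<!\d\.) \b
    (?: $octet \(*\[*\{*\.\)*\]*\}* ){3}
    $octet
    \b (?!\.\d)
/x;

# Hypothetical helper: first address in $text, or undef when none is found.
sub first_ip {
    my ($text) = @_;
    return $text =~ /($ip_re)/ ? $1 : undef;
}

# Made-up inputs paired with the capture expected for each (undef = none).
my @cases = (
    [ 'want 10.12.1.23, thanks'          => '10.12.1.23' ],
    [ 'walk 1.3.6.1.2.1.1.1.0 null'      => undef ],
    [ 'id 99127.0.0.1 here'              => undef ],
    [ 'seen 10[.]12(.)1{.}23 today'      => '10[.]12(.)1{.}23' ],
);

for my $case (@cases) {
    my ($text, $want) = @$case;
    my $got = first_ip($text);
    my $ok  = ($got // '') eq ($want // '') ? 'ok' : 'FAIL';
    printf "%-4s %-32s => %s\n", $ok, $text, $got // 'no match';
}
```

Adding further lines to @cases is the quickest way to probe edge cases such as addresses at the start or end of the text.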