Question

My data is free-form text. I am scanning it for IP addresses, using the following regular expression:

(   # first 3 ip (with optional [.] (.) \{.\} )
    (?: (?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9][0-9]? ) \(* \[* \{* \. \)* \]* \}* ){3}
    # last octet
    (?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9][0-9]? )
)

This works well until the text contains something like the following:

1.3.6.1.2.1.1.1.0: blah, blah, blah

Then I get the following matches:

1.
3.6.1.2
1.1.1.0

What modification do I need to make to the regex? This is Perl RE, if that matters.

Sample data:

This is the IP I want 10.12.1.23, but when I did the snmp walk the 1.3.6.1.2.1.1.1.0 variable came back null.

Expected capture:

10.12.1.23

Answer 1

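This example ensures that the matched string is neither preceded nor followed by a dot (.) or a decimal digit.

~~I don't understand the meaning of the brackets, so I did not add code for them.~~

OK, I get the brackets now. I added another regex for the separator and included it in the final regex. It seems to work fine.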

use strict;
use warnings;

my $s = <<END_TEXT;
This is the IP I want 10.12.1.23, but when I did the
snmp walk the 1.3.6.1.2.1.1.1.0 variable came back null.
END_TEXT

my $octet_re     = qr/(?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9]?[0-9] )/x;
my $separator_re = qr/(?: \. | \Q(.)\E | \Q[.]\E | \Q{.}\E )/x;
my $ip_re        = qr/(?: (?: $octet_re  $separator_re ){3} $octet_re )/x;

print $1, "\n" while $s =~ /(?<! [0-9.] ) ($ip_re) (?! [0-9.] )/xg;

Output

10.12.1.23

Answer 2

Demo

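Regex: (?<!\.)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\.)

It captures four groups of one to three digits that are neither preceded nor followed by a dot.

Edit: this is only meant to show the lookahead and lookbehind. Replace \d{1,3} with the octet pattern from the original regex if you really need to match only valid IPs.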
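
The substitution suggested in the edit could look roughly like the sketch below. It assumes the octet alternation from the question's regex and keeps this answer's lookarounds unchanged; the variable names ($octet, $ip, @found) are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: the range-checked octet alternation from the question.
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/;

# This answer's lookarounds, wrapped around four range-checked octets.
my $ip = qr/(?<!\.)$octet(?:\.$octet){3}(?!\.)/;

my $text = 'This is the IP I want 10.12.1.23, but when I did the snmp walk '
         . 'the 1.3.6.1.2.1.1.1.0 variable came back null.';

# With a single capture group, /g in list context returns every match.
my @found = $text =~ /($ip)/g;
print "$_\n" for @found;
```

On the sample data from the question this should print only 10.12.1.23. Note that these lookarounds reject neighbouring dots only, not neighbouring digits.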

Answer 3

Use negative lookahead and lookbehind assertions so that the match is not surrounded by other periods:

while (<DATA>) {
    while (
        m/( 
            (?<!\.)\b
            (?:
                (?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\b\.?
            ){4}
            \b(?<!\.)(?!\.)
        )/xg
        )
    {
        print "$1\n";
    }
}

__DATA__
This is the IP I want 10.12.1.23, but when I did the
snmp walk the 1.3.6.1.2.1.1.1.0 variable came back null

Output:

10.12.1.23

Answer 4

This is the IP address extraction regex I use most often. It covers all the bases for matching a valid IP address:

/(?<![\d\.])((2[0-4][0-9]\.|25[0-5]\.|[01]\d\d\.|\d\d\.|\d\.){3}(2[0-4][0-9]|25[0-5]|[01]\d\d|\d\d|\d))(?![\d\.])/

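It matches four distinct octets separated by periods. Each octet must be a number in the range 0-255, but may be prefixed with one or two zeros (01, 001, 011).

It will not match if the address is preceded or followed by a digit or a period.

Here is a script you can use to test it: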

#!/usr/bin/perl
while (<>) {
    print &checkIP ($_) . "\n";
}

# Bullet-proof IP Address extraction:
# This will look for IP addresses, each octet matching a range of 0-255, with
# contingencies for preceding zeroes (01, 001 ..These should be considered valid).
# Additionally, it ensures that it is not preceded or followed by a period
# or a digit (making the element 4 digits).
# The caveat is that this will only match the first IP address on a given line.
# This can be remedied by adding /g and capturing the matches in an array.
sub checkIP {
    my $l  = shift;
    my $out;
    return $1 if ($l =~ /(?<![\d\.])((2[0-4][0-9]\.|25[0-5]\.|[01]\d\d\.|\d\d\.|\d\.){3}(2[0-4][0-9]|25[0-5]|[01]\d\d|\d\d|\d))(?![\d\.])/);
    chomp ($l);
    return "INVALID: " . $l;
}

Here is a breakdown of the regex, in case anyone is interested:

# DO NOT MATCH ON supposed IP addresses preceded by a digit or a period.
/(?<![\d\.])
# FIRST THREE OCTETS MATCH:
# MATCH ON:
#   3-digit numbers beginning with 2, but only up to 249 preceding a period
#   OR 3-digit numbers beginning with 25, but only 250-255 preceding a period
#   OR 3-digit numbers beginning with 0 or 1 preceding a period
#   OR 2-digit numbers preceding a period
#   OR 1 digit numbers preceding a period
#   MATCH EXACTLY 3 TIMES.
((2[0-4][0-9]\.|25[0-5]\.|[01]\d\d\.|\d\d\.|\d\.){3}
# FINAL OCTET MATCH:
#   3-digit numbers beginning with 2, but only up to 249
#   OR 3-digit numbers beginning with 25, but only 250-255
#   OR 3-digit numbers beginning with 0 or 1
#   OR 2-digit numbers
#   OR 1 digit numbers
(2[0-4][0-9]|25[0-5]|[01]\d\d|\d\d|\d))
# DO NOT MATCH if the next character is a digit or a period:
(?![\d\.])/
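
The caveat mentioned in the script's comments (only the first address on a line is returned) can be addressed with /g, roughly as in the sketch below. The helper name extract_ips and the sample line are hypothetical; the pattern is the one from this answer.

```perl
use strict;
use warnings;

# Same pattern as in this answer, compiled once.
my $ip_re = qr/(?<![\d\.])((2[0-4][0-9]\.|25[0-5]\.|[01]\d\d\.|\d\d\.|\d\.){3}(2[0-4][0-9]|25[0-5]|[01]\d\d|\d\d|\d))(?![\d\.])/;

# Hypothetical helper: returns every IP address found in a line.
sub extract_ips {
    my ($line) = @_;
    my @ips;
    # In scalar context, /g resumes after the previous match on each iteration.
    # $1 is the outermost group, i.e. the whole address.
    while ($line =~ /$ip_re/g) {
        push @ips, $1;
    }
    return @ips;
}

my @ips = extract_ips('hosts 10.12.1.23 and 192.168.001.7, oid 1.3.6.1.2.1.1.1.0');
print join(', ', @ips), "\n";
```

A while loop is used instead of a list-context match because the pattern contains three capture groups, and a list-context /g match would return all three for every address.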

Answer 5

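If you want to exclude numbers that contain too many "octets", such as ##.##.##.##.##, you can use lookarounds to rule those out. You can also use word boundaries such as \b to prevent the 127.0.0.1 in 99127.0.0.1 from being captured: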

(
  (?<!\d\.) # Ensure octet has no preceding '##.'
  \b        # Word boundary to exclude ##127.

  # first 3 ip (with optional [.] (.) \{.\} )
  (?:
    (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
    \(*\[*\{*\.\)*\]*\}*
  ){3}

  # last octet
  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

  \b        # Word boundary to exclude .255##
  (?!\.\d)  # Ensure octet has no trailing '.##'
)

You can see an example here.

Throw some different text in there and check whether it gives you the results you expect.
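
For a quick local check, a small harness like the sketch below can run the pattern against a few inputs. The sample strings and the names $ip_re, @samples and %result are assumptions made for illustration; the pattern itself is copied from this answer and compiled with /x.

```perl
use strict;
use warnings;

# The pattern from this answer; the x flag permits the layout and comments.
my $ip_re = qr/
  (
    (?<!\d\.)   # no preceding '##.'
    \b
    (?:
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
      \(*\[*\{*\.\)*\]*\}*
    ){3}
    (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
    \b
    (?!\.\d)    # no trailing '.##'
  )
/x;

# Hypothetical test inputs.
my @samples = (
    'plain 10.12.1.23 here',
    'defanged 10[.]12[.]1[.]23 here',
    'oid 1.3.6.1.2.1.1.1.0 here',
    'glued 99127.0.0.1 here',
);

my %result;
for my $text (@samples) {
    # Without the g flag, a list-context match returns the captured group.
    my ($hit) = $text =~ $ip_re;
    $result{$text} = $hit;
    printf "%-32s => %s\n", $text, defined $hit ? $hit : '(no match)';
}
```

The first two inputs should yield the address (with its bracketed dots in the second case), while the OID and the glued number should yield no match.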

Regex to capture an IP address but not a MIB
