My data is free-form text, and I'm scanning it for IP addresses. I'm using the following regex:
( # first 3 ip (with optional [.] (.) \{.\} )
(?: (?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9][0-9]? ) \(* \[* \{* \. \)* \]* \}* ){3}
# last octet
(?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9][0-9]? )
)
This works well until the text contains something like this:
1.3.6.1.2.1.1.1.0: blah, blah, blah
Then I get the following matches:
1.3.6.1
2.1.1.1
What change do I need to make to the regex? It's a Perl RE, if that matters.
Sample data:
This is the IP I want 10.12.1.23, but when I did the snmp walk the 1.3.6.1.2.1.1.1.0 variable came back null.
Desired capture:
10.12.1.23
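To see where the unwanted matches come from, here is a minimal Perl sketch that runs the question's pattern over the OID line. It assumes the pattern is compiled with the /x flag, since its spacing only works that way; the variable names are illustrative.

```perl
use strict;
use warnings;

# The question's pattern; the spacing assumes the /x flag.
my $ip_re = qr/
    (?: (?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9][0-9]? ) \(* \[* \{* \. \)* \]* \}* ){3}
    (?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9][0-9]? )
/x;

my $line = '1.3.6.1.2.1.1.1.0: blah, blah, blah';

# Nothing checks what sits before or after a match, so /g keeps
# carving IP-shaped pieces out of the longer dotted OID.
my @found = $line =~ /($ip_re)/g;
print "$_\n" for @found;
```

Because the pattern has no anchors or lookarounds, a match may start or end right next to another dotted number, which is the gap the answers below close.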
Answer 0 (score: 1)
This example ensures that the matched string is neither preceded nor followed by a dot or a decimal digit.
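I didn't understand what the brackets were for, so I didn't add code for them.
OK, I get the brackets now. I added a separate regex for the separators and included it in the final regex. It seems to work fine.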
use strict;
use warnings;
my $s = <<END_TEXT;
This is the IP I want 10.12.1.23, but when I did the
snmp walk the 1.3.6.1.2.1.1.1.0 variable came back null.
END_TEXT
my $octet_re = qr/(?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9]?[0-9] )/x;
my $separator_re = qr/(?: \. | \Q(.)\E | \Q[.]\E | \Q{.}\E )/x;
my $ip_re = qr/(?: (?: $octet_re $separator_re ){3} $octet_re )/x;
print $1, "\n" while $s =~ /(?<! [0-9.] ) ($ip_re) (?! [0-9.] )/xg;
Output:
10.12.1.23
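The separator regex also accepts the bracketed forms from the question. The sketch below reuses this answer's three qr// objects on a made-up line mixing plain, bracketed and OID-style numbers (the sample text is an assumption for illustration). Matching in list context with /g collects every address at once instead of looping with while.

```perl
use strict;
use warnings;

# Same building blocks as the answer above.
my $octet_re     = qr/(?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9]?[0-9] )/x;
my $separator_re = qr/(?: \. | \Q(.)\E | \Q[.]\E | \Q{.}\E )/x;
my $ip_re        = qr/(?: (?: $octet_re $separator_re ){3} $octet_re )/x;

# Hypothetical sample text: a plain address, a bracketed one, and an OID.
my $text = 'plain 10.12.1.23, defanged 192[.]168(.)0{.}1, oid 1.3.6.1.2.1.1.1.0';

# List-context /g returns every captured address; the OID is rejected
# because each of its candidate matches touches a digit or a dot.
my @found = $text =~ /(?<! [0-9.] ) ($ip_re) (?! [0-9.] )/xg;
print "$_\n" for @found;
```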
Answer 1 (score: 1)
Regex: (?<!\.)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\.)
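It matches four groups of 1 to 3 digits separated by dots, provided the whole thing is neither preceded nor followed by a dot.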
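Edit: this is just to show the lookahead and lookbehind. If you really need to match only valid IPs, replace \d{1,3} with the octet pattern from your original regex.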
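Following that edit, here is a sketch that keeps this answer's dot-only lookarounds but swaps \d{1,3} for the octet alternation from the question. The variable names are illustrative, not from the answer.

```perl
use strict;
use warnings;

# Octet pattern taken from the question; it replaces \d{1,3}.
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/;

# This answer's lookarounds: no dot immediately before or after the address.
my $ip = qr/(?<!\.)$octet\.$octet\.$octet\.$octet(?!\.)/;

my $text = 'This is the IP I want 10.12.1.23, but when I did the '
         . 'snmp walk the 1.3.6.1.2.1.1.1.0 variable came back null.';

my @found = $text =~ /($ip)/g;
print "$_\n" for @found;

# Limitation: only a neighbouring dot is rejected, not a neighbouring digit.
my @edge = '1234.1.1.1' =~ /($ip)/g;
print "edge case: @edge\n";
```

Because the lookarounds only check for a neighbouring dot, a digit run such as 1234.1.1.1 still yields 234.1.1.1. The other answers also reject a neighbouring digit (through a lookaround or \b), which closes that gap.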
Answer 2 (score: 0)
Use negative lookahead and lookbehind assertions to avoid matching numbers that are surrounded by other periods:
use strict;
use warnings;

while (<DATA>) {
    while (
        m/(
            (?<!\.)\b
            (?:
                (?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\b\.?
            ){4}
            \b(?<!\.)(?!\.)
        )/xg
    )
    {
        print "$1\n";
    }
}

__DATA__
This is the IP I want 10.12.1.23, but when I did the
snmp walk the 1.3.6.1.2.1.1.1.0 variable came back null
Output:
10.12.1.23
Answer 3 (score: 0)
This is the IP address extraction regex I use regularly. It covers all the bases for matching valid IP addresses:
/(?<![\d\.])((2[0-4][0-9]\.|25[0-5]\.|[01]\d\d\.|\d\d\.|\d\.){3}(2[0-4][0-9]|25[0-5]|[01]\d\d|\d\d|\d))(?![\d\.])/
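It matches four octets separated by periods. Each octet must fall in the range 0-255, but may also be prefixed with one or two zeros (01, 001, 011).
It will not match if the address is preceded or followed by a digit or a period.
Here is a script you can use to test it: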
#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
    print checkIP($_) . "\n";
}

# Bullet-proof IP Address extraction:
# This will look for IP addresses, each octet matching a range of 0-255, with
# contingencies for preceding zeroes (01, 001 ..These should be considered valid).
# Additionally, it ensures that it is not preceded or followed by a period
# or a digit (making the element 4 digits).
# The caveat is that this will only match the first IP address on a given line.
# This can be remedied by adding /g and capturing the matches in an array.
sub checkIP {
    my $l = shift;
    # $1 is the whole address (outermost capture group).
    return $1 if $l =~ /(?<![\d\.])((2[0-4][0-9]\.|25[0-5]\.|[01]\d\d\.|\d\d\.|\d\.){3}(2[0-4][0-9]|25[0-5]|[01]\d\d|\d\d|\d))(?![\d\.])/;
    chomp($l);
    return "INVALID: " . $l;
}
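The script's comments suggest adding /g and collecting the matches in an array to get every address on a line. A minimal sketch of that idea follows; check_all_ips is a hypothetical name, not part of the original script. The inner groups are made non-capturing so that list-context /g returns only whole addresses rather than every group.

```perl
use strict;
use warnings;

# Hypothetical helper: same octet rules as checkIP, but with /g in list
# context so every address on the line is returned, not just the first.
sub check_all_ips {
    my ($line) = @_;
    my @ips = $line =~ /(?<![\d.])((?:2[0-4][0-9]\.|25[0-5]\.|[01]\d\d\.|\d\d\.|\d\.){3}(?:2[0-4][0-9]|25[0-5]|[01]\d\d|\d\d|\d))(?![\d.])/g;
    return @ips;
}

# Made-up input: two valid addresses, an OID, and an out-of-range octet.
my @ips = check_all_ips('a 10.0.0.1 b 192.168.001.254 c 1.3.6.1.2.1.1.1.0 d 256.1.1.1');
print "$_\n" for @ips;
```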
Here is a breakdown of the regex, in case anyone is interested:
# DO NOT MATCH ON supposed IP addresses preceded by a digit or a period.
/(?<![\d\.])
# FIRST THREE OCTETS MATCH:
# MATCH ON:
# 3-digit numbers beginning with 2, but only up to 249 preceding a period
# OR 3-digit numbers beginning with 25, but only 250-255 preceding a period
# OR 3-digit numbers beginning with 0 or 1 preceding a period
# OR 2-digit numbers preceding a period
# OR 1 digit numbers preceding a period
# MATCH EXACTLY 3 TIMES.
((2[0-4][0-9]\.|25[0-5]\.|[01]\d\d\.|\d\d\.|\d\.){3}
# FINAL OCTET MATCH:
# 3-digit numbers beginning with 2, but only up to 249
# OR 3-digit numbers beginning with 25, but only 250-255
# OR 3-digit numbers beginning with 0 or 1
# OR 2-digit numbers
# OR 1 digit numbers
(2[0-4][0-9]|25[0-5]|[01]\d\d|\d\d|\d))
# DO NOT MATCH if the next character is a digit or a period:
(?![\d\.])/x
Answer 4 (score: -1)
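If you want to exclude numbers that contain too many "octets" (##.##.##.##.##), you can use lookarounds to exclude those parts. You can also use word boundaries such as \b to stop 99127.0.0.1 from being captured: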
(
(?<!\d\.) # Ensure octet has no preceding '##.'
\b # Word boundary to exclude ##127.
# first 3 ip (with optional [.] (.) \{.\} )
(?:
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
\(*\[*\{*\.\)*\]*\}*
){3}
# last octet
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
\b # Word boundary to exclude .255##
(?!\.\d) # Ensure octet has no trailing '.##'
)
You can see an example here.
Throw some different text at it and check whether it gives you the results you expect.
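To try the same pattern in Perl instead of on the linked page, it can be compiled with qr//x so that its whitespace and # comments are ignored. The sample line below is made up for illustration; it mixes a plain address, a bracketed one, an OID, and the 99127.0.0.1 case.

```perl
use strict;
use warnings;

# This answer's pattern, compiled with /x so whitespace and comments are ignored.
my $ip_re = qr/
    (?<!\d\.)                  # no preceding '##.'
    \b                         # word boundary to exclude ##127.
    (?:
        (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
        \(*\[*\{*\.\)*\]*\}*   # dot, optionally wrapped in brackets
    ){3}
    (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
    \b                         # word boundary to exclude .255##
    (?!\.\d)                   # no trailing '.##'
/x;

my $text = 'want 10.12.1.23, host 192[.]168[.]0[.]1, '
         . 'oid 1.3.6.1.2.1.1.1.0, and 99127.0.0.1 end';

my @found = $text =~ /($ip_re)/g;
print "$_\n" for @found;
```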