I'm having some trouble with the following regular expression:
(?<=class="Source"><strong>IP Address</strong>:).*(?=</div>)
It seems to capture fine, but I can't get it to ignore the whitespace. I tried the following, but neither works:
(?<=class="Source"><strong>IP Address</strong>:)\S*(?=</div>)
(?<=class="Source"><strong>IP Address</strong>:)[^\s]*(?=</div>)
Also, I don't want to capture duplicates. I should mention that I'm doing a multi/global search in Perl (/g) on the following text:
<div style="margin-left: 12em;" class="Source"><strong>Label</strong>: Superman</div>
<div style="margin-left: 12em;" class="Source"><strong>IP Address</strong>: 999.24.135.50</div>
<div style="margin-left: 12em;" class="Source"><strong>IP Address</strong>: 333.24.333.50</div>
<div style="margin-left: 12em;" class="Source"><strong>IP Address</strong>: 333.24.333.50</div>
The desired result is:
333.24.333.50
999.24.135.50
If there is a simpler or better way to do this, please let me know.
Thanks in advance.
Answer 0 (score: 1)
I'm sure this could be shortened or simplified, but here is my attempt, using Mojo::DOM:
use strict;
use warnings;
use Mojo::DOM;
use List::MoreUtils qw(uniq);
my $dom = Mojo::DOM->new(do {local $/; <DATA>});
my @results;
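for my $div ($dom->find('div[class=Source]')
                 ->grep(sub { $_->all_text =~ /IP Address/ })
                 ->each) {
    push @results, (split /:\s*/, $div->text)[1];
}
# Use an explicit comparison block: sort(uniq(@results)) would be parsed
# as sort SUBNAME LIST, with uniq taken as the comparison routine.
my @ips = sort { $a cmp $b } uniq @results;
print "$_\n" for @ips;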
__DATA__
<html>
<head><title>foo</title></head>
<body>
<div style="margin-left: 12em;" class="Source"><strong>Label</strong>: Superman</div>
<div style="margin-left: 12em;" class="Source"><strong>IP Address</strong>: 999.24.135.50</div>
<div style="margin-left: 12em;" class="Source"><strong>IP Address</strong>: 333.24.333.50</div>
<div style="margin-left: 12em;" class="Source"><strong>IP Address</strong>: 333.24.333.50</div>
</body>
</html>
Output:
333.24.333.50
999.24.135.50
Answer 1 (score: -2)
Try this:
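use strict;
use warnings;
my %dup;
while (my $line = <DATA>) {
    # \s* sits outside the capture group, so leading whitespace is not captured
    my ($ip) = $line =~ m{(?<=class="Source"><strong>IP Address</strong>:)\s*(.+?)(?=</div>)};
    next unless defined $ip;    # skip lines without an IP Address entry
    $dup{$ip} = 1;              # hash keys are unique, so duplicates collapse
}
# Hash keys come back in no particular order, so sort before printing
print "$_\n" for sort keys %dup;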
__DATA__
<div style="margin-left: 12em;" class="Source"><strong>Label</strong>: Superman</div>
<div style="margin-left: 12em;" class="Source"><strong>IP Address</strong>: 999.24.135.50</div>
<div style="margin-left: 12em;" class="Source"><strong>IP Address</strong>: 333.24.333.50</div>
<div style="margin-left: 12em;" class="Source"><strong>IP Address</strong>: 333.24.333.50</div>
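As for why the \S* and [^\s]* attempts in the question fail: the lookbehind forces the match to start immediately after the colon, and the next character there is a space. \S* can only match zero characters at that point, and then the lookahead for </div> fails. The whitespace has to be consumed somewhere outside the part you want to keep. Below is a minimal core-Perl sketch of one way to do that, with the input held in a string; the variable names ($html, @all, %seen, @ips) are illustrative only and not taken from the question.

```perl
use strict;
use warnings;

# Sample input copied from the question.
my $html = <<'HTML';
<div style="margin-left: 12em;" class="Source"><strong>Label</strong>: Superman</div>
<div style="margin-left: 12em;" class="Source"><strong>IP Address</strong>: 999.24.135.50</div>
<div style="margin-left: 12em;" class="Source"><strong>IP Address</strong>: 333.24.333.50</div>
<div style="margin-left: 12em;" class="Source"><strong>IP Address</strong>: 333.24.333.50</div>
HTML

# The \s* after the colon consumes the whitespace; only (\S+?) is returned.
# In list context with /g, the match returns every captured address.
my @all = $html =~ m{class="Source"><strong>IP Address</strong>:\s*(\S+?)\s*</div>}g;

# Drop duplicates with a hash, then sort the survivors as plain strings.
my %seen;
my @ips = sort { $a cmp $b } grep { !$seen{$_}++ } @all;

print "$_\n" for @ips;
```

The sort here is a string comparison, which is enough for the sample data; sorting addresses numerically octet by octet would need a different comparison block.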