Question

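I'm looking for a way to grab URLs from a web page and output them to a text file. For example, if a page contains multiple URLs of the form http://example.com/article, I want to grab all of them and write them to a text file.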

Answer 1

Take a look at WWW::Mechanize.

Example code:

use strict;
use warnings;
use 5.010;    # enables say

use WWW::Mechanize;

# Fetch the page, then print the absolute URL of every link found on it.
my $mech = WWW::Mechanize->new();
$mech->get('http://example.com/example');
foreach my $link ($mech->find_all_links()) {
    say $link->url_abs();
}

Answer 2

Use HTML::SimpleLinkExtor:

use strict;
use warnings;

use HTML::SimpleLinkExtor;

# Fetch and parse the page, then collect its absolute links into @links.
my $extor = HTML::SimpleLinkExtor->new();
$extor->parse_url('http://example.com/article');
my @links = $extor->absolute_links();
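
The snippets above print or collect the links but stop short of the "output to a text file" part of the question. A minimal sketch of that last step, using only core Perl, might look like the following. The helper name write_links, the file name links.txt, and the placeholder URL list are assumptions made for illustration and do not come from either module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize or HTML::SimpleLinkExtor):
# writes one URL per line to $path and returns how many were written.
sub write_links {
    my ($path, @links) = @_;
    open my $out, '>', $path or die "Cannot open $path: $!";
    print {$out} "$_\n" for @links;
    close $out or die "Cannot close $path: $!";
    return scalar @links;
}

# Placeholder data standing in for the @links list built in the answer above.
# Duplicates are kept on purpose, since the question asks for every occurrence.
my @links = (
    'http://example.com/article',
    'http://example.com/article',
);

my $count = write_links('links.txt', @links);
print "Wrote $count URLs to links.txt\n";
```

With the WWW::Mechanize approach, the same helper could presumably be fed with map { $_->url_abs } $mech->find_all_links() instead of the placeholder list.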
