Question

Is there a way to find all the links inside a specific div using Mechanize?

I tried find_all_links, but I couldn't find a way to restrict it to one div. For example, given:

<div class="sometag">
<ul class="tags">
<li><a href="/a.html">A</a></li>
<li><a href="/b.html">B</a></li> 
</ul>
</div>

Answer 1

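A handy tool for pulling information out of HTML is HTML::Grabber. It uses jQuery-style selector syntax to refer to elements in the HTML, so you can do something like this: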

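use strict;
use warnings;
use WWW::Mechanize;
use HTML::Grabber;

# Your Mechanize stuff here (login, navigation, ...). The URL is a placeholder.
my $mech = WWW::Mechanize->new;
$mech->get('http://example.com/');

# Parse the page Mechanize fetched and select every <a> inside div.sometag.
my $dom = HTML::Grabber->new( html => $mech->content );

my @links;
$dom->find('div.sometag a')->each(sub {
    push @links, $_->attr('href');
});

print "$_\n" for @links;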

Answer 2

Web::Scraper is very useful for scraping.

use strict;
use warnings;
use URI;
use WWW::Mechanize;
use Web::Scraper;

my $mech = WWW::Mechanize->new;
$mech->env_proxy;
# If you need to log in, do it with Mechanize first.

# In the question's markup the "tags" class is on the <ul>, not on the <li>.
my $scraper = scrape { process 'div.sometag ul.tags a', 'links[]' => '@href' };
# Pass Mechanize to the scraper as its user agent, so it shares Mechanize's cookies.
$scraper->user_agent($mech);

my $res = $scraper->scrape( URI->new("http://example.com/") );
for my $link (@{ $res->{links} || [] }) {
    print "$link\n";
}

Sorry, I haven't tested this code.
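Since that answer was posted untested, here is a hedged offline sketch of the same Web::Scraper approach run against the question's markup, with no network request. It assumes Web::Scraper is installed. The selector uses ul.tags because in the question's HTML the tags class sits on the <ul>. The base URI http://example.com/ is a placeholder. In a real script you would probably pass $mech->content and $mech->uri (the page Mechanize has already fetched) instead of the literal string, which also avoids requesting the page a second time.

```perl
use strict;
use warnings;
use Web::Scraper;

# The markup from the question. In real use, pass $mech->content instead.
my $html = <<'HTML';
<div class="sometag">
<ul class="tags">
<li><a href="/a.html">A</a></li>
<li><a href="/b.html">B</a></li>
</ul>
</div>
HTML

# Collect the href of every <a> under <div class="sometag"> <ul class="tags">.
my $scraper = scrape {
    process 'div.sometag ul.tags a', 'links[]' => '@href';
};

# The optional second argument is the base URI used to resolve relative hrefs.
# With Mechanize this would be $mech->uri; example.com is only a placeholder.
my $res = $scraper->scrape( $html, 'http://example.com/' );

# Stringify the results (they may be URI objects) and print one per line.
my @links = map { "$_" } @{ $res->{links} || [] };
print "$_\n" for @links;
```

Scraping the string Mechanize already holds keeps any login or navigation state you established with Mechanize, while the user_agent variant in the answer fetches the URL again through Mechanize.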

Perl Mechanize: find all links inside a div
