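I'm using $ua to fetch some HTML from my $url = "http://finance.yahoo.com/quote/MSFT?p=MSFT";
I can fetch the HTML content from the URL just fine. I then use Mojo::DOM to parse it further; that's the right next step, right? I want to extract the A HREFs from the HTML content returned by get() on $url. This is what I have: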
my $ua = Mojo::UserAgent->new( max_redirects => 5, timeout => $timeout );
my $dom = Mojo::DOM->new;
my $content = $ua->get($url)->res->dom->at('div#quoteNewsStream-0-Stream')->content;
my $content2 = $content->$dom->find('a href#');
Answer 0 (score: 2)
Just use the Mojo::DOM object that Mojo::UserAgent already returns:
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
use Mojo::UserAgent;
my $url = "http://finance.yahoo.com/quote/MSFT?p=MSFT";
my $dom = Mojo::UserAgent->new->get($url)->res->dom;
my $stream = $dom->at('div#quoteNewsStream-0-Stream');
for my $href ( $stream->find('a')->each ) {
    say $href->{href};
}
Output:
/news/jeff-bezos-trump-tech-summit-was-very-productive-224326329.html
/news/jeff-bezos-trump-tech-summit-was-very-productive-224326329.html
/news/donald-trump-tech-summit-at-trump-tower-202517070.html
/video/microsoft-surface-sales-surge-disappointment-181934121.html
/news/jeff-bezos-trump-tech-summit-was-very-productive-224326329.html
/news/microsoft-surface-sales-surge-on-disappointment-with-macbook-pro-163819168.html
/news/microsoft-surface-sales-surge-on-disappointment-with-macbook-pro-163819168.html
/m/7f581deb-0089-341a-b637-e1e979e9e210/ss_5-point-checklist-for.html
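The output above contains repeated and relative links. As a possible refinement (a minimal sketch, not part of the original answer), the selector a[href] skips anchors without an href, uniq drops the duplicates, and Mojo::URL's to_abs resolves each link against the page URL. The inline $html below is a made-up stand-in for the fetched page so that the sketch runs offline; against the live page, $dom would come from the user agent as shown in the answer, and the container id may have changed since.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
use Mojo::DOM;
use Mojo::URL;

# Hypothetical stand-in for the fetched page (assumption, not real Yahoo markup).
my $html = <<'HTML';
<div id="quoteNewsStream-0-Stream">
  <a href="/news/first-story.html">First</a>
  <a href="/news/first-story.html">First (image link)</a>
  <a href="/video/second-story.html">Second</a>
  <a name="no-href">Anchor without href</a>
</div>
HTML

my $base = Mojo::URL->new('http://finance.yahoo.com/quote/MSFT?p=MSFT');
my $dom  = Mojo::DOM->new($html);

# at() returns undef when nothing matches, so check before chaining.
my $stream = $dom->at('div#quoteNewsStream-0-Stream')
  or die "News stream container not found\n";

# a[href] matches only anchors that actually carry an href attribute.
my @links = $stream->find('a[href]')
  ->map( attr => 'href' )    # element collection -> href strings
  ->uniq                     # drop repeated links
  ->map( sub { Mojo::URL->new($_)->to_abs($base)->to_string } )
  ->each;

say for @links;
```

This also shows why the code in the question fails: content returns a plain string rather than a Mojo::DOM object, so there is nothing to call find on, and 'a href#' is not a valid CSS selector.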
For an 8-minute tutorial on using these tools, check out Mojocast Episode 5.