Trying to parse with Mojo::DOM, but not using the markup correctly

Date: 2016-12-14 17:30:40

Tags: perl mojolicious mojo

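I am using $ua to fetch some HTML from my $url = "http://finance.yahoo.com/quote/MSFT?p=MSFT";

I can fetch the HTML content from the URL without any problem. I then use Mojo::DOM to parse a sub-section of it, which is the right step, isn't it? I want to go further and pull the A HREF values out of the HTML content returned by get() on $url. This is what I have: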

my $ua = Mojo::UserAgent->new( max_redirects => 5, timeout => $timeout );
my $dom = Mojo::DOM->new;

my $content = $ua->get($url)->res->dom->at('div#quoteNewsStream-0-Stream')->content;
my $content2 = $content->$dom->find('a href#');

1 Answer:

Answer 0: (score: 2)

Just use the Mojo::DOM object that Mojo::UserAgent returns:

#!/usr/bin/env perl

use strict;
use warnings;
use v5.10;

use Mojo::UserAgent;

my $url = "http://finance.yahoo.com/quote/MSFT?p=MSFT";

my $dom = Mojo::UserAgent->new->get($url)->res->dom;

my $stream = $dom->at('div#quoteNewsStream-0-Stream');

for my $href ( $stream->find('a')->each ) {
    say $href->{href};
}

Output:

/news/jeff-bezos-trump-tech-summit-was-very-productive-224326329.html
/news/jeff-bezos-trump-tech-summit-was-very-productive-224326329.html
/news/donald-trump-tech-summit-at-trump-tower-202517070.html
/video/microsoft-surface-sales-surge-disappointment-181934121.html
/news/jeff-bezos-trump-tech-summit-was-very-productive-224326329.html
/news/microsoft-surface-sales-surge-on-disappointment-with-macbook-pro-163819168.html
/news/microsoft-surface-sales-surge-on-disappointment-with-macbook-pro-163819168.html
/m/7f581deb-0089-341a-b637-e1e979e9e210/ss_5-point-checklist-for.html
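
The output above contains repeated and relative links. The following is a minimal offline sketch, not part of the original answer, of one way to tidy that up. It also shows why the code in the question fails: at() returns a Mojo::DOM node that find() can be called on directly, whereas content() returns a plain string, and 'a href#' is not a valid CSS selector (the attribute selector is 'a[href]'). The HTML snippet here is a hypothetical stand-in, not Yahoo's real markup, and the choice to de-duplicate and make the links absolute is an assumption about what is wanted.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use v5.10;

use Mojo::DOM;
use Mojo::URL;

# Hypothetical stand-in for the fetched page; not Yahoo's real markup.
my $html = <<'HTML';
<div id="quoteNewsStream-0-Stream">
  <a href="/news/a.html">A</a>
  <a href="/news/a.html">A again</a>
  <a name="no-href">anchor without an href</a>
  <a href="/video/b.html">B</a>
</div>
<a href="/outside.html">outside the stream</a>
HTML

# Page URL used as the base for resolving relative links.
my $base = Mojo::URL->new('http://finance.yahoo.com/quote/MSFT?p=MSFT');

# at() returns a Mojo::DOM node, or undef when nothing matches.
my $stream = Mojo::DOM->new($html)->at('div#quoteNewsStream-0-Stream')
  or die "news stream container not found\n";

# a[href] skips anchors without an href; uniq drops repeated links.
my @links = $stream->find('a[href]')->map(attr => 'href')->uniq->each;

# Resolve each relative path against the page URL.
my @absolute = map { Mojo::URL->new($_)->to_abs($base)->to_string } @links;

say for @absolute;
```

With a live page, the same chain can start from $ua->get($url)->res->dom instead of Mojo::DOM->new($html). The undef check on at() is worth keeping in that case, because the container id may change whenever the site's markup does.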

For an 8-minute tutorial on using these tools, check out Mojocast Episode 5.