Parsing nested elements with HTML::TreeBuilder

Date: 2016-04-19 14:53:17

Tags: perl

I have a sample HTML snippet:

<div>
    <p>get this</p>
</div>
<p>not this</p>

Is there a way to get nested elements using HTML::TreeBuilder's look_down? I can call look_down again on the element returned by the first search:

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse("<div><p>get this</p></div><p>not this</p>");
$tree->eof;    # signal that all input has been parsed
my $div = $tree->look_down(_tag => "div");
my $p = $div->look_down(_tag => "p");
print $p->as_text() . "\n";
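A side note on the two-step search above: called in scalar context, look_down returns only the first matching element, so it inspects only the first div and that div's first p. Below is a minimal sketch of the same two-step idea in list context, which collects the paragraphs under every div. The extra sample markup, the use of `new_from_content` instead of `new`/`parse`/`eof`, and the de-duplication by `refaddr` (only needed when divs are nested) are illustrative choices, not part of the question.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use HTML::TreeBuilder;
use Scalar::Util qw{ refaddr };

# Illustrative input with a second div, so that more than one match exists.
my $tree = HTML::TreeBuilder->new_from_content(
    '<div><p>get this</p></div><p>not this</p><div><p>and this</p></div>');

# In list context look_down returns every matching element, in document order.
my @divs = $tree->look_down(_tag => 'div');

# Second search on each div (map evaluates look_down in list context).
# The %seen hash drops a p that would be found twice if divs were nested.
my %seen;
my @p = grep { !$seen{ refaddr($_) }++ }
        map  { $_->look_down(_tag => 'p') } @divs;

my @texts = map { $_->as_text } @p;
say for @texts;
```

This prints the text of both paragraphs that sit inside a div and skips the one that does not.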

Is it possible to get this in a single search, similar to the CSS selector `div p`? Or am I limited to XPath?

1 answer

Answer 0 (score: 1)

You can call look_up from every p to check whether it is contained in a div:

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use HTML::TreeBuilder;

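# True for a <p> element that has a <div> somewhere among its ancestors
# (look_up checks the element itself and then each ancestor in turn).
sub paragraph_whose_ancestor_is_div {
    my $node = shift;
    return 'p' eq $node->tag && $node->look_up(_tag => 'div');
}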

my $tree = 'HTML::TreeBuilder'->new;
$tree->parse("<html><div><p>get this</p></div><p>not this</p></html>");
$tree->eof;    # signal that all input has been parsed

my @p = $tree->look_down(\&paragraph_whose_ancestor_is_div);

say $_->as_text() for @p;
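The same single search can also be written inline. look_down requires all of its criteria to hold, and a code reference among them is called with the candidate element as its first argument, so the tag test and the ancestor test can be passed side by side. The sketch below assumes that reading of the HTML::Element API; the `@children` variant, which mimics CSS `div > p` by accepting only a p whose direct parent is a div, is added here for comparison and is not part of the original answer.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content(
    '<div><p>get this</p></div><p>not this</p>');

# Like CSS "div p": tag must be p AND some ancestor must be a div.
my @descendants = $tree->look_down(
    _tag => 'p',
    sub { $_[0]->look_up(_tag => 'div') },
);

# Like CSS "div > p": tag must be p AND the immediate parent must be a div.
my @children = $tree->look_down(
    _tag => 'p',
    sub {
        my $parent = $_[0]->parent;
        $parent && $parent->tag eq 'div';
    },
);

my @desc_text  = map { $_->as_text } @descendants;
my @child_text = map { $_->as_text } @children;

say for @desc_text;
say for @child_text;
```

For this input both searches match only the first paragraph; they would differ for markup such as `<div><span><p>...</p></span></div>`, where only the descendant search matches.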