Why can't the look_down method in HTML::Element find <section> elements?

Time: 2019-07-16 23:19:50

Tags: perl html-treebuilder

The code below shows that the TreeBuilder method look_down does not find the "section" element. Why?

use strict;
use warnings;
use HTML::TreeBuilder;

my $html =<<'END_HTML';
<html>
<head><title></title></head>
<body>
<div attrname="div">
<section attrname="section">
</section>
</div>
</body>
</html>
END_HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @divs = $tree->look_down('attrname', 'div');
print "number of div elements found = ", scalar(@divs), "\n";

my @sections = $tree->look_down('attrname', 'section');
print "number of section elements found = ", scalar(@sections), "\n";

$tree->delete();

Output:

number of div elements found = 1
number of section elements found = 0

2 answers:

Answer 0: (score: 2)

my @divs = $tree->look_down('attrname', 'div');
print "number of div elements found = ", scalar(@divs), "\n";

This finds one element because it matches the attribute attrname with the value div, which happens to be on the <div> tag.

my @sections = $tree->look_down('attrname', 'section');
print "number of section elements found = ", scalar(@sections), "\n";

This doesn't match anything because there is no tag with an attribute named attrname and the value section.

It should be:

my @divs = $tree->look_down(_tag => 'div');
...
my @sections = $tree->look_down(_tag => 'section');

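All of this is somewhat obscure in the HTML::Element look_down documentation. There is no clear explanation of what a "criteria" is, and you have to read the whole page to discover that the pseudo-attribute _tag refers to the tag name. Still, reading the whole page carefully will probably save you hours of frustration in the long run :-)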
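As a rough illustration of how look_down criteria combine, here is a minimal sketch. The markup and the class values are hypothetical and are not taken from the question; it deliberately uses only div and p so that every element is one the parser keeps by default.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, not the HTML from the question.
my $html = <<'END_HTML';
<html>
<head><title>demo</title></head>
<body>
<div class="note">first</div>
<div class="warning">second</div>
<p class="note">third</p>
</body>
</html>
END_HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Criterion on the tag name, via the _tag pseudo-attribute.
my @divs = $tree->look_down(_tag => 'div');

# Criterion on an ordinary attribute, regardless of tag name.
my @notes = $tree->look_down(class => 'note');

# Several criteria must all match: a div whose class is "note".
my @note_divs = $tree->look_down(_tag => 'div', class => 'note');

printf "div elements: %d\n", scalar @divs;
printf "class=note elements: %d\n", scalar @notes;
printf "div elements with class=note: %d\n", scalar @note_divs;

$tree->delete;
```

With this sample the three counts should come out as 2, 2 and 1: an attribute criterion alone ignores the tag name, while adding _tag narrows the match to one kind of element.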

Answer 1: (score: 2)

This works for me:

my $tree = HTML::TreeBuilder->new;
$tree->ignore_unknown(0);  # <-- Include unknown elements in tree
$tree->parse($html);
$tree->eof;  # signal end of input so the tree is finalized
my @divs = $tree->look_down('attrname', 'div');
my @sections = $tree->look_down('attrname', 'section');
print "number of div elements found = ", scalar(@divs), "\n";
print "number of section elements found = ", scalar(@sections), "\n";

Output:

number of div elements found = 1
number of section elements found = 1