I am trying to get the text of only the second li in each list. How can I do that with look_down? Thanks in advance :)
<ul class="threads">
<li>one</li>
<li>two</li>
<li>three</li>
</ul>
<ul class="threads">
<li>one</li>
<li>two</li>
<li>three</li>
</ul>
<ul class="threads">
<li>one</li>
<li>two</li>
<li>three</li>
</ul>
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new;
$tree->parse($url);
foreach my $ul ($tree->look_down(_tag => 'ul', class => 'threads')) {
    foreach my $li ($ul->look_down(_tag => 'li')) {
        print $li->as_text . "\n";
    }
}
Answer 0 (score: 0)
This is easy with HTML::TreeBuilder::XPath: https://metacpan.org/pod/HTML::TreeBuilder::XPath
#!/usr/bin/perl --
use strict; use warnings;
use HTML::TreeBuilder::XPath;
my $html = q{<ul class="threads">
<li>one</li>
<li>two</li>
<li>three</li>
</ul>
<ul class="threads">
<li>one</li>
<li>two</li>
<li>three</li>
</ul>
<ul class="threads">
<li>one</li>
<li>two</li>
<li>three</li>
</ul>};
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content( $html );
for my $sec ( $tree->findnodes( q{ //ul[@class='threads']/li[2] } ) ){
    $sec->dump;
}
__END__
<li> @0.1.0.1
"two"
<li> @0.1.1.1
"two"
<li> @0.1.2.1
"two"
If you don't know XPath yet :) this will help you: Re: htmltreexpather.pl - xpath helper, creates xpath search strings from html ($VERSION = 20120112)
Answer 1 (score: 0)
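Here is one way to do it. The program first finds all ul elements that have a class attribute of threads. Then, for each of them, it finds all the li descendants and prints the second one as text.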
use strict;
use warnings;
use 5.010;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file(\*DATA);
for my $ul ($tree->look_down(_tag => 'ul', class => 'threads')) {
    my @li = $ul->find('li');
    say $li[1]->as_trimmed_text;
}
__DATA__
<html>
<head>
<title>Page title</title>
</head>
<body>
<ul class="threads">
<li>one</li>
<li>two</li>
<li>three</li>
<p>para</p>
</ul>
<ul class="threads">
<li>one</li>
<li>two</li>
<li>three</li>
</ul>
<ul class="threads">
<li>one</li>
<li>two</li>
<li>three</li>
</ul>
</body>
</html>
Output:
two
two
two
Note that I have omitted the DATA section here; it is the same as above.
use strict;
use warnings;
use 5.010;
use HTML::TreeBuilder::XPath;
# Build the tree with the XPath-aware subclass so that findnodes is available
my $tree = HTML::TreeBuilder::XPath->new_from_file(\*DATA);
for my $li2 ( $tree->findnodes('//ul[@class="threads"]/li[2]') ) {
    say $li2->as_trimmed_text;
}
The output is identical to that of the code above.
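Since the question asks specifically about look_down, here is a minimal sketch that uses nothing else. It relies on look_down returning every match in document order when called in list context, so the second li is simply index 1 of that list. The helper name second_li_texts and the sample HTML are made up for illustration and are not part of either answer.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not from the original answers): returns the trimmed
# text of the second <li> found in every <ul class="threads"> of the HTML.
sub second_li_texts {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @texts;
    for my $ul ($tree->look_down(_tag => 'ul', class => 'threads')) {
        # In list context look_down returns all matches in document order,
        # so index 1 is the second <li> below this <ul>.
        my @li = $ul->look_down(_tag => 'li');
        next if @li < 2;    # skip lists with fewer than two items
        push @texts, $li[1]->as_trimmed_text;
    }
    $tree->delete;          # release the tree when finished
    return @texts;
}

# Sample input, assumed for this sketch only
my $html = <<'HTML';
<ul class="threads"><li>one</li><li>two</li><li>three</li></ul>
<ul class="threads"><li>only</li></ul>
<ul class="threads"><li>a</li><li>b</li></ul>
HTML

print "$_\n" for second_li_texts($html);
```

With the sample input this prints two and b; the middle list is skipped because it has only one item. Note that look_down searches all descendants, so li elements of nested lists are counted too; the XPath form //ul[@class="threads"]/li[2] only considers direct children.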