Perl look_down: selecting a tag by index

Time: 2015-07-02 05:04:17

Tags: html perl

I am trying to get the text of only the second li in each list. How can I do that with look_down? Thanks in advance :)

 <ul class="threads">
   <li>one</li>
   <li>two</li>
   <li>three</li>
</ul>
<ul class="threads">
   <li>one</li>
   <li>two</li>
   <li>three</li>
</ul>
<ul class="threads">
   <li>one</li>
   <li>two</li>
   <li>three</li>
</ul>

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse($url);

foreach my $ul ($tree->look_down(_tag => 'ul', class => 'threads')){
    foreach my $li ($ul->look_down(_tag => 'li')){
        print $li->as_text. "\n";
    }
}

2 answers:

Answer 0 (score: 0)

https://metacpan.org/pod/HTML::TreeBuilder::XPath

It's easy with XPath:
#!/usr/bin/perl --
use strict; use warnings;
use HTML::TreeBuilder::XPath;

my $html = q{<ul class="threads">
   <li>one</li>
   <li>two</li>
   <li>three</li>
</ul>
<ul class="threads">
   <li>one</li>
   <li>two</li>
   <li>three</li>
</ul>
<ul class="threads">
   <li>one</li>
   <li>two</li>
   <li>three</li>
</ul>};


my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content( $html );
for my $sec ( $tree->findnodes( q{    //ul[@class='threads']/li[2]    } ) ){
    $sec->dump;
}
__END__
<li> @0.1.0.1
  "two"
<li> @0.1.1.1
  "two"
<li> @0.1.2.1
  "two"

If you don't know XPath yet :), this will help you: Re: htmltreexpather.pl - xpath helper, creates xpath search strings from html ($VERSION = 20120112)

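Answer 1 (score: 0)

Here is one way to do it. The program first finds all ul elements that have a class attribute of threads. Then, for each of those, it finds all li descendants and prints the second one as text.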

use strict;
use warnings;
use 5.010;

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_file(\*DATA);

for my $ul ($tree->look_down(_tag => 'ul', class => 'threads')) {
  my @li = $ul->find('li');
  say $li[1]->as_trimmed_text;
}

__DATA__
<html>
  <head>
    <title>Page title</title>
  </head>
  <body>
    <ul class="threads">
      <li>one</li>
      <li>two</li>
      <li>three</li>
      <p>para</p>
    </ul>
    <ul class="threads">
      <li>one</li>
      <li>two</li>
      <li>three</li>
    </ul>
    <ul class="threads">
      <li>one</li>
      <li>two</li>
      <li>three</li>
    </ul>
  </body>
</html>

Output

two
two
two
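
Since the question asks about look_down specifically, the same result can be had without find. This is a minimal sketch, not a definitive implementation: it relies on look_down returning every match when called in list context, so a list slice with index 1 picks the second li. The helper name second_li_texts is hypothetical and exists only for illustration. Note that look_down searches descendants at any depth, so an li inside a nested list would be counted as well.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML::TreeBuilder): returns the trimmed
# text of the second <li> in every <ul class="threads"> of the given HTML.
sub second_li_texts {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @texts;
    for my $ul ( $tree->look_down( _tag => 'ul', class => 'threads' ) ) {
        # In list context look_down returns all matches; index 1 is the second.
        my $li = ( $ul->look_down( _tag => 'li' ) )[1];
        # Skip lists that have fewer than two <li> elements.
        push @texts, $li->as_trimmed_text if defined $li;
    }
    $tree->delete;    # release the tree once we are done with it
    return @texts;
}

my $html = <<'HTML';
<ul class="threads"><li>one</li><li>two</li><li>three</li></ul>
<ul class="threads"><li>one</li></ul>
HTML

print "$_\n" for second_li_texts($html);    # prints "two" once
```

The defined check matters because a list with a single li yields no second element, and calling as_trimmed_text on undef would die.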

Alternatively

Here is an alternative, more concise solution using HTML::TreeBuilder::XPath.

Note that I have omitted the DATA section, which is the same as above.

use strict;
use warnings;
use 5.010;

use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_file(\*DATA);

for my $li2 ( $tree->findnodes('//ul[@class="threads"]/li[2]') ) {
  say $li2->as_trimmed_text;
}

The output is the same as for the code above.