Question

I'm trying to scrape a page with Perl Scrappy. I want to use 'select' to pick out HTML elements by their class attribute.

<p>
  <h1>
   <a href='http://test.com'>Test</a>
   <a href='http://list.com'>List</a>
  </h1>
</p>
<p class='parent-1'>
  <h1>
   <a class='child-1' href="http://sample.com">SampleLink</a>
   <a class='child-2' href="http://list.com">List</a>
  </h1>
</p>

I need to use the select method to get the element (an 'a' tag) with the class name "child-1" that is a child node of <p class='parent-1'>.

This is what I tried:

#!/usr/bin/perl

use Scrappy;

my $scraper = Scrappy->new;
$scraper->get($url);              # $url: address of the page shown above
$scraper->select('p a')->data;    # any <a> below any <p>

But it also selects the links under the first 'p' tag.

Can you help me?

Answer 1

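The problem is that in HTML a <p> element cannot contain an <h1> element. The parser closes the <p> as soon as it meets the <h1>, so the HTML is actually parsed as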

<p></p>
  <h1>
   <a href='http://test.com'>Test</a>
   <a href='http://list.com'>List</a>
  </h1>
<p class='parent-1'></p>
  <h1>
   <a class='child-1' href="http://sample.com">SampleLink</a>
   <a class='child-2' href="http://list.com">List</a>
  </h1>
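
One way to check how a parser restructures this markup is to parse the fragment and dump the resulting tree. The following is a minimal sketch that assumes the HTML::TreeBuilder module is installed; it is not necessarily the parser Scrappy uses internally, so the tree it shows may differ in detail.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;

# The second paragraph from the question, copied verbatim
my $html = <<'HTML';
<p class='parent-1'>
  <h1>
   <a class='child-1' href="http://sample.com">SampleLink</a>
   <a class='child-2' href="http://list.com">List</a>
  </h1>
</p>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Print the element tree exactly as the parser built it
$tree->dump;

# Find the link and list its tag followed by the tags of its ancestors
my $link = $tree->look_down(_tag => 'a', class => 'child-1');
print join(' < ', map { $_->tag } $link, $link->lineage), "\n" if $link;

# Free the tree explicitly (HTML::TreeBuilder trees hold circular references)
$tree->delete;
```

If 'p' does not appear among the ancestors printed for the link, no selector that starts from the <p> can reach it.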

Answer 2

Bearing in mind choroba's warning, to select an <a> element with class child-1 that is a child of a <p> element with class parent-1, you would write

$scraper->select('p.parent-1 > a.child-1')
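
Note that '>' matches direct children only, while a space matches descendants at any depth; in the question's markup the links are written inside an <h1>, not directly inside the <p>. The sketch below compares both forms, reusing only the Scrappy calls that appear in the question (new, get, select, data). The URL is a hypothetical placeholder, and nothing is assumed about the shape of what data returns: the results are simply dumped. Because of the parsing issue described in Answer 1, neither selector that starts from the <p> may match, so a selector on the link's own class is included as a fallback.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Scrappy;
use Data::Dumper;

# Hypothetical placeholder: the question never gives the real address
my $url = 'http://example.com/page.html';

my $scraper = Scrappy->new;
$scraper->get($url);

# Child combinator: the <a> must sit directly inside p.parent-1
my $direct = $scraper->select('p.parent-1 > a.child-1')->data;

# Descendant combinator: the <a> may sit at any depth below p.parent-1
my $nested = $scraper->select('p.parent-1 a.child-1')->data;

# Fallback that ignores the <p> altogether and relies on the link's class
my $by_class = $scraper->select('a.child-1')->data;

# Inspect whatever came back without assuming its structure
print Dumper($direct, $nested, $by_class);
```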
