How to get the links in a table based on its caption using Nokogiri

Time: 2014-11-11 14:36:18

Tags: html ruby nokogiri

How can I get all the links in a table, selecting the table by its caption?

<table class="wikitable sortable plainrowheaders">
   <caption>Film</caption>
   <tr>
      <th scope="col">Year</th>
      <th scope="col">Title</th>
      <th scope="col">Role</th>
      <th scope="col" class="unsortable">Notes</th>
   </tr>
   <tr>
      <td style="text-align:center;">1997</td>
      <th scope="row"><i><span class="sortkey">Ice Storm, The</span><span class="vcard"><span  class="fn"><a href="/wiki/The_Ice_Storm_(film)" title="The Ice Storm (film)">The Ice Storm</a></span>  </span></i></th>
      <td>Libbets Casey</td>
      <td>First professional role</td>
   </tr>
</table>

I tried this:

doc = Nokogiri::HTML(str)
doc.xpath('//table[caption=''Film'']//a/@href').each do |href|
  p href
end

But this doesn't print anything.

1 answer:

Answer 0 (score: 1)

You can write the code as follows:

require 'nokogiri'

doc = Nokogiri::HTML::Document.parse <<-EOT
<table class="wikitable sortable plainrowheaders">
   <caption>Film</caption>
   <tr>
      <th scope="col">Year</th>
      <th scope="col">Title</th>
      <th scope="col">Role</th>
      <th scope="col" class="unsortable">Notes</th>
   </tr>
   <tr>
      <td style="text-align:center;">1997</td>
      <th scope="row"><i><span class="sortkey">Ice Storm, The</span><span class="vcard"><span  class="fn"><a href="/wiki/The_Ice_Storm_(film)" title="The Ice Storm (film)">The Ice Storm</a></span>  </span></i></th>
      <td>Libbets Casey</td>
      <td>First professional role</td>
   </tr>
</table>
EOT

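# Select every <a> inside a table whose <caption> text is exactly 'Film',
# then read each link's href attribute.
doc.xpath("//table[./caption[text()='Film']]//a").each do |node|
  p node['href']
end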

# >> "/wiki/The_Ice_Storm_(film)"
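
A likely reason the attempt in the question prints nothing (this is an inference, not something the answer states) is the quoting of the XPath string. Ruby concatenates adjacent string literals, so the doubled single quotes do not escape a quote. They end one literal and start the next. The sketch below uses only plain Ruby to show the resulting string; the variable names `broken` and `fixed` are illustrative.

```ruby
# Adjacent string literals are concatenated in Ruby, so '' inside a
# single-quoted string closes one literal and opens another.
broken = '//table[caption=''Film'']//a/@href'
puts broken
# prints: //table[caption=Film]//a/@href

# Wrapping the Ruby string in double quotes keeps the single quotes
# that XPath needs around the literal 'Film'.
fixed = "//table[caption='Film']//a/@href"
puts fixed
# prints: //table[caption='Film']//a/@href
```

In the broken string, XPath reads `Film` as the name of a child element rather than as a string literal. The table has no `<Film>` element, so the predicate never matches. With the quoting fixed, the original `caption='Film'` predicate should select the table as well. Note that `@href` yields attribute nodes, so `href.value` would be needed to get the plain string.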