How do I get all the links in a table based on the table's caption?
<table class="wikitable sortable plainrowheaders">
<caption>Film</caption>
<tr>
<th scope="col">Year</th>
<th scope="col">Title</th>
<th scope="col">Role</th>
<th scope="col" class="unsortable">Notes</th>
</tr>
<tr>
<td style="text-align:center;">1997</td>
<th scope="row"><i><span class="sortkey">Ice Storm, The</span><span class="vcard"><span class="fn"><a href="/wiki/The_Ice_Storm_(film)" title="The Ice Storm (film)">The Ice Storm</a></span> </span></i></th>
<td>Libbets Casey</td>
<td>First professional role</td>
</tr>
</table>
I tried this:
doc = Nokogiri::HTML(str)
doc.xpath('//table[caption=''Film'']//a/@href').each do |href|
p href
end
But this doesn't print anything.
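The likely reason nothing is printed is Ruby's quoting, not the XPath idea itself. Inside a single-quoted Ruby string, `''` does not produce a quote character. It ends one literal and starts another, and Ruby joins adjacent string literals, so the expression becomes `//table[caption=Film]`. That compares the caption with a child element named `Film`, which does not exist. A quick check in plain Ruby (the variable names `broken` and `fixed` are just for illustration):

```ruby
# Adjacent string literals are concatenated at parse time, so each ''
# pair closes one literal and opens the next instead of inserting a quote.
broken = '//table[caption=''Film'']//a/@href'
puts broken # prints //table[caption=Film]//a/@href

# Using double quotes for the Ruby string keeps the XPath single quotes intact.
fixed = "//table[caption='Film']//a/@href"
puts fixed  # prints //table[caption='Film']//a/@href
```

With the double-quoted version, the original `doc.xpath(...)` call should match the link; calling `href.value` inside the block gives the plain string instead of the attribute node's inspect output.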
Answer (score: 1)
You can write the code as follows:
require 'nokogiri'
doc = Nokogiri::HTML::Document.parse <<-EOT
<table class="wikitable sortable plainrowheaders">
<caption>Film</caption>
<tr>
<th scope="col">Year</th>
<th scope="col">Title</th>
<th scope="col">Role</th>
<th scope="col" class="unsortable">Notes</th>
</tr>
<tr>
<td style="text-align:center;">1997</td>
<th scope="row"><i><span class="sortkey">Ice Storm, The</span><span class="vcard"><span class="fn"><a href="/wiki/The_Ice_Storm_(film)" title="The Ice Storm (film)">The Ice Storm</a></span> </span></i></th>
<td>Libbets Casey</td>
<td>First professional role</td>
</tr>
</table>
EOT
doc.xpath("//table[./caption[text()='Film']]//a").each do |node|
p node['href']
end
# >> "/wiki/The_Ice_Storm_(film)"