Nokogiri iterates over tr tags too many times

Time: 2017-10-28 12:34:30

Tags: ruby nokogiri

I'm scraping the page https://www.library.uq.edu.au/uqlsm/availablepcsembed.php?branch=Duhig, and for each `tr` I collect and return the level name and the number of available computers.

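The problem is that the loop runs too many times. There are only 4 `tr` tags, but the loop goes through 5 iterations, so an extra `tr` gets appended to the returned array. Why is this?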

Scraped Section:


<table class="chart">
    <tr valign="middle">
        <td class="left"><a href="availablepcsembed.php?branch=Duhig&room=Lvl1">Level 1</a></td>
        <td class="middle"><div style="width:68%;"><strong>68%</strong></div></td>
        <td class="right">23 Free of 34 PC's</td>
    </tr>

    <tr valign="middle">
        <td class="left"><a href="availablepcsembed.php?branch=Duhig&room=Lvl2">Level 2</a></td>
        <td class="middle"><div style="width:78%;"><strong>78%</strong></div></td>
        <td class="right">83 Free of 107 PC's</td>
    </tr>

    <tr valign="middle">
        <td class="left"><a href="availablepcsembed.php?branch=Duhig&room=Lvl4">Level 4</a></td>
        <td class="middle"><div style="width:64%;"><strong>64%</strong></div></td>
        <td class="right">9 Free of 14 PC's</td>
    </tr>

    <tr valign="middle">
        <td class="left"><a href="availablepcsembed.php?branch=Duhig&room=Lvl5">Level 5</a></td>
        <td class="middle"><div style="width:97%;"><strong>97%</strong></div></td>
        <td class="right">28 Free of 29 PC's</td>
    </tr>
</table>

1 answer:

Answer 0 (score: 1)

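You can specify the table's class attribute and then select only the `tr` tags inside it. That way the "extra" `tr` is never matched, for example: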

details_page.css("table.chart tr").map do |level|
  # ...
end

Simplified `scrape_details_page` method:

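require 'nokogiri'
require 'open-uri'

def scrape_details_page(library_url)
  # URI.open comes from open-uri; Kernel#open no longer opens URLs in Ruby 3.0+
  details_page = Nokogiri::HTML(URI.open(library_url))
  # Only rows inside <table class="chart">, so stray rows elsewhere are ignored
  details_page.css('table.chart tr').map do |level|
    # "23 Free of 34 PC's".split => ["23", "Free", "of", "34", "PC's"]
    right = level.css('.right').text.split
    { name: level.css('a[href]').text, total_available: right[0], out_of_available: right[3] }
  end
end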

p scrape_details_page('https://www.library.uq.edu.au/uqlsm/availablepcsembed.php?branch=Duhig')

# [{:name=>"Level 1", :total_available=>"22", :out_of_available=>"34"},
#  {:name=>"Level 2", :total_available=>"98", :out_of_available=>"107"},
#  {:name=>"Level 4", :total_available=>"12", :out_of_available=>"14"},
#  {:name=>"Level 5", :total_available=>"26", :out_of_available=>"29"}]
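Splitting on whitespace and indexing with `right[0]` / `right[3]` assumes every matched row has the exact "N Free of M PC's" shape. A slightly more defensive variant, sketched here as a hypothetical helper that is not part of the original answer, extracts the two numbers with a regex and returns `nil` when a row doesn't match, so a stray row can be dropped instead of producing a blank entry. The regex assumes the cell text keeps the "Free of" wording shown in the question.

```ruby
# Hypothetical helper (not from the original answer): pulls the free and total
# counts out of a cell such as "23 Free of 34 PC's". Returns nil on no match.
AVAILABILITY_PATTERN = /(\d+)\s+Free\s+of\s+(\d+)/i

def parse_availability(text)
  match = AVAILABILITY_PATTERN.match(text)
  return nil unless match

  # Converted to Integer here; the original answer keeps them as strings.
  { total_available: match[1].to_i, out_of_available: match[2].to_i }
end

# Sample cell texts: two from the question's table and one empty cell,
# standing in for a row that has no ".right" cell.
cells = ["23 Free of 34 PC's", "83 Free of 107 PC's", ""]

# filter_map (Ruby 2.7+) maps and discards the nil results in a single pass.
rows = cells.filter_map { |text| parse_availability(text) }
p rows
```

Inside `scrape_details_page`, `level.css('.right').text` could be passed to this helper, with `filter_map` in place of `map` so that rows returning `nil` are skipped.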