Parsing with XPath in Nokogiri returns an empty string

Time: 2014-10-01 12:07:23

Tags: ruby-on-rails-4 xpath nokogiri httparty

I have the following HTML:

<div>
 <table>
  <tr>
   <td>

    <div class="w135">

     <div style="float: left; padding-right: 10px;" class="imageThumbnail playerDiv">
      <a href="/sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_lnkImage10" target="_parent">
       <img src="/mritems/imagecache/89/135/mritems/images/2014/10/1/2014101114447491734_20.jpg" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_imgSmall10" border="0" class="imageThumbnail">
            </a>
     </div>


    </div>
   </td>
  </tr>
 </table>
</div>

When I run the rake task, I get this error:

NoMethodError: undefined method `at_css' for ["id","ctl00_cphBody_ctl01_DataList1_ctl00_Thumbnail1_Layout17"]:Array

Here is the code:

@request = HTTParty.get(url)

@html = Nokogiri::HTML(@request.body)

@html.css(".w135")[0].map do |item|

    url = item.at_css("div.playerDiv a")

    puts url.inspect
end   

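I'm really not sure what the problem is, and I have been trying to solve it for a while. The error is raised on this line: url = item.at_css("div.playerDiv a")

Any suggestions are appreciated!

Thanks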

1 Answer:

Answer 0: (score: 0)

I'd do it like this:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<div>
 <table>
  <tr>
   <td>

    <div class="w135">

     <div style="float: left; padding-right: 10px;" class="imageThumbnail playerDiv">
      <a href="/sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_lnkImage10" target="_parent">
       <img src="/mritems/imagecache/89/135/mritems/images/2014/10/1/2014101114447491734_20.jpg" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_imgSmall10" border="0" class="imageThumbnail">
            </a>
     </div>


    </div>
   </td>
  </tr>
 </table>
</div>
EOT

puts doc.search('.w135 div.playerDiv a').map(&:inspect)

Which outputs:

# >> #<Nokogiri::XML::Element:0x3ff0918b132c name="a" attributes=[#<Nokogiri::XML::Attr:0x3ff0918b1250 name="href" value="/sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html">, #<Nokogiri::XML::Attr:0x3ff0918b123c name="id" value="ctl00_ctl00_DataList1_ctl00_Thumbnail1_lnkImage10">, #<Nokogiri::XML::Attr:0x3ff0918b1228 name="target" value="_parent">] children=[#<Nokogiri::XML::Text:0x3ff0918a5b6c "\n       ">, #<Nokogiri::XML::Element:0x3ff0918a5360 name="img" attributes=[#<Nokogiri::XML::Attr:0x3ff0918a4d20 name="src" value="/mritems/imagecache/89/135/mritems/images/2014/10/1/2014101114447491734_20.jpg">, #<Nokogiri::XML::Attr:0x3ff0918a4cbc name="id" value="ctl00_ctl00_DataList1_ctl00_Thumbnail1_imgSmall10">, #<Nokogiri::XML::Attr:0x3ff0918a4b90 name="border" value="0">, #<Nokogiri::XML::Attr:0x3ff0918a4a28 name="class" value="imageThumbnail">]>, #<Nokogiri::XML::Text:0x3ff091871920 "\n            ">]>

If you're trying to access the "href" attribute, then instead of inspect, use:

puts doc.search('.w135 div.playerDiv a').map{ |n| n['href'] }
# >> /sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html