Difficulty pulling <li> elements using Nokogiri

Time: 2014-07-29 02:53:32

Tags: ruby-on-rails ruby web-scraping nokogiri

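I am trying to build a scraper to pull content from NewEgg. I installed Nokogiri in my Ruby on Rails app and, as far as I can tell, it is working. However, I am having trouble pulling the specific element that contains the pricing information, and I am not entirely sure why it does not work. The code below should find every list item with the class "price-current" and print each instance. Instead, I get no results.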

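require 'open-uri'
require 'nokogiri'

# Download the product page and parse it (URI.open is required on Ruby 3.0+).
page = Nokogiri::HTML(URI.open("http://www.newegg.com/Product/Product.aspx?Item=N82E16820313436"))

# Print every <li> whose class attribute is exactly "price-current ".
page.xpath('//li[@class="price-current "]').each do |item|
  puts item
end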

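I have been tearing my hair out for the past two hours trying to figure this out, with no success. Any insight would be greatly appreciated!

EDIT: So @MarkReed was correct that the information I am looking for is generated by JS. Looking more closely at the page source, a lot of the details appear to be in a hash. Is it possible to use a regex with Nokogiri to get at that information?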

  var utag_data = {
        page_breadcrumb:'Home > Computer Hardware > Memory > Desktop Memory > Team Group > Item#:N82E16820313436',
        page_tab_name:'Computer Hardware',
        product_category_id:['17'],
        product_category_name:['Memory'],
        product_subcategory_id:['147'],
        product_subcategory_name:['Desktop Memory'],
        product_id:['20-313-436'],
        product_web_id:['N82E16820313436'],
        product_title:['Team Zeus Yellow 8GB (2 x 4GB) 240-Pin DDR3 SDRAM DDR3 1600 (PC3 12800) Desktop Memory Model TZYD38G1600HC9DC01'],
        product_manufacture:['Team Group'],
        product_unit_price:['79.99'],
        product_sale_price:['66.99'],
        product_default_shipping_cost:['0.01'],
        product_type:['Newegg'],
        product_model:['TZYD38G1600HC9DC01'],
        product_instock:['1'],
        product_group_id:['0'],
        page_type:'Product',
        site_region:'USA',
        site_currency:'USD',
        page_name:'ProductDetail',
        search_scope:jQuery('#haQuickSearchStore option:selected').text(),
        user_nvtc:Web.StateManager.Cookies.get(Web.StateManager.Cookies.Name.NVTC),
        user_name:Web.StateManager.Cookies.get(Web.StateManager.Cookies.Name.LOGIN,'LOGINID6'),
        third_party_render:['3cb31f7b6faf223eb237af8c737abcebce803020','4774d6780334a7bf9c3c95255c60401916d07cae','e3770e5b640207523c7ac0afed2237ce2f79cd27','9c3638f897ed4a655fd0bd839f04e1c412d54bff','78b8b16d9d0f6f2e8419ac12fa710f5153f1cee3','65531e14b4d9b9a223cc3bfcb65ce7b5f356011d','2a5e772a0f941c862180037f8a5c118c7abf2f7d','9011adc5233493f5adc5f0f0f1bcb655892c09e3']

  };

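1 Answer:

Answer 0: (score: 1)

It looks like you are searching for DOM elements that are added dynamically by JavaScript in the browser after the page loads. They are not present in the HTML initially fetched from the URL, so Nokogiri cannot access them.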
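
As for the follow-up about the utag_data hash: that object is plain text inside a script tag, so it is part of the downloaded HTML and an ordinary Ruby regex can read it. The following is only a minimal sketch, assuming the page still embeds the object in the form shown in the question. SAMPLE_HTML is a trimmed stand-in for the downloaded page, and utag_field is a hypothetical helper name, not part of Nokogiri or open-uri.

```ruby
# Trimmed stand-in for the HTML that open-uri would download (assumed sample;
# the real page contains many more fields and other scripts).
SAMPLE_HTML = <<~HTML
  <html><head><script>
    var utag_data = {
      product_title:['Team Zeus Yellow 8GB'],
      product_unit_price:['79.99'],
      product_sale_price:['66.99'],
      page_type:'Product'
    };
  </script></head><body></body></html>
HTML

# Hypothetical helper: returns the first value of an array-style field such as
# product_sale_price:['66.99'], or nil when the object or field is missing.
def utag_field(html, field)
  # Narrow the search to the utag_data object literal so that unrelated
  # scripts with similar keys are not matched.
  block = html[/var\s+utag_data\s*=\s*\{(.*?)\};/m, 1]
  return nil unless block

  # Capture the text between the first pair of single quotes after "field:[".
  block[/\b#{Regexp.escape(field)}\s*:\s*\[\s*'([^']*)'/, 1]
end

sale_price = utag_field(SAMPLE_HTML, 'product_sale_price')
unit_price = utag_field(SAMPLE_HTML, 'product_unit_price')
puts "sale: #{sale_price}, list: #{unit_price}"
```

If you would rather go through Nokogiri, the same regex can be applied to the text of the script nodes, for example the strings returned by page.xpath('//script').map(&:text). A regex like this is brittle: it depends on the exact quoting and layout of the object, so it will need adjusting if the site changes its markup.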