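I'm trying to build a scraper to pull content from NewEgg. I have Nokogiri installed with Ruby on Rails and, as far as I can tell, it's working. However, I'm having trouble extracting the specific element that contains the pricing information, and I'm not sure why it isn't working. The code below should find every li element with the class "price-current" and print each instance. Instead, I get no results.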
require 'rubygems'
require 'open-uri'
require 'nokogiri'
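# URI.open is the open-uri entry point on current Ruby versions (plain open no longer accepts URLs)
page = Nokogiri::HTML(URI.open("http://www.newegg.com/Product/Product.aspx?Item=N82E16820313436"))
page.xpath('//li[@class="price-current "]').each do |item|
  puts item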
end
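I've been tearing my hair out for the past two hours trying to figure this out, with no luck. Any insight would be much appreciated!
Edit: So, @MarkReed was right that the information I'm looking for is generated by JS. Looking at the page source more closely, there appear to be a lot of details in a hash. Is it possible to use a RegEx with Nokogiri to get that information?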
var utag_data = {
page_breadcrumb:'Home > Computer Hardware > Memory > Desktop Memory > Team Group > Item#:N82E16820313436',
page_tab_name:'Computer Hardware',
product_category_id:['17'],
product_category_name:['Memory'],
product_subcategory_id:['147'],
product_subcategory_name:['Desktop Memory'],
product_id:['20-313-436'],
product_web_id:['N82E16820313436'],
product_title:['Team Zeus Yellow 8GB (2 x 4GB) 240-Pin DDR3 SDRAM DDR3 1600 (PC3 12800) Desktop Memory Model TZYD38G1600HC9DC01'],
product_manufacture:['Team Group'],
product_unit_price:['79.99'],
product_sale_price:['66.99'],
product_default_shipping_cost:['0.01'],
product_type:['Newegg'],
product_model:['TZYD38G1600HC9DC01'],
product_instock:['1'],
product_group_id:['0'],
page_type:'Product',
site_region:'USA',
site_currency:'USD',
page_name:'ProductDetail',
search_scope:jQuery('#haQuickSearchStore option:selected').text(),
user_nvtc:Web.StateManager.Cookies.get(Web.StateManager.Cookies.Name.NVTC),
user_name:Web.StateManager.Cookies.get(Web.StateManager.Cookies.Name.LOGIN,'LOGINID6'),
third_party_render:['3cb31f7b6faf223eb237af8c737abcebce803020','4774d6780334a7bf9c3c95255c60401916d07cae','e3770e5b640207523c7ac0afed2237ce2f79cd27','9c3638f897ed4a655fd0bd839f04e1c412d54bff','78b8b16d9d0f6f2e8419ac12fa710f5153f1cee3','65531e14b4d9b9a223cc3bfcb65ce7b5f356011d','2a5e772a0f941c862180037f8a5c118c7abf2f7d','9011adc5233493f5adc5f0f0f1bcb655892c09e3']
};
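Answer 0 (score: 1)
It looks like you're searching for DOM elements that are added dynamically by JavaScript in the browser after the page loads. They don't exist in the HTML originally fetched from the URL, so Nokogiri can't access them.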
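As for the follow-up about the utag_data hash: that object literal is part of the HTML the server sends, so a regular expression can pull values out of the script text. Below is a minimal sketch, not a definitive implementation. The helper name extract_utag_fields and the abridged SAMPLE_SCRIPT are made up for illustration, and the pattern assumes one entry per line, single-quoted values, and no apostrophes inside values, as in the snippet from the question. With Nokogiri, the script text could be collected with something like page.css('script').map(&:text).join("\n") and passed in.

```ruby
# Abridged copy of the inline script from the question, used as sample input.
SAMPLE_SCRIPT = <<~JS
  var utag_data = {
    product_title:['Team Zeus Yellow 8GB'],
    product_unit_price:['79.99'],
    product_sale_price:['66.99'],
    site_currency:'USD'
  };
JS

# Hypothetical helper: returns a Hash of key => first quoted value for
# entries shaped like key:'value' or key:['value', ...].
# Entries whose value is a JavaScript expression (e.g. jQuery(...)) are skipped.
def extract_utag_fields(script_text)
  # Grab everything between "var utag_data = {" and the first "};".
  body = script_text[/var\s+utag_data\s*=\s*\{(.*?)\};/m, 1]
  return {} if body.nil?

  # Each match is a [key, value] pair; to_h turns the pairs into a Hash.
  body.scan(/^\s*(\w+)\s*:\s*\[?'([^']*)'/).to_h
end

fields = extract_utag_fields(SAMPLE_SCRIPT)
puts fields['product_sale_price'] # prints 66.99
puts fields['site_currency']      # prints USD
```

This only works for data that is already present in the fetched source. Anything the browser computes after load would still be out of reach for Nokogiri alone, and the pattern would need adjusting if the site changes how the object is laid out.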