How do I scrape HTML with Nokogiri?

Asked: 2013-08-31 16:30:37

Tags: ruby screen-scraping nokogiri

I can't extract the product's price. Instead of the price text, the output I get is the markup itself:

<div class="pu-final">
  <span class="fk-font-17 fk-bold">Rs. 1999</span>
</div>

My code is:

require 'rubygems'
require 'nokogiri'
require 'open-uri'

url = "http://www.flipkart.com/mens-footwear/shoes/casual-shoes/pr?sid=osp,cil,nit,e1f"
doc = Nokogiri::HTML(URI.open(url)) # on Ruby 3.0+, plain open(url) no longer fetches URLs
puts doc.at_css("title").text
doc.css(".gu4,.browse-product").each do |item|
  title = item.at_css(".fk-display-block,.title").text
  puts title
  puts "================="
  price = item.at_css(".pu-final")
  puts price
end

2 answers:

Answer 0 (score: 2)

I tried the same code with one small change and it worked fine. Give it a try.

Change

price = item.at_css(".pu-final")

to

price = item.at_css(".pu-final").text unless item.at_css(".pu-final").nil?

Answer 1 (score: 0)

You can do the following:

require 'nokogiri'

doc = Nokogiri::HTML::Document.parse <<-eotl
<div class="pu-final">

                    <span class="fk-font-17 fk-bold">Rs. 1999</span>
</div>
eotl

doc.at_css('div.pu-final > span.fk-font-17.fk-bold').class
# => Nokogiri::XML::Element
doc.at_css('div.pu-final > span.fk-font-17.fk-bold').text 
# => "Rs. 1999"

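doc.at_css('div.pu-final') gives you a Nokogiri::XML::Node (more precisely, a Nokogiri::XML::Element). Passing that node to puts prints its markup, so you have to call Nokogiri::XML::Node#text to get the text inside the element.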

Using XPath

doc.xpath("normalize-space(//div[contains(@class,'pu-final')]/span[contains(@class,'fk-font-17')])")
# => "Rs. 1999"

Complete code

require 'nokogiri'
require 'open-uri'

url = "http://www.flipkart.com/mens-footwear/shoes/casual-shoes/pr?sid=osp,cil,nit,e1f"
doc = Nokogiri::HTML(URI.open(url)) # on Ruby 3.0+, plain open(url) no longer fetches URLs

doc.css("div.pu-details.lastUnit").each do |dv|
  product_name = dv.at_css('div.pu-title a').text.strip
  product_price = dv.xpath("normalize-space(.//div[contains(@class,'pu-final')]/span)").to_s
  print product_name,"  <----->  ",product_price,"\n"
end

Output

Fila Storm Zender Sneakers  <----->  Rs. 1819
Puma Future Cat M1 Big 102 O Sneakers  <----->  Rs. 3849
Fila Filamotor V4 Sneakers  <----->  Rs. 1449
Adidas Volantis Hiking Shoes  <----->  Rs. 2999
Fila Varsity Sneakers  <----->  Rs. 1249
Puma Evo Speed F1 Low BMW Sneakers  <----->  Rs. 2609
Lee Cooper Running and Walking Shoes  <----->  Rs. 1329
Lee Cooper Running and Walking Shoes  <----->  Rs. 1329
United Colors of Benetton Sneakers  <----->  Rs. 2799
United Colors of Benetton Party Wear Shoes  <----->  Rs. 2449
Timberland 6 In Premium Boots  <----->  Rs. 8490
Timberland Ek Mid Boots  <----->  Rs. 8490
Clarks Montacute Lord Boots  <----->  Rs. 3249
Clarks Latch Mast Corporate Casuals  <----->  Rs. 1999
Levi's Boots  <----->  Rs. 2999