Nokogiri with multiple domains

Date: 2013-02-04 14:33:11

Tags: ruby web-scraping nokogiri

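Is it possible to use Nokogiri to scrape multiple domains? I know you can run multiple XPath/CSS searches against a single domain or page, but what about multiple domains?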

For example, I'd like to scrape both http://www.asus.com/Notebooks_Ultrabooks/S56CA/#specifications and http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications.

My code:

require 'nokogiri'
require 'open-uri'
require 'spreadsheet'

#Fetch the page with OpenURI (URI.open; on Ruby 3.0+ plain Kernel#open no longer opens URLs)
doc = Nokogiri::HTML(URI.open("http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications"))

#Grab our product specifications
data = doc.css('div#specifications div#spec-area ul.product-spec li')

#Modify our data
lines = data.map(&:text)

#Create the Spreadsheet
Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet::Workbook.new

sheet1 = book.create_worksheet
sheet1.name = 'My First Worksheet'

#Output our data  to the Spreadsheet
lines.each_with_index do |line, i|
  sheet1[i, 0] = line
end

book.write 'C:/Users/Barry/Desktop/output.xls'

1 answer:

Answer 0 (score: 2)

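Nokogiri has no concept of URLs; all it knows about is XML or HTML supplied as a String or an IO stream. You're confusing the job of OpenURI (fetching pages) with the job of Nokogiri (parsing them).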

If you want to read from multiple sites, simply loop over the URLs and pass the current URL to OpenURI to open the page:

%w[
  http://www.asus.com/Notebooks_Ultrabooks/S56CA/#specifications
  http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications
].each do |url|
  # OpenURI fetches the page (URI.open on Ruby 3.0+), Nokogiri parses it
  doc = Nokogiri::HTML(URI.open(url))
  # do something with the document...
end

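OpenURI reads each page and passes its contents to Nokogiri for parsing. Nokogiri still only ever sees one page at a time, because everything it gets is handed to it by OpenURI.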