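Is it possible to do a multi-domain search with Nokogiri? I know you can run multiple XPath/CSS searches against a single domain/page, but what about multiple domains?
For example, I want to scrape http://www.asus.com/Notebooks_Ultrabooks/S56CA/#specifications and http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications
My code: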
require 'nokogiri'
require 'open-uri'
require 'spreadsheet'
# URI.open is open-uri's entry point (Ruby 2.5+); a bare open(url) no longer fetches URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open("http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications"))
# Grab our product specifications
data = doc.css('div#specifications div#spec-area ul.product-spec li')
# Reduce each matched node to its text
lines = data.map(&:text)
# Create the spreadsheet
Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet::Workbook.new
sheet1 = book.create_worksheet
sheet1.name = 'My First Worksheet'
# Write one spec line per row into the first column
lines.each_with_index do |line, i|
  sheet1[i, 0] = line
end
book.write 'C:/Users/Barry/Desktop/output.xls'
Answer 0 (score: 2)
Nokogiri has no concept of URLs; it only knows about a String or IO stream of XML or HTML. You're confusing OpenURI's purpose with Nokogiri's.
If you want to read from multiple sites, simply loop over the URLs and pass the current URL to OpenURI's open to fetch the page:
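require 'nokogiri'
require 'open-uri'
%w[
  http://www.asus.com/Notebooks_Ultrabooks/S56CA/#specifications
  http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications
].each do |url|
  # URI.open is open-uri's entry point (Ruby 2.5+); a bare open(url) no longer fetches URLs on Ruby 3.0+
  doc = Nokogiri::HTML(URI.open(url))
  # do something with the document...
end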
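OpenURI reads the page and passes its contents to Nokogiri for parsing. Nokogiri still only ever sees one page at a time, because everything it gets is handed to it by OpenURI.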