How to add the contents of an array to a spreadsheet using Ruby

Time: 2012-07-30 08:44:32

Tags: ruby csv axlsx

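I am scraping a web page at http://h10010.www1.hp.com/wwpc/ie/en/ho/WF06b/321957-321957-3329742-89318-89318-5186820-5231694.html?dnr=1

First, I create an array of the keywords I need (clues), then run an XPath query for each one and write the results to a CSV file. That all works, but the spreadsheet needs better formatting so that end users can copy and paste from it.

Is there a way to get the layout I want using CSV or Axlsx?

My code is below: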

require 'rubygems'
require 'nokogiri'   
require 'open-uri'
require 'csv'
require 'axlsx'

#Set encoding options to remove nasty Trademark symbols
  encoding_options = {
    :invalid           => :replace,  # Replace invalid byte sequences
    :undef             => :replace,  # Replace anything not defined in ASCII
    :replace           => '',        # Use a blank for those replacements
    :universal_newline => true       # Always break lines with \n
  }

doc = Nokogiri::HTML(URI.open("http://h10010.www1.hp.com/wwpc/ie/en/ho/WF06b/321957-321957-3329742-89318-89318-5186820-5231694.html?dnr=1"))
#For each break create a ;
doc.css('br').each{ |br| br.replace ';' }

clues = Array.new
clues << 'Operating system'
clues << 'Processors'
clues << 'Chipset'
clues << 'Memory type'
clues << 'Hard drive'
clues << 'Graphics'
clues << 'Ports'
clues << 'Webcam'
clues << 'Pointing device'
clues << 'Keyboard'
clues << 'Network interface'
clues << 'Chipset'
clues << 'Wireless'
clues << 'Power supply type'
clues << 'Energy efficiency'
clues << 'Weight'
clues << 'Minimum dimensions (W x D x H)'
clues << 'Warranty'
clues << 'Software included'
clues << 'Product color'

CSV.open("output.csv", "wb") do |csv|
  #1. Output the Clues header
  #2. Scrape the output/force encoding to remove special characters
    csv << clues
    csv << clues.map{|clue| doc.at("//td[text()='#{clue}']/following-sibling::td").text.strip.encode Encoding.find('ASCII'), encoding_options}
  #end loop
end

My code writes the whole array into a single row, but how do I put each item of the array on its own row? I tried \n but it did not work.

The output I get: [image]

My desired output: [image]

3 answers:

Answer 0: (score: 1)

This is randym, the author of axlsx. I think you want to do this:

clues = Array.new
clues << 'Operating system'
clues << 'Processors'
clues << 'Chipset'
clues << 'Memory type'

Axlsx::Package.new do |p|
  p.workbook do |wb|
    wb.add_worksheet do |sheet|
      clues.each { |clue| sheet.add_row [clue] }
    end
  end
  p.serialize 'My_Spreadsheet.xlsx'
end

As for your second question:

selector = "//td[text()='%s']/following-sibling::td"
data = clues.map do |clue| 
         xpath = selector % clue
         [clue, doc.at(xpath).text.strip]
       end

Then use

data.each { |datum| sheet.add_row datum }

when building the worksheet. Put together:

require 'rubygems'
require 'nokogiri'   
require 'open-uri'
require 'axlsx'

doc = Nokogiri::HTML(URI.open("http://h10010.www1.hp.com/wwpc/ie/en/ho/WF06b/321957-321957-3329742-89318-89318-5186820-5231694.html?dnr=1"))
#For each break create a ;
doc.css('br').each{ |br| br.replace ';' }

clues = Array.new
clues << 'Operating system'
clues << 'Processors'
clues << 'Chipset'
clues << 'Memory type'
clues << 'Hard drive'
clues << 'Graphics'
clues << 'Ports'
clues << 'Webcam'
clues << 'Pointing device'
clues << 'Keyboard'
clues << 'Network interface'
clues << 'Chipset'
clues << 'Wireless'
clues << 'Power supply type'
clues << 'Energy efficiency'
clues << 'Weight'
clues << 'Minimum dimensions (W x D x H)'
clues << 'Warranty'
clues << 'Software included'
clues << 'Product color'

selector = "//td[text()='%s']/following-sibling::td"
data = clues.map do |clue| 
         xpath = selector % clue
         [clue, doc.at(xpath).text.strip]
       end

Axlsx::Package.new do |p|
  p.workbook.add_worksheet do |sheet|
    data.each { |datum| sheet.add_row datum }
  end
  p.serialize 'output.xlsx'
end

A screenshot for your viewing pleasure: [image]
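One thing to watch with the lookup above: if a clue is not present on the page, doc.at returns nil and calling .text on it raises NoMethodError. Below is a minimal sketch of building the [clue, value] pairs defensively. To keep it runnable without a network connection, a plain hash (SAMPLE_SPECS, a hypothetical name with made-up values) stands in for the parsed page, and build_rows is an illustrative helper, not part of axlsx or Nokogiri.

```ruby
# Hypothetical stand-in for the scraped page: label => raw cell text.
# In the real script the value would come from doc.at(xpath).
SAMPLE_SPECS = {
  'Operating system' => ' Windows 7 Home Premium 64 ',
  'Processors'       => 'Intel Core i5'
}.freeze

# Returns one [clue, value] pair per clue.
# A clue that is missing from the page yields an empty string
# instead of raising NoMethodError on nil.
def build_rows(clues, specs)
  clues.map do |clue|
    value = specs[clue]          # nil when the clue is not found
    [clue, value.to_s.strip]     # nil.to_s is '', so this is safe
  end
end

rows = build_rows(['Operating system', 'Webcam'], SAMPLE_SPECS)
rows.each { |row| puts row.inspect }
```

With Nokogiri the same idea would be node = doc.at(xpath) followed by node ? node.text.strip : '' so that each pair can still be passed to sheet.add_row.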

Answer 1: (score: 0)

Use each to iterate over the array:

clues.each do |clue|
  sheet.add_row [clue]
end
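If the labels and the scraped values already exist as two separate arrays (as in the question, where one row holds the headers and the next holds the values), they can be paired up so that each clue gets its own row. This is a small sketch using only core Ruby; HEADERS, VALUES and rows_per_clue are hypothetical names and the values are made-up sample data.

```ruby
# Hypothetical sample data: one array of labels, one of scraped values
HEADERS = ['Operating system', 'Processors', 'Chipset'].freeze
VALUES  = ['Windows 7', 'Intel Core i5', 'Intel HM65'].freeze

# Pairs each header with the value at the same index,
# turning two wide rows into one narrow row per clue.
def rows_per_clue(headers, values)
  headers.zip(values)
end

rows_per_clue(HEADERS, VALUES).each { |row| puts row.join(': ') }
```

Each resulting pair can then be handed to sheet.add_row (or csv <<) inside the loop. [HEADERS, VALUES].transpose gives the same result when both arrays have the same length.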

Answer 2: (score: 0)

Try something like:

CSV.open("output.csv", "wb") do |csv|
  clues.each do |clue|
    value = doc.at("//td[text()='#{clue}']/following-sibling::td").text.strip.encode Encoding.find('ASCII'), encoding_options
    csv << [clue, value]
  end
end
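To try the row-per-clue output and the ASCII cleanup without hitting the website, here is a minimal sketch that writes to a string with CSV.generate instead of a file. ENCODING_OPTIONS mirrors the replacement options from the question; to_ascii and specs_to_csv are hypothetical helper names, and the input pair is made-up sample data.

```ruby
require 'csv'

# Same idea as encoding_options in the question: drop anything
# that cannot be represented in ASCII (for example the trademark sign).
ENCODING_OPTIONS = { invalid: :replace, undef: :replace, replace: '' }.freeze

# Converts text to ASCII, silently removing unsupported characters
def to_ascii(text)
  text.encode('ASCII', **ENCODING_OPTIONS)
end

# Builds CSV text with one "clue,value" row per pair
def specs_to_csv(pairs)
  CSV.generate do |csv|
    pairs.each { |clue, value| csv << [clue, to_ascii(value.strip)] }
  end
end

puts specs_to_csv([['Processors', "Intel\u00AE Core\u2122 i5 "]])
```

Swapping CSV.generate for CSV.open("output.csv", "wb") writes the same rows to disk.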