Ruby: Reading the contents of an XLS file and getting each cell's information

Time: 2013-01-28 18:57:41

Tags: ruby spreadsheet xls

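This is the link to the XLS file. I am trying to use the Spreadsheet gem to extract the contents of the XLS file. In particular, I want to collect all the column headers (Year, Gross National Product, etc.). The problem is that they are not all in the same row. For example, the Gross National Income header spans three rows. I would also like to know how many rows were merged to form the 'Year' cell.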

I have started writing the program, and this is as far as I have gotten:

require 'rubygems'
require 'open-uri'
require 'spreadsheet'

rows = Array.new
url = 'http://www.stats.gov.cn/tjsj/ndsj/2012/html/C0201e.xls'
doc = Spreadsheet.open(open(url))
sheet1 = doc.worksheet 0
sheet1.each do |row|
  if row.is_a? Spreadsheet::Formula
    # puts row.value
    rows << row.value
  else
    # puts row
    rows << row
  end
  # puts row.value
end

However, I am now stuck and really need some guidance to continue. Any kind of help is welcome.

1 answer:

Answer 0 (score: 3)

require 'rubygems'
require 'open-uri'
require 'spreadsheet'

rows = Array.new
temp_rows = Array.new
column_headers = Array.new
index = 0
url = 'http://www.stats.gov.cn/tjsj/ndsj/2012/html/C0201e.xls'
doc = Spreadsheet.open(open(url))
sheet1 = doc.worksheet 0
sheet1.each do |row|
   rows << row.to_a
end

rows.each_with_index do |row,ind|
  if row[0]=="Year"
    index = ind
    break
  end
end

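# Collect the header rows: start at the "Year" row and stop at the first
# row whose first cell contains a digit (taken to be the first data row).
(index...rows.size).each do |i|
  # puts rows[i].inspect
  # to_s lets the digit check work when the cell holds a number, not a String
  break if rows[i][0].to_s =~ /[0-9]/
  temp_rows << rows[i]
end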

col_size = temp_rows[0].size
# puts temp_rows.inspect

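# Join the pieces of each column's header, top to bottom, into one string.
col_size.times do |c|
  temp_str = ""
  temp_rows.each do |row|
    # to_s allows non-String cells (e.g. numbers) to be concatenated
    temp_str += ' ' + row[c].to_s unless row[c].nil?
  end
  # puts temp_str.inspect
  # temp_str is never nil; empty headers are skipped when printing below
  column_headers << temp_str
end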
puts 'Column Headers of this xls file are : '
# puts column_headers.inspect
column_headers.each do |col|
  puts col.strip.inspect if col.length >1
end
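
The question also asks how many rows were merged to form the 'Year' cell, which the script above does not cover. Here is a minimal sketch in plain Ruby, assuming the merged ranges are available as arrays of the form [first_row, last_row, first_col, last_col] (zero-based, inclusive). The helper names and the sample data are hypothetical and are not part of the Spreadsheet gem:

```ruby
# Find the merged range (if any) that covers the cell at (row, col).
# Each range is assumed to be [first_row, last_row, first_col, last_col],
# zero-based and inclusive on both ends.
def merged_range_for(merged_ranges, row, col)
  merged_ranges.find do |first_row, last_row, first_col, last_col|
    row.between?(first_row, last_row) && col.between?(first_col, last_col)
  end
end

# Number of rows spanned by the cell at (row, col); 1 if it is not merged.
def merged_row_count(merged_ranges, row, col)
  range = merged_range_for(merged_ranges, row, col)
  return 1 if range.nil?

  first_row, last_row = range
  last_row - first_row + 1
end

# Hypothetical sample: a cell in column 0 merged over rows 2..4,
# and a header merged across columns 1..3 on row 2.
sample_ranges = [[2, 4, 0, 0], [2, 2, 1, 3]]
puts merged_row_count(sample_ranges, 2, 0) # prints 3
```

With the Spreadsheet gem, `sheet1.merged_cells` is believed to return ranges in this layout after a file has been read, but treat that as an assumption and check it against the installed version. The row index found for "Year" in the script above (`index`) and column 0 would then be the arguments to pass.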