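I'm using Creek to parse an Excel file. This is the first row (the header):
{"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}
and all the other rows are:
[
{"A"=>2019-05-16 00:00:00 +0200, "B"=>"TEXT", "C"=>"INR"},
{"A"=>2019-05-20 00:00:00 +0200, "B"=>"TEXT2", "C"=>"EUR"}
]
My goal is to get the same array, but with every hash key replaced by a key of mapping, chosen by matching the corresponding header value against these regexes:
mapping = {
date: /Date|Data|datum|Fecha/,
portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
currency: /Currency|Valuta|Währung|Divisa|Devise/
}
For example, each header value above matches one of these regexes, so I need to replace the keys of all data rows accordingly.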
Answer 0 (score: 4)
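Detect the column names in a separate step. The intermediate mapping looks like {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}, and the data array can then be transformed with it.
This is quite simple: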
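# header: the first (header) row; rows: the remaining data rows
header_mapping = header.transform_values{|v|
mapping.find{|key,regex| v.match?(regex) }&.first || raise("Unknown header field #{v}")
}
# rename the keys of every data row ("A" -> "date", ...)
rows.map{|row|
row.transform_keys{|k| header_mapping[k].to_s }
}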
The code needs Ruby 2.5+ for the native Hash#transform_keys (Hash#transform_values is 2.4+), or ActiveSupport's implementations of Hash#transform_*.
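For reference, a self-contained sketch of the two steps above, run on the sample data from the question. The variable names header and rows are assumptions about how the Creek output was split into the header row and the data rows, and the Time values only mirror the dates shown in the question.

```ruby
require 'time'

# Regexes from the question: target key => header spellings it should match.
mapping = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

# Assumed shape of the parsed sheet: the header row plus the data rows.
header = { "A" => "Date", "B" => "Portfolio", "C" => "Currency" }
rows = [
  { "A" => Time.parse("2019-05-16 00:00:00 +0200"), "B" => "TEXT", "C" => "INR" },
  { "A" => Time.parse("2019-05-20 00:00:00 +0200"), "B" => "TEXT2", "C" => "EUR" }
]

# Step 1: column letter -> target key, i.e. {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}.
header_mapping = header.transform_values do |v|
  mapping.find { |_key, regex| v.match?(regex) }&.first ||
    raise("Unknown header field #{v}")
end

# Step 2: rename the keys of every data row (as strings, like the answer's .to_s).
mapped_rows = rows.map { |row| row.transform_keys { |k| header_mapping[k].to_s } }

p header_mapping
p mapped_rows
```

Dropping the .to_s gives symbol keys instead, which matches the output of the other answer.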
Answer 1 (score: 1)
require 'time'
mappings = {
date: /Date|Data|datum|Fecha/,
portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
currency: /Currency|Valuta|Währung|Divisa|Devise/
}
rows = [
{"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"},
{"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"},
{"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]
header_row = rows.first
mapped_header_row = header_row.inject({}) do |hash, (k, v)|
mapped_name = mappings.find do |name, regex|
v.match? regex
end&.first
# defaults to `v.to_sym` (Header Name), if not in mappings
# you can also raise an Exception here instead if not in mappings, depending on your expectations
hash[k] = mapped_name || v.to_sym
hash
end
mapped_rows = rows[1..-1].map do |row|
new_row = {}
row.each do |k, v|
new_row[mapped_header_row[k]] = v
end
new_row
end
p mapped_rows
# => [
# {:date=>2019-05-16 00:00:00 +0200, :portfolio_name=>"TEXT", :currency=>"INR"},
# {:date=>2019-05-20 00:00:00 +0200, :portfolio_name=>"TEXT2", :currency=>"EUR"}
# ]
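In the code above, the inject block has to end with hash because inject uses the block's return value as the accumulator for the next pair. A minimal sketch of the same header-mapping step with each_with_object, which always returns the hash it was given (same mappings and fallback as the answer; only the iteration method differs):

```ruby
mappings = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}
header_row = { "A" => "Date", "B" => "Portfolio", "C" => "Currency" }

# each_with_object passes the same hash to every iteration and returns it at the end,
# so no trailing `hash` line is needed.
mapped_header_row = header_row.each_with_object({}) do |(k, v), hash|
  name = mappings.find { |_name, regex| v.match?(regex) }&.first
  # Same fallback as the answer: keep the header text as a symbol if nothing matches.
  hash[k] = name || v.to_sym
end

p mapped_header_row
```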
Step by step, starting from the same input:
require 'time'
mappings = {
date: /Date|Data|datum|Fecha/,
portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
currency: /Currency|Valuta|Währung|Divisa|Devise/
}
rows = [
{"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"},
{"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"},
{"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]
We first take the first row to get the column names.
header_row = rows.first
puts header_row
# => {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}
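As a small variation (not what this answer does), the header and the data rows can be separated in one step with splat destructuring; data_rows then holds the same elements that rows[1..-1] is used for below:

```ruby
require 'time'

rows = [
  { "A" => "Date", "B" => "Portfolio", "C" => "Currency" },
  { "A" => Time.parse("2019-05-16 00:00:00 +0200"), "B" => "TEXT", "C" => "INR" },
  { "A" => Time.parse("2019-05-20 00:00:00 +0200"), "B" => "TEXT2", "C" => "EUR" }
]

# The first element goes to header_row, all remaining elements to data_rows.
header_row, *data_rows = rows

p header_row
p data_rows.size
```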
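We need to iterate over each (key, value) pair and determine whether the value matches any of the regexes in our mappings variable.
In short, we need to transform, for example:
header_row = {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}
into
mapped_header_row = {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}
and likewise for any other column. This is done with inject and find: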
mapped_header_row = header_row.inject({}) do |hash, (k, v)|
mapped_name = mappings.find do |name, regex|
v.match? regex
end&.first
# defaults to `v.to_sym` (Header Name), if not in mappings
# you can also raise an Exception here instead if not in mappings, depending on your expectations
hash[k] = mapped_name || v.to_sym
hash
end
puts mapped_header_row
# => {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}
See Enumerable#inject.
See Enumerable#find.
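To make the find call above concrete: on a Hash, find yields [key, value] pairs and returns the first pair for which the block is truthy, or nil. That is why the code calls &.first. A short sketch (the header "Comment" is a hypothetical unmatched value):

```ruby
mappings = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

# find returns the whole matching [key, regex] pair.
pair = mappings.find { |_name, regex| "Portfolio".match?(regex) }
p pair

# .first picks the key out of that pair.
p pair.first # prints :portfolio_name

# With no match, find returns nil; &. skips .first instead of raising NoMethodError.
missing = mappings.find { |_name, regex| "Comment".match?(regex) }&.first
p missing # prints nil
```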
Now that we have mapped_header_row (the "mapped" label/name of each column), we simply rename the keys of every row from the second to the last: the keys "A", "B" and "C" are replaced with :date, :portfolio_name and :currency respectively.
# rows[1..-1] is every element of the array from the 2nd to the last
mapped_rows = rows[1..-1].map do |row|
new_row = {}
row.each do |k, v|
new_row[mapped_header_row[k]] = v
end
new_row
end
p mapped_rows
# => [
# {:date=>2019-05-16 00:00:00 +0200, :portfolio_name=>"TEXT", :currency=>"INR"},
# {:date=>2019-05-20 00:00:00 +0200, :portfolio_name=>"TEXT2", :currency=>"EUR"}
# ]
See Enumerable#map.
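On Ruby 2.5+ the renaming loop above can also be written with Hash#transform_keys, as in the other answer. The fetch(k, k) fallback is an assumption of this sketch, not part of the answer: with mapped_header_row[k], a column that has no header entry would end up under a nil key, whereas here it keeps its original letter. Column "D" is hypothetical, added only to show that case.

```ruby
require 'time'

mapped_header_row = { "A" => :date, "B" => :portfolio_name, "C" => :currency }
data_rows = [
  { "A" => Time.parse("2019-05-16 00:00:00 +0200"), "B" => "TEXT", "C" => "INR" },
  # Hypothetical extra column "D" with no header entry.
  { "A" => Time.parse("2019-05-20 00:00:00 +0200"), "B" => "TEXT2", "C" => "EUR", "D" => "x" }
]

mapped_rows = data_rows.map do |row|
  # Look up the new name for each column letter; keep the letter if it is unknown.
  row.transform_keys { |k| mapped_header_row.fetch(k, k) }
end

p mapped_rows
```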