Ruby: replacing hash keys using a regular expression

Date: 2019-06-28 12:25:30

Tags: ruby-on-rails ruby hashmap ruby-on-rails-5

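I'm parsing an Excel file with Creek. This is the first row (the header):

{"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}

and all the other rows are:

[ {"A"=>2019-05-16 00:00:00 +0200, "B"=>"TEXT", "C"=>"INR"}, {"A"=>2019-05-20 00:00:00 +0200, "B"=>"TEXT2", "C"=>"EUR"} ]

My goal is to get the same array, but with every hash key replaced by the key of mapping whose regular expression matches the corresponding header value.

For example, the header values match the following regexes:

mapping = {
    date: /Date|Data|datum|Fecha/,
    portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
    currency: /Currency|Valuta|Währung|Divisa|Devise/
    }

So in every data row the keys "A", "B" and "C" need to be replaced by the matching mapping keys (date, portfolio_name and currency).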

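2 answers:

Answer 0 (score: 4)

Detect the column names in a separate step. The intermediate mapping will look like {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}, and with it you can transform the data array.

It's pretty straightforward: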

header_mapping = header.transform_values{|v|
  mapping.find{|key,regex| v.match?(regex) }&.first || raise("Unknown header field #{v}")
}

rows.map{|row|
  row.transform_keys{|k| header_mapping[k].to_s }
}

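The code requires Ruby 2.5+ for the native Hash#transform_keys (Hash#transform_values is available from Ruby 2.4), or ActiveSupport, which provides both methods on older versions.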
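To try the snippet above end to end, here is a minimal self-contained sketch. The `header` and `rows` values are copied from the question; using `Time.parse` for the date cells is an assumption about what Creek returns, and `result` is a name introduced only for this example.

```ruby
require 'time'

mapping = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

# Sample data from the question; Time.parse is assumed for the date cells.
header = {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}
rows = [
  {"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"},
  {"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]

# Step 1: map each column letter to the mapping key whose regex matches its title.
header_mapping = header.transform_values { |v|
  mapping.find { |_key, regex| v.match?(regex) }&.first || raise("Unknown header field #{v}")
}

# Step 2: rename the keys of every data row (to_s gives string keys).
result = rows.map { |row| row.transform_keys { |k| header_mapping[k].to_s } }

p header_mapping
p result.map(&:keys)
```

Dropping the `.to_s` would keep the new keys as symbols instead of strings, if that suits the calling code better.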

Answer 1 (score: 1)

TL;DR:

require 'time'

mappings = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

rows = [
  {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"},
  {"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"}, 
  {"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]

header_row = rows.first

mapped_header_row = header_row.inject({}) do |hash, (k, v)|
  mapped_name = mappings.find do |mapped_name, regex|
    v.match? regex
  end&.first

  # defaults to `v.to_sym` (Header Name), if not in mappings
  # you can also raise an Exception here instead if not in mappings, depending on your expectations
  hash[k] = mapped_name || v.to_sym 
  hash
end

mapped_rows = rows[1..-1].map do |row|
  new_row = {}
  row.each do |k, v|
    new_row[mapped_header_row[k]] = v
  end
  new_row
end

p mapped_rows
# => [
#      {:date=>2019-05-16 00:00:00 +0200, :portfolio_name=>"TEXT", :currency=>"INR"},
#      {:date=>2019-05-20 00:00:00 +0200, :portfolio_name=>"TEXT2", :currency=>"EUR"}
#    ]

Given:

require 'time'

mappings = {
  date: /Date|Data|datum|Fecha/,
  portfolio_name: /Portfolio|portafoglio|Portfolioname|cartera|portefeuille/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

rows = [
  {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"},
  {"A"=>Time.parse('2019-05-16 00:00:00 +0200'), "B"=>"TEXT", "C"=>"INR"}, 
  {"A"=>Time.parse('2019-05-20 00:00:00 +0200'), "B"=>"TEXT2", "C"=>"EUR"}
]

Steps:

  1. We first extract the first row to get the column names.

    header_row = rows.first
    puts header_row
    # => {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}
    
  2. We need to loop through each hash pair (key, value) and determine whether the "value" matches any of the regexes in our mappings variable.

    In short, we need to convert:

    header_row = {"A"=>"Date", "B"=>"Portfolio", "C"=>"Currency"}

    into

    mapped_header_row = {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}

    So:

    mapped_header_row = header_row.inject({}) do |hash, (k, v)|
      mapped_name = mappings.find do |mapped_name, regex|
        v.match? regex
      end&.first
    
      # defaults to `v.to_sym` (Header Name), if not in mappings
      # you can also raise an Exception here instead if not in mappings, depending on your expectations
      hash[k] = mapped_name || v.to_sym 
      hash
    end
    
    puts mapped_header_row
    # => {"A"=>:date, "B"=>:portfolio_name, "C"=>:currency}
    

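    See inject

    See find

  3. Now that we have mapped_header_row (the "mapped" label/name for each column), we simply replace the keys of every row, from the second row to the last, with those mapped names: the keys "A", "B" and "C" are replaced by :date, :portfolio_name and :currency respectively.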

    # rows[1..-1] means the 2nd element of the array through the last element
    mapped_rows = rows[1..-1].map do |row|
      new_row = {}
      row.each do |k, v|
        new_row[mapped_header_row[k]] = v
      end
      new_row
    end
    
    p mapped_rows
    # => [
    #      {:date=>2019-05-16 00:00:00 +0200, :portfolio_name=>"TEXT", :currency=>"INR"},
    #      {:date=>2019-05-20 00:00:00 +0200, :portfolio_name=>"TEXT2", :currency=>"EUR"}
    #    ]
    

    See map
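The three steps above can also be condensed into a single helper. This is only a sketch: `remap_rows` is a hypothetical name that does not appear in the answer, and the sample data is made up, with a "Notes" column that matches none of the regexes so the fallback to the header text is visible.

```ruby
# Hypothetical helper (not from the answer): condenses steps 1-3 into one method.
def remap_rows(rows, mappings)
  header_row, *data_rows = rows

  # Column key ("A", "B", ...) => mapped name; falls back to the header
  # text as a symbol when no regex matches, as in the answer's code.
  mapped_header = header_row.each_with_object({}) do |(col, title), acc|
    name, _regex = mappings.find { |_name, regex| title.match?(regex) }
    acc[col] = name || title.to_sym
  end

  # Rebuild every data row with the mapped names as keys.
  data_rows.map do |row|
    row.each_with_object({}) { |(col, value), new_row| new_row[mapped_header[col]] = value }
  end
end

mappings = {
  date: /Date|Data|datum|Fecha/,
  currency: /Currency|Valuta|Währung|Divisa|Devise/
}

# Illustrative data: "Notes" has no entry in mappings.
rows = [
  {"A"=>"Date", "B"=>"Notes", "C"=>"Währung"},
  {"A"=>"2019-05-16", "B"=>"TEXT", "C"=>"INR"}
]

mapped = remap_rows(rows, mappings)
p mapped
```

Using `each_with_object` avoids having to return the accumulator at the end of the block, which is what the trailing `hash` line does in the `inject` version.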