Ruby: converting an array to a hash while keeping duplicate keys

Date: 2014-05-14 16:14:23

Tags: ruby arrays hash type-conversion

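I need to pull the output of git ls-remote into an array and then convert that array into a hash of the form {commit_hash => ref}. Sometimes two commit hashes are identical (but may have different refs). So I end up with something like this: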

["19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/auto",
 "8f6f47c6e8023540b022586e368c68e1e814ce6d","refs/heads/callout_hooks",  
 "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8","refs/heads/elab",
 "d38a9a26ef887c08b306bdab210b39882f58e587","refs/heads/elab_6.1",
 "19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/master",
 "906dfe6eebff832baf0f92683d751432fcc98ab7","refs/heads/regression"]

which I want to convert to:

{"19d97e408ee3f993745b053e281ac9dc69519e06" => "refs/heads/auto"...}

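But master and auto have the same commit hash, so one of them gets dropped in the conversion.

How can I 1.) get a list of the values that were dropped in the conversion, or 2.) make the keys unique by appending a special character such as * to the key?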
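To make the problem concrete, here is a minimal sketch of the naive conversion. The shortened commit hashes (`aaa111`, `bbb222`) and the variable names are made up for illustration. It shows that when a key repeats, a plain pair-wise conversion keeps a single entry, and the later ref silently overwrites the earlier one.

```ruby
# Shortened, hypothetical commit hashes stand in for real git ls-remote output.
flat = ["aaa111", "refs/heads/auto",
        "bbb222", "refs/heads/elab",
        "aaa111", "refs/heads/master"]

# Pair up consecutive elements and build a hash from the pairs.
# A later pair with the same key overwrites the value of the earlier one.
naive = Hash[flat.each_slice(2).to_a]

puts naive.size       # 2 entries, although the input had 3 pairs
puts naive["aaa111"]  # refs/heads/master ("refs/heads/auto" is gone)
```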

3 answers:

Answer 0: (score: 8)

I would suggest doing it like this:

ary = [
       "19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/auto",
       "8f6f47c6e8023540b022586e368c68e1e814ce6d","refs/heads/callout_hooks",  
       "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8","refs/heads/elab",
       "d38a9a26ef887c08b306bdab210b39882f58e587","refs/heads/elab_6.1",
       "19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/master",
       "906dfe6eebff832baf0f92683d751432fcc98ab7","refs/heads/regression"
     ]

array_hash = ary.each_slice(2).with_object(Hash.new { |h,k| h[k] = []}) do |(k,v),hash|
  hash[k] << v 
end

# The main advantage is that you don't lose any data here: every ref is kept, and you
# can use the result however you need. I think this is a better way to handle your situation.
array_hash
# => {"19d97e408ee3f993745b053e281ac9dc69519e06"=>
#      ["refs/heads/auto", "refs/heads/master"],
#     "8f6f47c6e8023540b022586e368c68e1e814ce6d"=>["refs/heads/callout_hooks"],
#     "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>["refs/heads/elab"],
#     "d38a9a26ef887c08b306bdab210b39882f58e587"=>["refs/heads/elab_6.1"],
#     "906dfe6eebff832baf0f92683d751432fcc98ab7"=>["refs/heads/regression"]}
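Once the refs are grouped per commit, the list the question asks for in option 1 can be derived from the grouped hash. The following is only a sketch: the shortened hashes and the names `grouped`, `shared` and `dropped` are assumptions for illustration, and it treats the first ref of each commit as the one that is "kept".

```ruby
# Hypothetical grouped hash in the same shape as array_hash above.
grouped = {
  "aaa111" => ["refs/heads/auto", "refs/heads/master"],
  "bbb222" => ["refs/heads/elab"]
}

# Commits that more than one ref points to.
shared = grouped.select { |_sha, refs| refs.size > 1 }

# Every [sha, ref] pair beyond the first ref of each commit, i.e. the pairs
# that a one-value-per-key hash would have no room for.
dropped = grouped.flat_map { |sha, refs| refs.drop(1).map { |ref| [sha, ref] } }

puts shared.keys.inspect  # ["aaa111"]
puts dropped.inspect      # [["aaa111", "refs/heads/master"]]
```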

Answer 1: (score: 4)

If you build a hash of hash_value => array of refs, you keep everything:

array = ["19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/auto",
 "8f6f47c6e8023540b022586e368c68e1e814ce6d","refs/heads/callout_hooks",  
 "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8","refs/heads/elab",
 "d38a9a26ef887c08b306bdab210b39882f58e587","refs/heads/elab_6.1",
 "19d97e408ee3f993745b053e281ac9dc69519e06","refs/heads/master",
 "906dfe6eebff832baf0f92683d751432fcc98ab7","refs/heads/regression"
]

array.each_slice(2).reduce({}) { |h, (k, v)| (h[k] ||= []) << v; h }

Looks like Arup and I had the same idea...
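The same `||=` idiom can also be applied directly to the text that git ls-remote prints, which is one "<sha><TAB><ref>" pair per line, skipping the flat array altogether. This is a sketch under assumptions: the sample `output` string is made up, and in practice it would come from running the command.

```ruby
# Hypothetical sample of `git ls-remote` output: one "<sha>\t<ref>" pair per line.
output = "aaa111\trefs/heads/auto\n" \
         "bbb222\trefs/heads/elab\n" \
         "aaa111\trefs/heads/master\n"

refs_by_sha = output.each_line.with_object({}) do |line, h|
  sha, ref = line.split       # splits on the tab and drops the trailing newline
  (h[sha] ||= []) << ref      # create the array on first sight, then append
end

puts refs_by_sha["aaa111"].inspect  # ["refs/heads/auto", "refs/heads/master"]
```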

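Answer 2: (score: 2)

You gave two options for what you want to do:

  • get a list of the values dropped in the conversion
  • make the keys unique by appending a special character to the key

I think the second approach is a bad idea, for these reasons: a) you would need a way of modifying keys that copes with multiple duplicates of the same key; and b) making the connection between the original and its copies would be awkward. Besides, it's just plain ugly.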
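For illustration only, here is roughly what the second option could look like. This is not code from the thread: the shortened hashes, the names `starred` and `original_sha`, and the choice to keep appending "*" until the key is unused are all assumptions. It shows both objections: repeated duplicates need repeated suffixes, and getting back to the real commit hash means stripping them again.

```ruby
# Shortened, hypothetical commit hashes; "aaa111" appears three times.
flat = ["aaa111", "refs/heads/auto",
        "bbb222", "refs/heads/elab",
        "aaa111", "refs/heads/master",
        "aaa111", "refs/heads/stable"]

starred = flat.each_slice(2).with_object({}) do |(sha, ref), h|
  key = sha
  key += "*" while h.key?(key)  # keep appending "*" until the key is unused
  h[key] = ref
end

puts starred.keys.inspect  # ["aaa111", "bbb222", "aaa111*", "aaa111**"]

# Recovering the real commit hash requires stripping the suffix again.
original_sha = "aaa111**".sub(/\*+\z/, "")
puts original_sha          # aaa111
```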

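I see that others have suggested a third possibility: changing the form of the resulting hash so that its values are arrays of strings. That may work for you, but it isn't what you asked for, so I chose to build a list of the values that were dropped; that is, all but the first for each key.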

Code

def create_hash_and_save_extras(arr)
  arr.each_slice(2).with_object([{},[]]) { |(k,v),(h,ex)|
    h.update({k=>v}) { |k, ov, nv| ex << {k=>nv}; ov } }
end

Example

create_hash_and_save_extras(arr)
  #=> [{"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto",
  #     "8f6f47c6e8023540b022586e368c68e1e814ce6d"=>"refs/heads/callout_hooks",
  #     "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>"refs/heads/elab",
  #     "d38a9a26ef887c08b306bdab210b39882f58e587"=>"refs/heads/elab_6.1",
  #     "906dfe6eebff832baf0f92683d751432fcc98ab7"=>"refs/heads/regression"},
  #   [{"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/master"}]]
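Because the method returns a two-element array, the result can be destructured into two variables at the call site. A minimal, self-contained sketch follows. The method is repeated so the block runs on its own (with the inner block parameter renamed to `key` to avoid shadowing), and the shortened hashes and the names `kept` and `extras` are assumptions for illustration.

```ruby
def create_hash_and_save_extras(arr)
  arr.each_slice(2).with_object([{}, []]) do |(k, v), (h, ex)|
    # On a duplicate key, record the new value in ex and keep the old one in h.
    h.update(k => v) { |key, ov, nv| ex << { key => nv }; ov }
  end
end

# Shortened, hypothetical commit hashes.
arr = ["aaa111", "refs/heads/auto",
       "bbb222", "refs/heads/elab",
       "aaa111", "refs/heads/master"]

kept, extras = create_hash_and_save_extras(arr)

puts kept["aaa111"]                    # refs/heads/auto (the first ref wins)
puts extras.first["aaa111"]            # refs/heads/master (the dropped ref)
```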

Explanation

Sending Enumerable#each_slice to arr returns an enumerator:

enum1 = arr.each_slice(2)
  #=> #<Enumerator: [
  #      "19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto",
  #      "8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks",
  #      ...
  #      "906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"
  #   ]:each_slice(2)>
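A small sketch of what each_slice(2) yields may help here. The short sample arrays and variable names are made up for illustration. Note that an odd number of elements produces a final slice with only one element, so the input needs to be well formed for the key/value pairing to hold.

```ruby
# Consecutive elements are grouped two at a time.
pairs = %w[a 1 b 2 c 3].each_slice(2).to_a
puts pairs.inspect  # [["a", "1"], ["b", "2"], ["c", "3"]]

# With an odd number of elements, the last slice is short.
odd = %w[a 1 b].each_slice(2).to_a
puts odd.inspect    # [["a", "1"], ["b"]]
```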

Enumerator#with_object creates an array consisting of an initially empty hash (represented by the block variable h) and an initially empty array (for the "extras", represented by the block variable ex), and is then sent to enum1 to create another enumerator (you can think of it as a "compound enumerator"; note the reference to each_slice(2)>:with_object below).

enum2 = enum1.with_object([{},[]])
  #=> #<Enumerator: #<Enumerator: [
  #      "19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto",
  #      "8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks",
  #      ...
  #      "906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"
  #   ]:each_slice(2)>:with_object([{}, []])>

We can convert enum2 to an array to see what it will pass into its block:

enum2.to_a
  #=> [[["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto"],
  #     [{}, []]],
  #    [["8f6f47c6e8023540b022586e368c68e1e814ce6d", "refs/heads/callout_hooks"],
  #     [{}, []]],
  #    [["3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8", "refs/heads/elab"],
  #     [{}, []]],
  #    [["d38a9a26ef887c08b306bdab210b39882f58e587", "refs/heads/elab_6.1"],
  #     [{}, []]],
  #    [["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/master"],
  #     [{}, []]],
  #    [["906dfe6eebff832baf0f92683d751432fcc98ab7", "refs/heads/regression"],
  #     [{}, []]]]

The first element enum2 passes into its block is

[["19d97e408ee3f993745b053e281ac9dc69519e06", "refs/heads/auto"], [{}, []]]

so the block variables are assigned as follows:

k  => "19d97e408ee3f993745b053e281ac9dc69519e06"
v  => "refs/heads/auto"
h  => {}
ex => []

We now use Hash#update (aka Hash#merge!) to merge {k=>v} into h (h is initially empty). Thus

h.update({k=>v}) { |k, ov, nv| ex << {k=>nv}; ov }

becomes

h.update({"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto"})

followed by the block

{ |k, ov, nv| ex << {k=>nv}; ov }

But the block only comes into play when the hash being merged into (h) and the hash being merged (update's argument) share the same key k, in which case ov and nv are, respectively, the values associated with k in h and in the hash being merged. The merged value for key k will be the value returned by the block. That does not happen here, but it will when we encounter the duplicate.

So now

h #=> {"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto"}

We continue in this way for each of the other elements of enum2. When we encounter

k = "19d97e408ee3f993745b053e281ac9dc69519e06"
v = "refs/heads/master"
h = {"19d97e408ee3f993745b053e281ac9dc69519e06"=>"refs/heads/auto",
     "8f6f47c6e8023540b022586e368c68e1e814ce6d"=>"refs/heads/callout_hooks",
     "3cbdb4b2fcb85bc7f0ed08b62e2bf2445a7659e8"=>"refs/heads/elab",
     "d38a9a26ef887c08b306bdab210b39882f58e587"=>"refs/heads/elab_6.1"}

we find that k is already a key of the hash being merged into, h, so the block is evaluated to determine the value of h[k]. We want to keep the current value of h[k], namely ov, so that is what the block returns. First, however, we append the duplicate value, expressed as a hash, to the (still empty) array ex.
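The conflict-block behaviour of Hash#update can be checked in isolation with a tiny sketch. The keys, values and the `seen` array are made up for illustration. The block is skipped when the key is new and runs only when the key already exists, in which case its return value becomes the stored value.

```ruby
h = { "x" => 1 }
seen = []  # records the keys for which the conflict block actually ran

# "y" is a new key, so the block is not called.
h.update("y" => 2) { |key, ov, nv| seen << key; ov }

# "x" already exists, so the block runs; returning ov keeps the old value.
h.update("x" => 99) { |key, ov, nv| seen << key; ov }

puts h["x"]        # 1
puts h["y"]        # 2
puts seen.inspect  # ["x"]
```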