Using Ruby to break a nested hash of varying depth into separate hashes

Time: 2014-03-26 12:57:28

Tags: ruby mongodb recursion hash

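I'm new to Ruby, but I have done a lot of searching, research on Stack, and experimentation.

I'm receiving POST data containing variable information, which I am able to convert from XML into a hash.

My goals are:

  1. Get and store the parent key hierarchy.
  2. I'm creating MongoDb records from these POSTs, and I need to record which keys are stored, including any new keys I receive that are not already part of the collection's keys.

    Once I have the key hierarchy stored, I need to take the nested hash and break each top-level key and its children out into a separate hash. These will eventually end up as individual sub-documents in a MongoDb record.

    A big hurdle is that I don't know the hierarchy or any of the key names up front, so I have to create a parser that doesn't really care what is in the hash; it just organizes the key structure and breaks the hash into separate hashes representing each 'top-level' key contained in the hash.

    I have a nested hash: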

    {"hashdata"=>
      {"ComputersCount"=>
        {"Total"=>1, "Licensed"=>1, "ByOS"=>{"OS"=>{"Windows 7 x64"=>1}}},
       "ScansCount"=>
        {"Total"=>8,
         "Scheduled"=>8,
         "Agent"=>0,
         "ByScanningProfile"=>{"Profile"=>{"Missing Patches"=>8}}},
       "RemediationsCount"=>{"Total"=>1, "ByType"=>{"Type"=>{"9"=>1}}},
       "AgentsCount"=>{"Total"=>0},
       "RelaysCount"=>{"Total"=>0},
       "ScanResultsDatabase"=>{"Type"=>"MSAccess"}}}
    

    In this example, ignoring the 'hashdata' key, the 'top-level' parents would be:

    ComputersCount ScansCount RemediationsCount AgentsCount RelaysCount ScanResultsDatabase

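    Ideally, I would end up with a hash of each parent key and its child keys, plus a separate hash for each top-level parent.

    EDIT: I'm not sure what the best way to express the 'key hash' would be, but I know it needs to convey the hierarchy: which level each key sits at in the structure and which parent it has.

    As for the separate hashes themselves, it could be as simple as: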

    {"ComputersCount"=>{"Total"=>1, "Licensed"=>1, "ByOS"=>{"OS"=>{"Windows 7 x64"=>1}}}}
    
    {"ScansCount"=>{"Total"=>8,"Scheduled"=>8,"Agent"=>0,"ByScanningProfile"=>{"Profile"=>{"Missing Patches"=>8}}}}
    
    {"RemediationsCount"=>{"Total"=>1, "ByType"=>{"Type"=>{"9"=>1}}}}
    
    {"AgentsCount"=>{"Total"=>0}}
    
    {"RelaysCount"=>{"Total"=>0}}
    
    {"ScanResultsDatabase"=>{"Type"=>"MSAccess"}}
    

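    My end goal is to take the key collection and the hash collection and store them in MongoDb, with each sub-hash as a sub-document and the key collection giving me a column-name map for the collection so that it can be queried later.

    I have come close to a solution using some recursive methods, such as: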

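    def recurse_hash(h, parent = nil)
        h.each_pair do |k, v|
            case v
            when String, Integer
                # Leaf value: print it together with the key it was found under
                puts "Parent: #{parent.inspect}, Key: #{k}, Value: #{v}"
            when Hash
                # Nested hash: descend, passing the current key as the parent
                recurse_hash(v, k)
            else
                raise ArgumentError, "Unhandled type #{v.class}"
            end
        end
    end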
    

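    But so far I have only come close to what I'm after. Ultimately, I need to be prepared to take a hash with any level of nesting or value structure, because the POST data varies widely.

    Any suggestions, guidance, or other help would be greatly appreciated. I realize I may well be approaching this whole challenge incorrectly.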

2 Answers:

Answer 0: (score: 0)

It looks like you want an array of hashes like the following:

array = hash["hashdata"].map { |k,v| { k => v } }
# => [{"ComputersCount"=>{"Total"=>1, "Licensed"=>1, "ByOS"=>{"OS"=>{"Windows 7 x64"=>1}}}}, ... ] 

array.first
# => {"ComputersCount"=>{"Total"=>1, "Licensed"=>1, "ByOS"=>{"OS"=>{"Windows 7 x64"=>1}}}} 
array.last
# => {"ScanResultsDatabase"=>{"Type"=>"MSAccess"}} 
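
The one-liner above covers the split into per-parent hashes. For the other half of the question, recording which level and parent each key has, a minimal sketch could look like the following. The method name `key_paths` and the `:path`, `:level`, and `:parent` fields are hypothetical, chosen only for illustration, and the sketch assumes that any value that is not a Hash is a leaf.

```ruby
# Walk a nested hash and record every key with its depth and parent key.
# `key_paths` and the :path/:level/:parent field names are hypothetical.
def key_paths(hash, parent = nil, level = 0, prefix = [])
  hash.flat_map do |key, value|
    path = prefix + [key]
    entry = { path: path.join("."), level: level, parent: parent }
    # Recurse only into nested hashes; any other value is treated as a leaf.
    children = value.is_a?(Hash) ? key_paths(value, key, level + 1, path) : []
    [entry] + children
  end
end

data = {
  "ComputersCount" => { "Total" => 1, "ByOS" => { "OS" => { "Windows 7 x64" => 1 } } },
  "AgentsCount"    => { "Total" => 0 }
}

# One entry per key, in depth-first order.
key_paths(data).each { |entry| p entry }
```

Because the method never looks at key names, it should work for any nesting depth. The dotted `:path` string is only one possible way to express the hierarchy; a nested array would serve as well.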

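Answer 1: (score: 0)

Here is my best guess at the "key structure hierarchy and parentage". I would gently suggest that it is a bit of overkill. Instead, I think all you really need to do is store your hash data directly as a MongoDB document. Even if your POST data varies widely, in all likelihood it is still structured well enough that you can write your application without difficulty.

Here is a test that incorporates the "key structure hierarchy and parentage" but, perhaps more importantly, simply shows how trivial it is to store your hash data directly as a MongoDB document. The test is run twice to demonstrate the discovery of new keys.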

test.rb

require 'mongo'
require 'test/unit'
require 'pp'

def key_structure(h)
  h.keys.sort.collect{|k| v = h[k]; v.is_a?(Hash) ? [k, key_structure(h[k])] : k}
end

class MyTest < Test::Unit::TestCase
  def setup
    @hash_data_coll = Mongo::MongoClient.new['test']['hash_data']
    @hash_data_coll.remove
    @keys_coll = Mongo::MongoClient.new['test']['keys']
  end

  test "extract cancer drugs" do
    hash_data = {
        "hashdata" =>
            {"ComputersCount" =>
                 {"Total" => 1, "Licensed" => 1, "ByOS" => {"OS" => {"Windows 7 x64" => 1}}},
             "ScansCount" =>
                 {"Total" => 8,
                  "Scheduled" => 8,
                  "Agent" => 0,
                  "ByScanningProfile" => {"Profile" => {"Missing Patches" => 8}}},
             "RemediationsCount" => {"Total" => 1, "ByType" => {"Type" => {"9" => 1}}},
             "AgentsCount" => {"Total" => 0},
             "RelaysCount" => {"Total" => 0},
             "ScanResultsDatabase" => {"Type" => "MSAccess"}}}
    known_keys = @keys_coll.find.to_a.collect{|doc| doc['key']}.sort
    puts "known keys: #{known_keys}"
    hash_data_keys = hash_data['hashdata'].keys.sort
    puts "hash data keys: #{hash_data_keys.inspect}"
    new_keys = hash_data_keys - known_keys
    puts "new keys: #{new_keys.inspect}"
    @keys_coll.insert(new_keys.collect{|key| {key: key, structure: key_structure(hash_data['hashdata'][key]), timestamp: Time.now}}) unless new_keys.empty?
    pp @keys_coll.find.to_a unless new_keys.empty?
    @hash_data_coll.insert(hash_data['hashdata'])
    assert_equal(1, @hash_data_coll.count)
    pp @hash_data_coll.find.to_a
  end
end

$ ruby test.rb

Loaded suite test
Started
known keys: []
hash data keys: ["AgentsCount", "ComputersCount", "RelaysCount", "RemediationsCount", "ScanResultsDatabase", "ScansCount"]
new keys: ["AgentsCount", "ComputersCount", "RelaysCount", "RemediationsCount", "ScanResultsDatabase", "ScansCount"]
[{"_id"=>BSON::ObjectId('535976177f11ba278d000001'),
  "key"=>"AgentsCount",
  "structure"=>["Total"],
  "timestamp"=>2014-04-24 20:37:43 UTC},
 {"_id"=>BSON::ObjectId('535976177f11ba278d000002'),
  "key"=>"ComputersCount",
  "structure"=>[["ByOS", [["OS", ["Windows 7 x64"]]]], "Licensed", "Total"],
  "timestamp"=>2014-04-24 20:37:43 UTC},
 {"_id"=>BSON::ObjectId('535976177f11ba278d000003'),
  "key"=>"RelaysCount",
  "structure"=>["Total"],
  "timestamp"=>2014-04-24 20:37:43 UTC},
 {"_id"=>BSON::ObjectId('535976177f11ba278d000004'),
  "key"=>"RemediationsCount",
  "structure"=>[["ByType", [["Type", ["9"]]]], "Total"],
  "timestamp"=>2014-04-24 20:37:43 UTC},
 {"_id"=>BSON::ObjectId('535976177f11ba278d000005'),
  "key"=>"ScanResultsDatabase",
  "structure"=>["Type"],
  "timestamp"=>2014-04-24 20:37:43 UTC},
 {"_id"=>BSON::ObjectId('535976177f11ba278d000006'),
  "key"=>"ScansCount",
  "structure"=>
   ["Agent",
    ["ByScanningProfile", [["Profile", ["Missing Patches"]]]],
    "Scheduled",
    "Total"],
  "timestamp"=>2014-04-24 20:37:43 UTC}]
[{"_id"=>BSON::ObjectId('535976177f11ba278d000007'),
  "ComputersCount"=>
   {"Total"=>1, "Licensed"=>1, "ByOS"=>{"OS"=>{"Windows 7 x64"=>1}}},
  "ScansCount"=>
   {"Total"=>8,
    "Scheduled"=>8,
    "Agent"=>0,
    "ByScanningProfile"=>{"Profile"=>{"Missing Patches"=>8}}},
  "RemediationsCount"=>{"Total"=>1, "ByType"=>{"Type"=>{"9"=>1}}},
  "AgentsCount"=>{"Total"=>0},
  "RelaysCount"=>{"Total"=>0},
  "ScanResultsDatabase"=>{"Type"=>"MSAccess"}}]
.

Finished in 0.028869 seconds.

1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed

34.64 tests/s, 34.64 assertions/s

$ ruby test.rb

Loaded suite test
Started
known keys: ["AgentsCount", "ComputersCount", "RelaysCount", "RemediationsCount", "ScanResultsDatabase", "ScansCount"]
hash data keys: ["AgentsCount", "ComputersCount", "RelaysCount", "RemediationsCount", "ScanResultsDatabase", "ScansCount"]
new keys: []
[{"_id"=>BSON::ObjectId('535976197f11ba278e000001'),
  "ComputersCount"=>
   {"Total"=>1, "Licensed"=>1, "ByOS"=>{"OS"=>{"Windows 7 x64"=>1}}},
  "ScansCount"=>
   {"Total"=>8,
    "Scheduled"=>8,
    "Agent"=>0,
    "ByScanningProfile"=>{"Profile"=>{"Missing Patches"=>8}}},
  "RemediationsCount"=>{"Total"=>1, "ByType"=>{"Type"=>{"9"=>1}}},
  "AgentsCount"=>{"Total"=>0},
  "RelaysCount"=>{"Total"=>0},
  "ScanResultsDatabase"=>{"Type"=>"MSAccess"}}]
.

Finished in 0.015559 seconds.

1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed

64.27 tests/s, 64.27 assertions/s
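
The new-key discovery shown in the two runs can also be tried without a MongoDB server. The following is a minimal sketch that uses a plain array as a stand-in for the keys collection. `record_new_keys` is a hypothetical helper, not part of the test above, and `key_structure` is restated so the block runs on its own. As in the sample data, it assumes every top-level value is itself a hash.

```ruby
# Same idea as key_structure in test.rb: sorted key names, with a
# [key, children] pair wherever the value is a nested hash.
def key_structure(h)
  h.keys.sort.map { |k| h[k].is_a?(Hash) ? [k, key_structure(h[k])] : k }
end

# Hypothetical helper: `known` is an in-memory array standing in for the
# 'keys' collection. Returns the top-level keys that were not known before.
def record_new_keys(known, hash)
  new_keys = hash.keys.sort - known.map { |doc| doc[:key] }
  new_keys.each { |k| known << { key: k, structure: key_structure(hash[k]) } }
  new_keys
end

known = []
first_post  = { "AgentsCount" => { "Total" => 0 } }
second_post = { "AgentsCount" => { "Total" => 0 },
                "RemediationsCount" => { "Total" => 1, "ByType" => { "Type" => { "9" => 1 } } } }

first  = record_new_keys(known, first_post)   # every key is new on the first call
second = record_new_keys(known, second_post)  # only keys not seen before are new
puts "first: #{first.inspect}, second: #{second.inspect}"
```

The array difference is the same technique the test uses (`hash_data_keys - known_keys`); only the storage is swapped for an array here.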