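How to append to a large YAML file that has multiple keys

Date: 2016-06-16 21:56:03

Tags: ruby append yaml

I have a very large YAML file, and I need to append to certain keys within the file itself. (I'm showing about half of the keys and values, but the whole YAML structure):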

md5:
  0db1af356a757d7e6141de9e0509b6c0: aspherical,
  cc582c8a08d9983c4cabbf4db79346d6: aledo,
  ffb013ac241e53910c0babbe5fc27928: aet,
  4fa28fa73a7577c68dddb9cbf337680e: aglisten,
  c7ead5d7e7d7fbbee16c49e398fc335d: assessable,
  61ea1a1a2645db7442479a0c23dc9e27: amaranthaceous,
  f447b20a7fcbf53a5d5be013ea0b15af: 123456,
  286755fad04869ca523320acce0dc6a4: password,
  d577273ff885c3f84dadb8578bb41399: 12345,
  23cdc18507b52418db7740cbb5543e54: 12345678,
  a86850deb2742ec3cb41518e26aa2d89: qwerty,
  b2cfa4183267af678ea06c7407d4d6d8: 123456789,
  e7df7cd2ca07f4f1ab415d457a6e1c13: 1234,
  5b9d07ad9c1bed09d6986e593f4ca7dc: baseball,
  c0ce0dff9996a7d40c1e96a944dd0fc5: dragon,
  a174fdeed30655297c43208a716875b3: football,
  1b504d3328e16fdf281d1fb9516dd90b: 1234567,
  2f548f61bd37f628077e552ae1537be2: monkey,
  4aacf9c858c82716ab0034320bd2efe9: letmein,
  2c6c8ab6ba8b9c98a1939450eb4089ed: abc123,
  77a319564621b96fa0656e24c67960ef: 111111,
  d3c3ce1b8b4e88e23a04a1f123eeb593: mustang,
  a56ffd9f01fa749377cbaea011a57365: access,
  54bad4757ad046d8e4e762aea1e022a7: shadow,
  c963080767f45828c31f83ca5cd25d36: master,
  3a8c088f9cfe9a0a564fe3fbb277263a: michael
sha256:
  d47ac74b773ffff504ede166b4d62a575ae2beccd7966dbee33f26ff84114d8f: aspherical,
  978b54c60b0e86b1b51019bb5c88f92fb381c7145f600dbb1280d96398f5feea: aledo,
  c3236dd164056aa319b58d07a1e1c0bc5815dc39783981b19fabfa92714c18ba: aet,
  d5375b39c5a773f55a33d71507d6080014f5fd302032ec62de03a2a56c33707b: aglisten,
  88cca75f4214587a63f468855b1e942495bb5390467e543a4a28f636753d9262: assessable,
  2defadb0da568c52e952ae495368840d42a4ebe849ab47150f6807e086ab28e2: amaranthaceous,
  e150a1ec81e8e93e1eae2c3a77e66ec6dbd6a3b460f89c1d08aecf422ee401a0: 123456,
  6b3a55e0261b0304143f805a24924d0c1c44524821305f31d9277843b8a10f4e: password,
  f33ae3bc9a22cd7564990a794789954409977013966fb1a8f43c35776b833a95: 12345,
  2634c3097f98e36865f0c572009c4ffd73316bc8b88ccfe8d196af35f46e2394: 12345678,
  9ceece10cf8b97d1f1924dae5d14c137fd144ce999ede85f48be6d7582e2dd23: qwerty,
  6d78392a5886177fe5b86e585a0b695a2bcd01a05504b3c4e38bc8eeb21e8326: 123456789,
  a883dafc480d466ee04e0d6da986bd78eb1fdd2178d04693723da3a8f95d42f4: 1234,
  e11184da809af8dca98d471082647632fd954d674913e41d2e7aed93d2d224c7: baseball,
  24e11938b9091b4dbc66a5e5b4705834e5e738fa85a80cd5f8844d976026b49e: dragon,
  205b60ee79914af6a09b897170b522c5e16366214b9a0735b4eb550f4b14a3c8: football,
  349abe1272178917136372f667b13753e2c775bbe39112118420b7697749c97b: 1234567,
  5a6e48105fea75ccccc66a038318f398c42761495d738786dc8a6d43179aa16a: monkey,
  8de47d5aa7d61e92c577d8156b966583f6d7d75d714a3b99fca4fb2f8bfe97c6: letmein,
  5ecf8d2cc410094e8b82dd0bc178a57f3aa1e80916689beb00fe56148b1b1256: abc123,
  9d272f1f3e92f7c5efdcfdda0ab92facccd98c340be8be09064060503fd167e4: 111111,
  aeb3e1c05ceed52c929eb539b0d45ffb12ecc68881ca28a634d9e02ff49225e9: mustang,
  74a53d5ef93d260701dee7ef8ae4957d363a299d9e8a195cbd87ab63ffb4d0e4: access,
  36c8168624b0b6e3a623e064b82730af1c30c1dae97ab260237a800c39707941: shadow,
  9b3162498c21d7f960877099174ecea13410bd21d12440b2ea8868117fc08ae0: master,
  bb472c3cc2b662a74956c8539fec9fe73f2b8a9f9124506aa0474698b3bac62d: michael,
  f4b0726157bf8b1aab7b74cfe5195fd2c2d5b11ad8902e545de460fe1217e3fa: superman
sha1:
  21bb0c84ecc88629788314747337af2d4c3d6a4f: aspherical,
  aee0161f2d7168beac985db5cfda43ac959e053b: aledo,
  73a2bd0d315aaef2b90ee4de15ab4a34e048703a: aet,
  591b6b8a007ac1c67be430bb6431adbaab930538: aglisten,
  c1ad62f7db7beff07353b0d42e685dec6ef12f4e: assessable,
  c689373a119b17a512782682df48d85e47a4e9de: amaranthaceous,
  c4f9375f9834b4e7f0a528cc65c055702bf5f24a: 123456,
  c8fed00eb2e87f1cee8e90ebbe870c190ac3848c: password,
  2672275fe0c456fb671e4f417fb2f9892c7573ba: 12345,
  9806af3952e1380212b0998f07a6afe4e5f00428: 12345678,
  3c8b9f4b983afa9f644d26e2b34fa3e03a2bef16: qwerty,
  179c94cf45c6e383baf52621687305204cef16f9: 123456789,
  1be168ff837f043bde17c0314341c84271047b31: 1234,
  e1f48cd1226e4ce7ab8bf87d15ce7c9b0014cf16: baseball,
  8851def7166796964bf58174a5f3f50d073d709d: dragon,
  3516c253ea583fe2c60e983c7b8bc9075aedd161: football,
  e017693e4a04a59d0b0f400fe98177fe7ee13cf7: 1234567,
  744a9a056f145b86339221bb457aa57129f55bc2: monkey,
  34ca062314edaa193e03f318ae20ae134274b358: letmein,
  61ee8b5601a84d5154387578466c8998848ba089: abc123,
  3ee88a74d3722b336a69c428d226f731435c71ba: 111111,
  059a9d50d1155bb31ad65df3e0cfb20c8f98894b: mustang,
  65b9b171f2173eccc48c8764f91a8bcc1b586c4f: access
sha2:
  d47ac74b773ffff504ede166b4d62a575ae2beccd7966dbee33f26ff84114d8f: aspherical,
  978b54c60b0e86b1b51019bb5c88f92fb381c7145f600dbb1280d96398f5feea: aledo,
  c3236dd164056aa319b58d07a1e1c0bc5815dc39783981b19fabfa92714c18ba: aet,
  d5375b39c5a773f55a33d71507d6080014f5fd302032ec62de03a2a56c33707b: aglisten,
  88cca75f4214587a63f468855b1e942495bb5390467e543a4a28f636753d9262: assessable,
  2defadb0da568c52e952ae495368840d42a4ebe849ab47150f6807e086ab28e2: amaranthaceous,
  e150a1ec81e8e93e1eae2c3a77e66ec6dbd6a3b460f89c1d08aecf422ee401a0: 123456,
  6b3a55e0261b0304143f805a24924d0c1c44524821305f31d9277843b8a10f4e: password,
  f33ae3bc9a22cd7564990a794789954409977013966fb1a8f43c35776b833a95: 12345,
  2634c3097f98e36865f0c572009c4ffd73316bc8b88ccfe8d196af35f46e2394: 12345678,
  9ceece10cf8b97d1f1924dae5d14c137fd144ce999ede85f48be6d7582e2dd23: qwerty,
  6d78392a5886177fe5b86e585a0b695a2bcd01a05504b3c4e38bc8eeb21e8326: 123456789,
  a883dafc480d466ee04e0d6da986bd78eb1fdd2178d04693723da3a8f95d42f4: 1234,
  e11184da809af8dca98d471082647632fd954d674913e41d2e7aed93d2d224c7: baseball,
  24e11938b9091b4dbc66a5e5b4705834e5e738fa85a80cd5f8844d976026b49e: dragon,
  205b60ee79914af6a09b897170b522c5e16366214b9a0735b4eb550f4b14a3c8: football,
  349abe1272178917136372f667b13753e2c775bbe39112118420b7697749c97b: 1234567,
  5a6e48105fea75ccccc66a038318f398c42761495d738786dc8a6d43179aa16a: monkey,
  8de47d5aa7d61e92c577d8156b966583f6d7d75d714a3b99fca4fb2f8bfe97c6: letmein,
  5ecf8d2cc410094e8b82dd0bc178a57f3aa1e80916689beb00fe56148b1b1256: abc123,
  9d272f1f3e92f7c5efdcfdda0ab92facccd98c340be8be09064060503fd167e4: 111111

I want to append to the sha256 hashes in the YAML file. How can I append to that section of the file without appending to any other part of it?

I've tried:

def add_to(hash, type, word)
  type[hash] = word
  File.open('./lib/list/rainbow_table.yml', 'a+'){ |s| YAML.dump(type, s) }
end #<= Outputs error: "rainbow.rb:67:in `[]='"

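How can I fix this so that I append to just that section, without deleting the file or getting an error?

1 answer:

Answer 0 (score: 1)

The simplest way is to load the file into memory using YAML. You'll get a hash of hashes. Modify the hash in question, then rewrite the file.

For example: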

require 'yaml'

data = YAML.load(<<EOT)
md5:
  0db1af356a757d7e6141de9e0509b6c0: aspherical,
sha256:
  d47ac74b773ffff504ede166b4d62a575ae2beccd7966dbee33f26ff84114d8f: aspherical,
sha1:
  21bb0c84ecc88629788314747337af2d4c3d6a4f: aspherical,
sha2:
  d47ac74b773ffff504ede166b4d62a575ae2beccd7966dbee33f26ff84114d8f: aspherical,
EOT

md5 = data['md5']
md5['another_key'] = 'foo'
md5['some_other_key'] = 'bar'

data['md5'] = md5
puts data.to_yaml

# >> ---
# >> md5:
# >>   0db1af356a757d7e6141de9e0509b6c0: "\uFEFFaspherical,"
# >>   another_key: foo
# >>   some_other_key: bar
# >> sha256:
# >>   d47ac74b773ffff504ede166b4d62a575ae2beccd7966dbee33f26ff84114d8f: "\uFEFFaspherical,"
# >> sha1:
# >>   21bb0c84ecc88629788314747337af2d4c3d6a4f: "\uFEFFaspherical,"
# >> sha2:
# >>   d47ac74b773ffff504ede166b4d62a575ae2beccd7966dbee33f26ff84114d8f: "\uFEFFaspherical,"

You'll want to make sure you implement a safe file update so you don't clobber the old file, but that's a different question.
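As a rough sketch of how the load, modify, and rewrite steps could be combined with a safer write, something like the following might work. The helper name `add_to`, its parameters, and the temporary-file naming are assumptions made for illustration; the idea is to write the new contents to a temporary file in the same directory and then rename it over the original, so an interrupted write doesn't leave a half-written file behind.

```ruby
require 'yaml'

# Hypothetical helper: adds hash => word under one top-level section
# (for example 'sha256') and rewrites the whole file.
def add_to(path, section, hash, word)
  # An empty file makes load_file return false, hence the fallback to {}.
  data = File.exist?(path) ? (YAML.load_file(path) || {}) : {}
  data[section] ||= {}
  data[section][hash] = word

  # Write the new contents next to the original, then rename over it.
  # The temporary file name is an assumption made for this sketch.
  tmp_path = "#{path}.tmp.#{Process.pid}"
  File.write(tmp_path, data.to_yaml)
  File.rename(tmp_path, path)
  data
end

# Possible usage, with some_digest standing in for a real SHA-256 hex string:
# add_to('./lib/list/rainbow_table.yml', 'sha256', some_digest, 'superman')
```

Only the named section changes; every other section is written back exactly as it was loaded.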

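It's also possible to walk through the file, reading lines until you find the appropriate section, insert the new lines, then continue through the rest of the file. Every line of the original file has to be written to the output file. It's more work, though the code isn't difficult to write. Still, unless the YAML file won't fit in memory, I'd use the approach above.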
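A minimal sketch of that line-by-line approach might look like this. The helper name `insert_into_section` and its parameters are hypothetical, and it assumes the simple two-level layout from the question, where each section starts with an unindented `name:` line and its entries are indented by two spaces.

```ruby
# Hypothetical helper: copies src to dst one line at a time and inserts a
# single "  key: value" line directly after the top-level "section:" header.
# Returns true if the section header was found, false otherwise.
def insert_into_section(src, dst, section, key, value)
  inserted = false
  File.open(dst, 'w') do |out|
    File.foreach(src) do |line|
      out.write(line)                       # every original line is copied
      next if inserted || line.chomp != "#{section}:"

      out.write("  #{key}: #{value}\n")     # the new entry goes right below the header
      inserted = true
    end
  end
  inserted
end
```

Only one line is held in memory at a time, which is the reason to consider this over loading the whole file. The output goes to a separate file, so the original is untouched until you decide to replace it.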

File.open('./lib/list/rainbow_table.yml', 'a+'){ |s| YAML.dump(type, s) }

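You don't want to append to a YAML file like that. YAML is structured data, either a hash or an array, so you have to maintain the proper structure, and appending doesn't do that. You can have multiple YAML documents in one file, but that results in a messy file and still won't give you what you want when you're done. Reading and then rewriting the file keeps it well organized.

Use something like: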

File.write('path/to/file.yaml', some_array_or_hash.to_yaml)

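Don't make it more complicated by using open in append mode. You can't cleanly append to a YAML file and keep the object intact, so don't use 'a'. Overwrite the file and generate a clean object.

It would help to read the YAML specification, especially the part about multiple documents in a file, and then experiment with different ideas. I think you'll find that appending breaks your object.

Consider this: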

require 'yaml'

obj1 = {
  'a' => {
    'foo' => ['something']
  }
}

obj1 # => {"a"=>{"foo"=>["something"]}}

obj1.to_yaml # => "---\na:\n  foo:\n  - something\n"

YAML.load(obj1.to_yaml) 
# => {"a"=>{"foo"=>["something"]}}

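That demonstrates how to create an object, serialize it, and parse the serialized output to get the object back.

That's the gentler way to approach the problem.

This appears to be what you want to do: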

obj2 = {
  'a' => {
    'bar' => ['something else']
  }
}

output_file = obj1.to_yaml
# => "---\na:\n  foo:\n  - something\n"

At this point you have a YAML file.

You want to append to it, which results in a file containing two YAML documents:

output_file += obj2.to_yaml
# => "---\na:\n  foo:\n  - something\n---\na:\n  bar:\n  - something else\n"

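When you reload the file, you don't get the merged data back. A hash can only have unique keys, so the two top-level 'a' entries can't coexist in one object, and YAML.load returns only the first document, silently ignoring whatever was appended after it: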

YAML.load(output_file)
# => {"a"=>{"foo"=>["something"]}}

YAML does support loading all the documents in a file, but they're parsed into an array of objects:

YAML.load_stream(output_file)
# => [{"a"=>{"foo"=>["something"]}}, {"a"=>{"bar"=>["something else"]}}]

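Every time you read or change the file you'd have to walk through that array and rebuild the original object, which takes longer and is error-prone.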
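To give a sense of the extra work involved, here is a sketch of rebuilding a single object from the array that load_stream returns. The `deep_merge` helper is hypothetical (it is not part of YAML or Psych), and it assumes later documents should win whenever two non-hash values collide.

```ruby
require 'yaml'

# Hypothetical helper: recursively merges two hashes, combining nested hashes
# instead of letting the later hash replace the earlier one wholesale.
def deep_merge(first, second)
  first.merge(second) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

stream = "---\na:\n  foo:\n  - something\n---\na:\n  bar:\n  - something else\n"

# Fold every document in the stream into one hash.
merged = YAML.load_stream(stream).reduce({}) { |acc, doc| deep_merge(acc, doc) }
merged
# => {"a"=>{"foo"=>["something"], "bar"=>["something else"]}}
```

This has to run on every read, which is the overhead that rewriting the file as a single document avoids.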

Instead of doing that, read the original file using load_file and you'll have it in memory. Then you can append to whichever section you want:

obj1['a']['bar'] = ['something else']
obj1 
# => {"a"=>{"foo"=>["something"], "bar"=>["something else"]}}

You can then write the modified data back to the file:

output_file = obj1.to_yaml 
# => "---\na:\n  foo:\n  - something\n  bar:\n  - something else\n"

And reload it correctly later. Again, to read YAML data from a file, use load_file rather than load, which is what I'm using here.

YAML.load(output_file) 
# => {"a"=>{"foo"=>["something"], "bar"=>["something else"]}}
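The examples above work on strings. The same round trip against a real file might look like the following sketch, which uses a throwaway temporary directory and a made-up file name so that it can be run safely.

```ruby
require 'yaml'
require 'tmpdir'

result = Dir.mktmpdir do |dir|
  path = File.join(dir, 'example.yaml') # hypothetical file name

  # Create the initial file containing a single YAML document.
  File.write(path, { 'a' => { 'foo' => ['something'] } }.to_yaml)

  # Load it, change the part you care about, and overwrite the file.
  obj = YAML.load_file(path)
  obj['a']['bar'] = ['something else']
  File.write(path, obj.to_yaml)

  # Reloading gives back one object that contains both entries.
  YAML.load_file(path)
end

result
# => {"a"=>{"foo"=>["something"], "bar"=>["something else"]}}
```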

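Finding documentation for YAML in Ruby means reading the YAML class documentation as well as the Psych documentation. Psych was introduced a few years ago to speed up YAML processing.

And, just as a sidebar, Psych/YAML can read and parse JSON data, because JSON is a subset of YAML: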

YAML.load('{"foo": "bar"}')
# => {"foo"=>"bar"}