How can I keep all key-value pairs of a JSON document that contains objects with duplicate keys?

Asked: 2018-01-18 18:04:06

Tags: python json python-2.7 duplicates deserialization

I'm having some trouble getting this to work correctly. My data looks like this:

{  
      "completedProtocol": "Extract",
      "map": [
        {
          "sampleIDsIn": [{ "clarityId": "claritySample1", "espId": "ESP024254" }, { "clarityId": "claritySample1", "espId": "ESP024255" }, { "clarityId": "claritySample1", "espId": "ESP024256"}],
          "sampleIDsOut": ["claritySample3", "claritySample4", "claritySample5"],
          "files":["http://fileserver.net/path/to/datafile3"]
        }
      ],
      "map": [
        {
          "sampleIDsIn": [{ "clarityId": "claritySample1", "espId": "ESP024258" }, { "clarityId": "claritySample1", "espId": "ESP024259" }, { "clarityId": "claritySample1", "espId": "ESP024260"}],
          "sampleIDsOut": ["claritySample3", "claritySample4", "claritySample5"],
          "files":["http://fileserver.net/path/to/datafile3"]
        }
      ]
    }

I would like to convert it to:

[{"map": [
        {
          "sampleIDsIn": [{ "clarityId": "claritySample1", "espId": "ESP024254" }, { "clarityId": "claritySample1", "espId": "ESP024255" }, { "clarityId": "claritySample1", "espId": "ESP024256"}],
          "sampleIDsOut": ["claritySample3", "claritySample4", "claritySample5"],
          "files":["http://fileserver.net/path/to/datafile3"]
        }
      ]},
{"map":[
        {
          "sampleIDsIn": [{ "clarityId": "claritySample1", "espId": "ESP024258" }, { "clarityId": "claritySample1", "espId": "ESP024259" }, { "clarityId": "claritySample1", "espId": "ESP024260"}],
          "sampleIDsOut": ["claritySample3", "claritySample4", "claritySample5"],
          "files":["http://fileserver.net/path/to/datafile3"]
        }
      ]}]

My code so far is:

import json

obj = json.loads(body)
newData = [dct for dct in obj if 'map' in dct]

But this only returns:

[u'map']

If I just call json.loads on the body, it returns only the second value of "map"; the first one is overwritten.
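
Both symptoms can be reproduced on a much smaller document. The sketch below is only an illustration: the string `doc` and the variable names are made up for this example and are not the real request body. It shows that the default decoder keeps the last value for a repeated key, and that iterating over the decoded dict yields its keys, so the list comprehension collects key names rather than values.

```python
import json

# Hypothetical miniature of the document in the question.
doc = '{"completedProtocol": "Extract", "map": [1], "map": [2]}'

obj = json.loads(doc)

# The default decoder builds a plain dict, so the later "map"
# replaces the earlier one.
last_map = obj["map"]

# Iterating over a dict yields its keys (strings); `'map' in key`
# is then a substring test on each key name.
matching_keys = [key for key in obj if 'map' in key]

print(last_map)
print(matching_keys)
```

Here `last_map` holds only the second list, and `matching_keys` contains just the key name "map", which matches the output shown above.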

Note: I want a list of single-item dicts; I do not want the values collected together under one key.

Any ideas?

1 Answer:

Answer 0 (score: 1)

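You can use a custom object_pairs_hook function to make json.loads() return a list of single-item dicts, instead of a single dict in which duplicate keys overwrite each other: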

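import json

def keep_duplicates(ordered_pairs):
    """Return one single-item dict per key/value pair."""
    # ordered_pairs is a list of (key, value) tuples in document order,
    # so repeated keys are all still present at this point.
    result = []
    for key, value in ordered_pairs:
        result.append({key: value})
    return result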

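From the docs:

  object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict() will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority.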
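
To make the quoted behaviour concrete, here is a small sketch of what the hook actually receives. The helper `record_pairs` and the list `seen` are names invented for this illustration. The hook is called once per JSON object, inner objects first, and each call gets the full list of pairs, including repeated keys.

```python
import json
from collections import OrderedDict

seen = []  # every pair list handed to the hook, in call order

def record_pairs(pairs):
    # `pairs` is a list of (key, value) tuples in document order.
    seen.append(list(pairs))
    # Returning an OrderedDict keeps key order but, like a dict,
    # retains only the last value for a repeated key.
    return OrderedDict(pairs)

result = json.loads('{"b": 1, "a": {"x": 2, "x": 3}}',
                    object_pairs_hook=record_pairs)

print(seen[0])
print(list(result.keys()))
```

The first recorded call holds both "x" pairs of the nested object, which is the information a custom hook can use to keep duplicates.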

Usage:

>>> json.loads('{"a": 1, "a": 2, "a": 3}', object_pairs_hook=keep_duplicates)
[{u'a': 1}, {u'a': 2}, {u'a': 3}]

In your case, since you are apparently not interested in anything other than the "map" key, you can filter the result afterwards:

all_data = json.loads(body, object_pairs_hook=keep_duplicates)
map_data = [x for x in all_data if 'map' in x]

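...which gives you one single-item dict per "map" key, in document order. Note that the hook is called for every JSON object in the document, not only the outermost one, so the nested objects inside each "map" value also come back as lists of single-item dicts rather than plain dicts.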
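
If nested objects should remain ordinary dicts (the hook is applied to every object, not only the outermost), one possible variation is a hook that splits an object only when it actually contains a repeated key. This is a sketch rather than part of the original answer: the name `split_only_duplicates` is made up, and `body` is a shortened stand-in for the document in the question.

```python
import json

def split_only_duplicates(ordered_pairs):
    """Split an object into single-item dicts only if a key repeats."""
    keys = [key for key, _ in ordered_pairs]
    if len(keys) == len(set(keys)):
        # No repeated key: decode as an ordinary dict.
        return dict(ordered_pairs)
    # At least one repeated key: keep every pair as its own dict.
    return [{key: value} for key, value in ordered_pairs]

# Shortened, hypothetical version of the document from the question.
body = '''{
  "completedProtocol": "Extract",
  "map": [{"sampleIDsIn": [{"clarityId": "claritySample1", "espId": "ESP024254"}],
           "files": ["http://fileserver.net/path/to/datafile3"]}],
  "map": [{"sampleIDsIn": [{"clarityId": "claritySample1", "espId": "ESP024258"}],
           "files": ["http://fileserver.net/path/to/datafile3"]}]
}'''

all_data = json.loads(body, object_pairs_hook=split_only_duplicates)
map_data = [item for item in all_data if 'map' in item]

print(len(map_data))
```

One trade-off of this design is that the shape of the result depends on the input: a top-level object without repeated keys is returned as a plain dict, so the filtering step assumes the duplicates are really there.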