Convert a nested dict into a dict that contains only list items with unique key sets

Date: 2016-05-06 19:25:20

Tags: python json yaml

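How can I convert the following nested dictionary into a dictionary whose lists contain only items with a unique set of keys (regardless of the values)?

I don't know how many levels of nesting there are, and I don't care which list item is returned, as long as its set of keys is unique within the list the item belongs to.

(I am trying to generate sample files from very long YAML files for documentation purposes.)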

input = { 
   "mylist": [
      {
         "key1": "1333", 
         "key2": [
                  { 
                   "key2a":134,
                   "key2b":1373
                    },
                  { 
                   "key2a":124,
                   "key2b":136
                    }
                  ]
      },{
         "key1": "875", 
         "key2": [
                  { 
                   "key2a":999,
                   "key2b":6567
                    },
                  { 
                   "key2a":8765,
                   "key2b":875
                    }
                  ]
      },{
         "key1": "6754", 
         "key3": 3232
      },{
         "key1": "34545", 
         "key3": 34554
      }
   ]
 }

Desired output:

{ 
   "mylist": [
      {
         "key1": "1333", 
         "key2": [
                  { "key2a":134,
                   "key2b":1373
                    }
                  ]
      },{
         "key1": "6754", 
         "key3": 3232
      }
   ]
}

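I wrote this (verbose) code, which solves it by collecting and storing all the keys it finds in each list item, but I am sure this can be done in a shorter way?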

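import collections
import json
from collections import Counter

input = collections.OrderedDict(input)

def get_keys(obj, keys=None):
    # Collect the keys (and scalar list elements) found anywhere inside obj.
    if keys is None:  # avoid sharing one mutable default list between calls
        keys = []
    if isinstance(obj, (dict,collections.OrderedDict)):
        for k, v in obj.items():
            # Record a key only when its value is not itself a dict.
            if not isinstance(v, (dict,collections.OrderedDict)):
                keys.append(k)
            get_keys(v,keys)
    elif isinstance(obj, list):
        for elem in obj:
            # Scalar list elements are recorded as if they were keys.
            if not isinstance(elem, (dict,collections.OrderedDict,list)):
                keys.append(elem)
            get_keys(elem,keys)
    return keys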

def traverse(obj,  callback=None):
    if isinstance(obj, (dict,collections.OrderedDict)):        
        value = {k: traverse(v, callback)
                 for k, v in obj.items()}
    elif isinstance(obj, list):
        value = [traverse(elem,  callback)
                 for elem in obj]
    else:
        value = obj
    if callback is None:
        return value
    else:
        return callback(value)

def traverse_modify(obj):
    def yaml_shortener(obj):
        duplicates = []
        if isinstance(obj,list) and len(obj)>1:
            return_list = []
            for i,elem in enumerate(obj):
                if not any(Counter(get_keys(elem,keys=[])) == Counter(item) for item in duplicates): 
                    return_list.append(elem)
                    duplicates.append(get_keys(elem,keys=[]))       
            return return_list
        else:
            return obj
    return traverse(obj, callback=yaml_shortener)   

def shorten_yaml(obj):
    return traverse_modify(obj)

print(json.dumps(shorten_yaml(input), indent=3))

1 Answer:

Answer 0 (score: 0)

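First, I assume that the value associated with a dictionary key is either a list of dictionaries or a scalar value. The value is handled by the function convert_value(), which:

  • calls convert_dict_list() to extract the unique key sets when the value is a list of dictionaries;
  • leaves the value unchanged if it is a scalar.

The magic happens in convert_dict_list(), which:

  • collects all the keys (and values) of every dictionary in the list passed to it;
  • sorts them and builds a temporary dictionary that maps each tuple of keys to the corresponding list of converted values; such a dictionary contains, of course, only unique key tuples, exactly as required;
  • finally, converts each tuple of keys and its list of values back into the expected dictionary.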
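
To make the grouping step concrete, here is a minimal, non-recursive sketch in Python 3. The helper name group_by_key_tuple and the sample items are made up for illustration and are not part of the answer's code; the sketch only shows how keying a dictionary by the sorted key tuple collapses items that share the same keys.

```python
def group_by_key_tuple(dict_list):
    """Map each sorted key tuple to the last dict that has exactly those keys."""
    groups = {}
    for d in dict_list:
        signature = tuple(sorted(d))  # only the keys matter, values are ignored
        groups[signature] = d         # a later dict overwrites an earlier one
    return groups

# Hypothetical sample data, loosely modelled on the question's input.
items = [
    {"key1": "6754", "key3": 3232},
    {"key3": 34554, "key1": "34545"},
    {"key1": "1"},
]
groups = group_by_key_tuple(items)
```

Because the signature is sorted, the order in which keys appear in each dictionary does not affect the grouping.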

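The function convert() is a simple interface function that accepts the dictionary with the input data: it just wraps the dictionary in a (single-item) list of dictionaries and calls the previous function to process it.

Here is the complete code: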

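# Python 2 code: relies on dict.iteritems() and on map() returning a list.

def convert(a_dict):
    # Wrap the dict in a one-item list so the list logic can be reused.
    return convert_dict_list([a_dict])[0]

def convert_dict_list(dict_list):
    # For each dict, sort its items by key and split them into (keys, values).
    sorted_items = (zip(*sorted(d.iteritems())) for d in dict_list)
    # Key the temporary dict by the key tuple; a later dict with the same
    # keys overwrites an earlier one.
    tmp_dict = dict((tuple(keys), map(convert_value, values))
        for (keys, values) in sorted_items)
    # Rebuild one dict per unique key tuple.
    return map(dict, [zip(k, v) for (k, v) in tmp_dict.iteritems()])

def convert_value(val):
    # Recurse into lists (assumed to hold dicts); leave scalars untouched.
    return convert_dict_list(val) if isinstance(val, list) else val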
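
The code above targets Python 2. As a rough guide, a Python 3 port might look like the sketch below. It is an adaptation rather than the answer's original code, and it keeps the same assumption that every list contains only dictionaries (a list of scalars would raise a TypeError). The variable data is a made-up example.

```python
def convert(a_dict):
    # Wrap the dict in a one-item list so the list logic can be reused.
    return convert_dict_list([a_dict])[0]

def convert_dict_list(dict_list):
    tmp_dict = {}
    for d in dict_list:
        keys = tuple(sorted(d))  # the key set, in a canonical order
        # A later dict with the same keys replaces the earlier entry.
        tmp_dict[keys] = [convert_value(d[k]) for k in keys]
    return [dict(zip(keys, values)) for keys, values in tmp_dict.items()]

def convert_value(val):
    # Assumption: a list always holds dicts; anything else is a scalar.
    return convert_dict_list(val) if isinstance(val, list) else val

# Hypothetical input for illustration.
data = {"mylist": [{"key1": "1333", "key3": 1},
                   {"key1": "875", "key3": 2},
                   {"key1": "x"}]}
result = convert(data)
```

As in the Python 2 version, the last item seen for each key set is the one that survives.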

This is the output generated with the sample input:

>>> print convert(input)
{'mylist': [{'key3': 34554, 'key1': '34545'}, {'key2': [{'key2b': 875, 'key2a': 8765}], 'key1': '875'}]}
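
Note that this output keeps the last item seen for each key set, and on Python 2 the order of the items is arbitrary. If the first item should be kept instead, so that the result matches the desired output in the question, one possible variant is sketched below. The function name shorten is hypothetical, Python 3.7+ is assumed (for ordered dicts), and, like the answer, it compares only the top-level keys of each list item.

```python
def shorten(obj):
    """Keep only the first dict per key set in every list, recursively."""
    if isinstance(obj, dict):
        return {k: shorten(v) for k, v in obj.items()}
    if isinstance(obj, list):
        seen = set()
        kept = []
        for elem in obj:
            if isinstance(elem, dict):
                signature = frozenset(elem)  # the item's top-level keys
                if signature in seen:
                    continue                 # same key set already kept
                seen.add(signature)
            kept.append(shorten(elem))       # non-dict elements are always kept
        return kept
    return obj

# The sample input from the question.
sample = {"mylist": [
    {"key1": "1333", "key2": [{"key2a": 134, "key2b": 1373},
                              {"key2a": 124, "key2b": 136}]},
    {"key1": "875", "key2": [{"key2a": 999, "key2b": 6567},
                             {"key2a": 8765, "key2b": 875}]},
    {"key1": "6754", "key3": 3232},
    {"key1": "34545", "key3": 34554},
]}
shortened = shorten(sample)
```

Unlike the answer's code, this sketch leaves scalar list elements in place instead of failing on them.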