在嵌套的python词典和列表中查找所有出现的键

时间:2012-03-21 15:26:09

标签: python recursion dictionary traversal

我有这样的字典:

{ "id" : "abcde",
  "key1" : "blah",
  "key2" : "blah blah",
  "nestedlist" : [ 
    { "id" : "qwerty",
      "nestednestedlist" : [ 
        { "id" : "xyz",
          "keyA" : "blah blah blah" },
        { "id" : "fghi",
          "keyZ" : "blah blah blah" }],
      "anothernestednestedlist" : [ 
        { "id" : "asdf",
          "keyQ" : "blah blah" },
        { "id" : "yuiop",
          "keyW" : "blah" }] } ] } 

基本上是具有任意深度的嵌套列表,字典和字符串的字典。

遍历此方法以提取每个“id”键的值的最佳方法是什么?我想实现相当于XPath查询,如“// id”。 “id”的值始终是一个字符串。

所以从我的例子来看,我需要的输出基本上是:

["abcde", "qwerty", "xyz", "fghi", "asdf", "yuiop"]

订单并不重要。

11 个答案:

答案 0 :(得分:46)

我发现这个Q / A非常有趣,因为它为同一个问题提供了几种不同的解决方案。我使用了所有这些函数并使用复杂的字典对象对其进行了测试。我必须从测试中取出两个函数,因为它们必须有许多失败结果,并且它们不支持返回列表或dicts作为值,这是我认为必不可少的,因为函数应该为几乎任何即将到来的数据。

所以我通过timeit模块在​​100.000次迭代中抽取其他函数,输出结果如下:

0.11 usec/pass on gen_dict_extract(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
6.03 usec/pass on find_all_items(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.15 usec/pass on findkeys(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.79 usec/pass on get_recursively(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.14 usec/pass on find(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.36 usec/pass on dict_extract(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -

所有函数都有相同的针搜索('logging')和相同的字典对象,其构造如下:

o = { 'temparature': '50', 
      'logging': {
        'handlers': {
          'console': {
            'formatter': 'simple', 
            'class': 'logging.StreamHandler', 
            'stream': 'ext://sys.stdout', 
            'level': 'DEBUG'
          }
        },
        'loggers': {
          'simpleExample': {
            'handlers': ['console'], 
            'propagate': 'no', 
            'level': 'INFO'
          },
         'root': {
           'handlers': ['console'], 
           'level': 'DEBUG'
         }
       }, 
       'version': '1', 
       'formatters': {
         'simple': {
           'datefmt': "'%Y-%m-%d %H:%M:%S'", 
           'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
         }
       }
     }, 
     'treatment': {'second': 5, 'last': 4, 'first': 4},   
     'treatment_plan': [[4, 5, 4], [4, 5, 4], [5, 5, 5]]
}

所有功能都提供了相同的结果,但时差很大!函数gen_dict_extract(k,o)是我从这里的函数改编的函数,实际上它非常类似于Alfe的find函数,主要区别在于,我正在检查给定对象是否具有iteritems函数,在递归期间传递case字符串:

def gen_dict_extract(key, var):
    if hasattr(var,'iteritems'):
        for k, v in var.iteritems():
            if k == key:
                yield v
            if isinstance(v, dict):
                for result in gen_dict_extract(key, v):
                    yield result
            elif isinstance(v, list):
                for d in v:
                    for result in gen_dict_extract(key, d):
                        yield result

所以这个变体是这里最快和最安全的功能。并且find_all_items非常缓慢,远远低于第二慢get_recursivley,而其他dict_extract除了fun之外,彼此接近。函数keyHole和{{1}}仅在您查找字符串时才有效。

这里有趣的学习方面:)

答案 1 :(得分:38)

d = { "id" : "abcde",
    "key1" : "blah",
    "key2" : "blah blah",
    "nestedlist" : [ 
    { "id" : "qwerty",
        "nestednestedlist" : [ 
        { "id" : "xyz", "keyA" : "blah blah blah" },
        { "id" : "fghi", "keyZ" : "blah blah blah" }],
        "anothernestednestedlist" : [ 
        { "id" : "asdf", "keyQ" : "blah blah" },
        { "id" : "yuiop", "keyW" : "blah" }] } ] } 


def fun(d):
    if 'id' in d:
        yield d['id']
    for k in d:
        if isinstance(d[k], list):
            for i in d[k]:
                for j in fun(i):
                    yield j

>>> list(fun(d))
['abcde', 'qwerty', 'xyz', 'fghi', 'asdf', 'yuiop']

答案 2 :(得分:13)

def find(key, value):
  for k, v in value.iteritems():
    if k == key:
      yield v
    elif isinstance(v, dict):
      for result in find(key, v):
        yield result
    elif isinstance(v, list):
      for d in v:
        for result in find(key, d):
          yield result

编辑:@Anthon注意到这不适用于直接嵌套列表。如果您在输入中有此内容,则可以使用:

def find(key, value):
  for k, v in (value.iteritems() if isinstance(value, dict) else
               enumerate(value) if isinstance(value, list) else []):
    if k == key:
      yield v
    elif isinstance(v, (dict, list)):
      for result in find(key, v):
        yield result

但我认为原始版本更容易理解,所以我会离开它。

答案 3 :(得分:6)

d = { "id" : "abcde",
    "key1" : "blah",
    "key2" : "blah blah",
    "nestedlist" : [
    { "id" : "qwerty",
        "nestednestedlist" : [
        { "id" : "xyz", "keyA" : "blah blah blah" },
        { "id" : "fghi", "keyZ" : "blah blah blah" }],
        "anothernestednestedlist" : [
        { "id" : "asdf", "keyQ" : "blah blah" },
        { "id" : "yuiop", "keyW" : "blah" }] } ] }


def findkeys(node, kv):
    if isinstance(node, list):
        for i in node:
            for x in findkeys(i, kv):
               yield x
    elif isinstance(node, dict):
        if kv in node:
            yield node[kv]
        for j in node.values():
            for x in findkeys(j, kv):
                yield x

print list(findkeys(d, 'id'))

答案 4 :(得分:5)

我只想使用yield from并接受顶级列表来迭代@ hexerei-software的优秀答案。

def gen_dict_extract(var, key):
    if isinstance(var, dict):
        for k, v in var.items():
            if k == key:
                yield v
            if isinstance(v, (dict, list)):
                yield from gen_dict_extract(v, key)
    elif isinstance(var, list):
        for d in var:
            yield from gen_dict_extract(d, key)

答案 5 :(得分:4)

我是这样做的。

此函数以递归方式搜索包含嵌套字典和列表的字典。它构建一个名为fields_found的列表,其中包含每次找到该字段时的值。 “字段”是我在字典及其嵌套列表和词典中寻找的关键。

def get_recursively(search_dict, field):
    """Takes a dict with nested lists and dicts,
    and searches all dicts for a key of the field
    provided.
    """
    fields_found = []

    for key, value in search_dict.iteritems():

        if key == field:
            fields_found.append(value)

        elif isinstance(value, dict):
            results = get_recursively(value, field)
            for result in results:
                fields_found.append(result)

        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    more_results = get_recursively(item, field)
                    for another_result in more_results:
                        fields_found.append(another_result)

    return fields_found

答案 6 :(得分:4)

另一种变体,包括找到结果的嵌套路径( 注意:此版本不考虑列表 ):

def find_all_items(obj, key, keys=None):
    """
    Example of use:
    d = {'a': 1, 'b': 2, 'c': {'a': 3, 'd': 4, 'e': {'a': 9, 'b': 3}, 'j': {'c': 4}}}
    for k, v in find_all_items(d, 'a'):
        print "* {} = {} *".format('->'.join(k), v)    
    """
    ret = []
    if not keys:
        keys = []
    if key in obj:
        out_keys = keys + [key]
        ret.append((out_keys, obj[key]))
    for k, v in obj.items():
        if isinstance(v, dict):
            found_items = find_all_items(v, key, keys=(keys+[k]))
            ret += found_items
    return ret

答案 7 :(得分:3)

pip install nested-lookup 完全符合您的要求:

document = [ { 'taco' : 42 } , { 'salsa' : [ { 'burrito' : { 'taco' : 69 } } ] } ]

>>> print(nested_lookup('taco', document))
[42, 69]

答案 8 :(得分:0)

这是我的抨击:

def keyHole(k2b,o):
  # print "Checking for %s in "%k2b,o
  if isinstance(o, dict):
    for k, v in o.iteritems():
      if k == k2b and not hasattr(v, '__iter__'): yield v
      else:
        for r in  keyHole(k2b,v): yield r
  elif hasattr(o, '__iter__'):
    for r in [ keyHole(k2b,i) for i in o ]:
      for r2 in r: yield r2
  return

Ex.:

>>> findMe = {'Me':{'a':2,'Me':'bop'},'z':{'Me':4}}
>>> keyHole('Me',findMe)
<generator object keyHole at 0x105eccb90>
>>> [ x for x in keyHole('Me',findMe) ]
['bop', 4]

答案 9 :(得分:0)

紧跟@hexerei软件的答案,然后 @ bruno-bronosky的评论,如果要遍历键的列表/集:

def gen_dict_extract(var, keys):
   for key in keys:
      if hasattr(var, 'items'):
         for k, v in var.items():
            if k == key:
               yield v
            if isinstance(v, dict):
               for result in gen_dict_extract([key], v):
                  yield result
            elif isinstance(v, list):
               for d in v:
                  for result in gen_dict_extract([key], d):
                     yield result    

请注意,我传递的列表中只有一个元素([key]},而不是字符串key。

答案 10 :(得分:0)

我无法让这里发布的解决方案开箱即用,所以我想写一些更灵活的东西。

下面的递归函数应该允许您在任意深度的嵌套字典和列表集中收集满足给定键的某种正则表达式模式的所有值。

import re

def search(dictionary, search_pattern, output=None):
    """
    Search nested dictionaries and lists using a regex search
    pattern to match a key and return the corresponding value(s).
    """
    if output is None:
        output = []

    pattern = re.compile(search_pattern)
    
    for k, v in dictionary.items():
        pattern_found = pattern.search(k)
        if not pattern_found:
            if isinstance(v, list):
                for item in v:
                    if isinstance(item, dict):
                        search(item, search_pattern, output)
            if isinstance(v, dict):
                search(v, search_pattern, output)
        else:
            if pattern_found:
                output.append(v)

    return output

如果您想搜索特定字词,您可以随时将搜索模式设为类似 r'\bsome_term\b'