python获取json键作为完整路径

时间:2018-07-23 22:58:33

标签: python

我想解析一个JSON文件并获得完整的列表,其中包含访问密钥的所有必需路径。如果使用keys方法,则将获得单个键的列表,而不是访问数据所需的层次结构键的完整列表。

所以,如果给出这样的数据

data = {
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

我可以返回如下列表,其中包含所有键的完整路径。

[['glossary']['title'],['glossary']['GlossDiv']...]

读取和访问元素很好。为了获得结果,我尝试使用此答案Access nested dictionary items via a list of keys

我不太了解它是如何工作的,它只返回单词“ glossary”。

这是我的代码。我使用ChainMap是因为它可以更轻松地将json转换为字典并轻松访问键。

import json
from collections import ChainMap
from functools import reduce
import operator

myDataChained = ChainMap(data)

def getFromDict(data):
    return reduce(operator.getitem, data)

Json_Paths = getFromDict(myDataChained)
print(Json_Paths)

1 个答案:

答案 0 :(得分:2)

您不能使用与链接答案中相同的技术来进行反向操作-您没有预先经过functools.reduce()/operator.getitem()组合的路径信息-您正尝试获取该信息,即规范化/展平字典结构。

为此,您必须遍历整个结构并收集数据中的所有可能路径,例如:

import collections

def get_paths(source):
    paths = []
    if isinstance(source, collections.MutableMapping):  # found a dict-like structure...
        for k, v in source.items():  # iterate over it; Python 2.x: source.iteritems()
            paths.append([k])  # add the current child path
            paths += [[k] + x for x in get_paths(v)]  # get sub-paths, extend with the current
    # else, check if a list-like structure, remove if you don't want list paths included
    elif isinstance(source, collections.Sequence) and not isinstance(source, str):
        #                          Python 2.x: use basestring instead of str ^
        for i, v in enumerate(source):
            paths.append([i])
            paths += [[i] + x for x in get_paths(v)]  # get sub-paths, extend with the current
    return paths

现在,如果您通过data运行它:

data = {
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages...",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

paths = get_paths(data)

您将获得paths包含

[['glossary'],
 ['glossary', 'title'],
 ['glossary', 'GlossDiv'],
 ['glossary', 'GlossDiv', 'title'],
 ['glossary', 'GlossDiv', 'GlossList'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'ID'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'SortAs'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossTerm'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'Acronym'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'Abbrev'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossDef'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossDef', 'para'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossDef', 'GlossSeeAlso'],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossDef', 'GlossSeeAlso', 0],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossDef', 'GlossSeeAlso', 1],
 ['glossary', 'GlossDiv', 'GlossList', 'GlossEntry', 'GlossSee']]

您可以将其中的任何一个输入到该functools.reduce()/operator.getitem()组合中以获取目标值。