Question

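I have a standard nested JSON file, shown below. It is nested several levels deep, and I have to eliminate all of the nesting by creating new objects.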

The nested JSON file:

{
"persons": [{
    "id": "f4d322fa8f552",
    "address": {
        "building": "710",
        "coord": "[123, 465]",
        "street": "Avenue Road",
        "zipcode": "12345"
    },
    "cuisine": "Chinese",
    "grades": [{
        "date": "2013-03-03T00:00:00.000Z",
        "grade": "B",
        "score": {
          "x": 3,
          "y": 2
        }
    }, {
        "date": "2012-11-23T00:00:00.000Z",
        "grade": "C",
        "score": {
          "x": 1,
          "y": 22
        }
    }],
    "name": "Shash"
}]
}

New objects that need to be created:

persons
[
    {
        "id": "f4d322fa8f552",
        "cuisine": "Chinese",
        "name": "Shash"
    }
]

persons_address
[
    {
        "id": "f4d322fa8f552",
        "building": "710",
        "coord": "[123, 465]",
        "street": "Avenue Road",
        "zipcode": "12345"
    }
]

persons_grade
[
    {
        "id": "f4d322fa8f552",
        "__index": "0",
        "date": "2013-03-03T00:00:00.000Z",
        "grade": "B"
    },
    {
        "id": "f4d322fa8f552",
        "__index": "1",
        "date": "2012-11-23T00:00:00.000Z",
        "grade": "C"
    }
]

persons_grade_score
[
    {
        "id": "f4d322fa8f552",
        "__index": "0",
        "x": "3",
        "y": "2"
    },
    {
        "id": "f4d322fa8f552",
        "__index": "1",
        "x": "1",
        "y": "22"
    }
]

My approach: I used a normalize function to turn all of the lists into dicts, and added another function that adds the id to every nested dict.

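Now I am unable to traverse each level and create the new objects. Is there a way to do this?

Once the new objects are created, we can load them into a database.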

Answer 1

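Concept

Here is a generic solution that does what you need. The idea is to loop recursively through all values of the top-level "persons" dictionary and to proceed according to the type of each value found.

For every non-dict/non-list value found in a dictionary, it puts that value into the flat object being built for that level.

If it finds a dictionary or a list instead, it recursively does the same thing again, finding more non-dicts/non-lists, lists, or dicts.

Using collections.defaultdict also lets us easily populate an unknown number of lists, one per key, in a single dictionary, so that we end up with the 4 top-level objects you want.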
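As a small illustration of that last point, here is a minimal sketch of how `collections.defaultdict(list)` creates each result list the first time something is appended under its key. The `rows` sample and the key names in it are made up for illustration only.

```python
from collections import defaultdict

# Hypothetical (name, row) pairs, in the order a recursive walk might emit them.
rows = [
    ("persons_address", {"id": "f4d322fa8f552", "zipcode": "12345"}),
    ("persons_grades", {"id": "f4d322fa8f552", "__index": 0, "grade": "B"}),
    ("persons_grades", {"id": "f4d322fa8f552", "__index": 1, "grade": "C"}),
]

collected = defaultdict(list)
for name, row in rows:
    # No "if name not in collected" check is needed:
    # a missing key starts out as an empty list.
    collected[name].append(row)

print(sorted(collected))                  # ['persons_address', 'persons_grades']
print(len(collected["persons_grades"]))   # 2
```

With a plain dict the loop would need `collected.setdefault(name, []).append(row)` or an explicit membership check, which is why defaultdict is convenient when the set of nested names is not known in advance.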

Code example

from collections import defaultdict

class DictFlattener:
    def __init__(self, object_id_key, object_name):
        """Constructor.

        :param object_id_key: String key that identifies each base object
        :param object_name: String name given to the base object in data.

        """
        self._object_id_key = object_id_key
        self._object_name = object_name

        # Store each of the top-level results lists.
        self._collected_results = None

    def parse(self, data):
        """Parse the given nested dictionary data into separate lists.

        Each nested dictionary is transformed into its own list of objects,
        associated with the original object via the object id.

        :param data: Dictionary of data to parse.

        :returns: Single dictionary containing the resulting lists of
            objects, where each key is the object name combined with the
            list name via an underscore.

        """

        self._collected_results = defaultdict(list)

        for value_to_parse in data[self._object_name]:
            object_id = value_to_parse[self._object_id_key]
            parsed_object = {}

            for key, value in value_to_parse.items():
                sub_object_name = self._object_name + "_" + key
                parsed_value = self._parse_value(
                    value,
                    object_id,
                    sub_object_name,
                )
                # Compare against None so falsy scalars (0, "", False)
                # are kept; dicts and lists come back as None.
                if parsed_value is not None:
                    parsed_object[key] = parsed_value

            self._collected_results[self._object_name].append(parsed_object)

        return self._collected_results

    def _parse_value(self, value_to_parse, object_id, current_object_name,
                     index=None):
        """Parse some value of an unknown type.

        If it's a list or a dict, keep parsing, otherwise return it as-is.

        :param value_to_parse: Value to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position in the enclosing list, if there is one.

        :returns: None if value_to_parse is a dict or a list, otherwise returns
            value_to_parse.

        """
        if isinstance(value_to_parse, dict):
            self._parse_dict(
                value_to_parse,
                object_id,
                current_object_name,
                index=index,
            )
        elif isinstance(value_to_parse, list):
            self._parse_list(
                value_to_parse,
                object_id,
                current_object_name,
            )
        else:
            return value_to_parse

    def _parse_dict(self, dict_to_parse, object_id, current_object_name,
                    index=None):
        """Parse some value of a dict type and store it in self._collected_results.

        :param dict_to_parse: Dict to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position in the enclosing list, if there is one.

        """
        parsed_dict = {
            self._object_id_key: object_id,
        }
        if index is not None:
            parsed_dict["__index"] = index

        for key, value in dict_to_parse.items():
            sub_object_name = current_object_name + "_" + key
            parsed_value = self._parse_value(
                value,
                object_id,
                sub_object_name,
                index=index,
            )
            if parsed_value is not None:
                parsed_dict[key] = parsed_value

        self._collected_results[current_object_name].append(parsed_dict)

    def _parse_list(self, list_to_parse, object_id, current_object_name):
        """Parse some value of a list type and store it in self._collected_results.

        :param list_to_parse: List to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.

        """
        for index, sub_dict in enumerate(list_to_parse):
            self._parse_value(
                sub_dict,
                object_id,
                current_object_name,
                index=index,
            )

Then to use it:

parser = DictFlattener("id", "persons")
results = parser.parse(test_data)

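Notes

There are some inconsistencies between your sample data and your expected output, such as whether the scores are strings or ints. You will need to account for that when comparing the results against what you expect.
One could always refactor further, or make it more functional rather than a class. Hopefully looking at this helps you see how to do it.
As @jbernardo said, if you are going to insert these into a relational database, they should not all just have "id" as the key; it should be "person_id".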
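To make the last two notes concrete, here is a minimal functional sketch of the same recursive idea. It is not the class above rewritten line for line: the function name `flatten` and the `fk_name` parameter (the key under which child rows store the parent id) are assumptions introduced for illustration, and the sample data is a trimmed copy of the question's JSON.

```python
from collections import defaultdict


def flatten(data, root_name, id_key, fk_name=None):
    """Split nested dicts/lists under data[root_name] into flat lists of rows.

    fk_name is a hypothetical parameter: the key used for the parent id in
    child rows. It defaults to id_key.
    """
    fk_name = fk_name or id_key
    results = defaultdict(list)

    def walk(value, parent_id, name, index=None):
        if isinstance(value, dict):
            row = {fk_name: parent_id}
            if index is not None:
                row["__index"] = index
            for key, child in value.items():
                scalar = walk(child, parent_id, name + "_" + key, index)
                if scalar is not None:
                    row[key] = scalar
            results[name].append(row)
            return None
        if isinstance(value, list):
            # Only dict items produce rows; scalar list items are dropped.
            for i, item in enumerate(value):
                walk(item, parent_id, name, i)
            return None
        return value  # scalar: stays in the enclosing row

    for obj in data[root_name]:
        row = {}
        for key, child in obj.items():
            scalar = walk(child, obj[id_key], root_name + "_" + key)
            if scalar is not None:
                row[key] = scalar
        results[root_name].append(row)
    return dict(results)


data = {"persons": [{
    "id": "f4d322fa8f552",
    "address": {"building": "710", "zipcode": "12345"},
    "grades": [{"grade": "B", "score": {"x": 3, "y": 2}},
               {"grade": "C", "score": {"x": 1, "y": 22}}],
    "name": "Shash",
}]}

tables = flatten(data, "persons", "id", fk_name="person_id")
print(tables["persons"])
# [{'id': 'f4d322fa8f552', 'name': 'Shash'}]
print(tables["persons_grades_score"][0])
# {'person_id': 'f4d322fa8f552', '__index': 0, 'x': 3, 'y': 2}
```

Note that the scores stay ints and `__index` is an int here, so a comparison against the string-valued expected output in the question would need a conversion step. The derived names also follow the JSON keys, so the list is called `persons_grades` rather than `persons_grade`.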

Answer 2

The following rough sketch may help once you have parsed the JSON file (see "Parsing values from a JSON file?"):

# `data` is the dict obtained by parsing the JSON file
top_level = []
all_second_level = []
all_third_level = []  # name added here to hold the third-level keys

for person in data['persons']:
    # keys with scalar values stay on the top-level entity
    for key, val in person.items():
        if not isinstance(val, (dict, list)):
            top_level.append(key)

    for key, val in person.items():
        if isinstance(val, dict):
            # a nested dict becomes one second-level entity
            second_level = []
            for key1, val1 in val.items():
                second_level.append(key1)
            all_second_level.append(second_level)
        elif isinstance(val, list):
            # a nested list becomes one second-level entity per item
            second_level = []
            for index, item in enumerate(val):
                second_level_entity = []
                for key1, val1 in item.items():
                    if not isinstance(val1, dict):
                        second_level_entity.append(key1)
                    else:
                        # append it to third level entity
                        all_third_level.append(list(val1))
                # still to do: append index to the second_level_entity
                second_level.append(second_level_entity)
            all_second_level.append(second_level)

# in the end append id to all items of entities at each level

Answer 3

# create 4 empty lists
persons = []
persons_address = []
persons_grade = []
persons_grade_score = []


# go through all your data and put the correct information in each list
for data in yourdict['persons']:
    persons.append({
        'id': data['id'],
        'cuisine': data['cuisine'],
        'name': data['name'],
    })

    _address = data['address'].copy()
    _address['id'] = data['id']
    persons_address.append(_address)

    persons_grade.extend({
        'id': data['id'],
        '__index': n,
        'date': g['date'],
        'grade': g['grade'],
    } for n, g in enumerate(data['grades']))

    # x and y live inside each grade's nested "score" dict
    persons_grade_score.extend({
        'id': data['id'],
        '__index': n,
        'x': g['score']['x'],
        'y': g['score']['y'],
    } for n, g in enumerate(data['grades']))
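Once flat lists like these exist, loading them into a database (the goal stated in the question) could look roughly like the sketch below. It assumes SQLite through Python's standard `sqlite3` module; the table layout, the `idx` column name, and the hard-coded sample rows are illustrative choices, not something the question specifies.

```python
import sqlite3

# Hypothetical flattened rows, shaped like two of the lists built above.
persons = [{"id": "f4d322fa8f552", "cuisine": "Chinese", "name": "Shash"}]
persons_grade = [
    {"id": "f4d322fa8f552", "__index": 0,
     "date": "2013-03-03T00:00:00.000Z", "grade": "B"},
    {"id": "f4d322fa8f552", "__index": 1,
     "date": "2012-11-23T00:00:00.000Z", "grade": "C"},
]

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration
conn.execute(
    "CREATE TABLE persons (id TEXT PRIMARY KEY, cuisine TEXT, name TEXT)"
)
conn.execute(
    "CREATE TABLE persons_grade ("
    "id TEXT REFERENCES persons(id), idx INTEGER, date TEXT, grade TEXT)"
)

# Named placeholders (:key) take their values from each dict.
conn.executemany("INSERT INTO persons VALUES (:id, :cuisine, :name)", persons)

# The "__index" key is mapped onto the idx column by building tuples.
conn.executemany(
    "INSERT INTO persons_grade VALUES (?, ?, ?, ?)",
    [(g["id"], g["__index"], g["date"], g["grade"]) for g in persons_grade],
)
conn.commit()

grade_count = conn.execute("SELECT COUNT(*) FROM persons_grade").fetchone()[0]
print(grade_count)  # 2
```

The same pattern would apply to persons_address and persons_grade_score; with another database driver the placeholder syntax may differ.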
