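I have a standard nested JSON file like the one below. It is nested several levels deep, and I have to eliminate all the nesting by creating new flat objects, which are listed after the input.

Nested JSON file: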
{
    "persons": [{
        "id": "f4d322fa8f552",
        "address": {
            "building": "710",
            "coord": "[123, 465]",
            "street": "Avenue Road",
            "zipcode": "12345"
        },
        "cuisine": "Chinese",
        "grades": [{
            "date": "2013-03-03T00:00:00.000Z",
            "grade": "B",
            "score": {
                "x": 3,
                "y": 2
            }
        }, {
            "date": "2012-11-23T00:00:00.000Z",
            "grade": "C",
            "score": {
                "x": 1,
                "y": 22
            }
        }],
        "name": "Shash"
    }]
}
New objects that need to be created:

persons
[
    {
        "id": "f4d322fa8f552",
        "cuisine": "Chinese",
        "name": "Shash"
    }
]

persons_address
[
    {
        "id": "f4d322fa8f552",
        "building": "710",
        "coord": "[123, 465]",
        "street": "Avenue Road",
        "zipcode": "12345"
    }
]

persons_grade
[
    {
        "id": "f4d322fa8f552",
        "__index": "0",
        "date": "2013-03-03T00:00:00.000Z",
        "grade": "B"
    },
    {
        "id": "f4d322fa8f552",
        "__index": "1",
        "date": "2012-11-23T00:00:00.000Z",
        "grade": "C"
    }
]

persons_grade_score
[
    {
        "id": "f4d322fa8f552",
        "__index": "0",
        "x": "3",
        "y": "2"
    },
    {
        "id": "f4d322fa8f552",
        "__index": "1",
        "x": "1",
        "y": "22"
    }
]

My approach: I used a normalize function to turn all the lists into dicts, and added another function that adds the id to all the nested dicts.
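Now I am unable to traverse each level and create the new objects. Is there a way to do this?
Once the new objects are created, we can load them into a database.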
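Answer 0 (score: 6)
Here is a generic solution that does what you need. The idea is to recursively walk every value of the top-level "persons" dictionaries and decide what to do based on the type of each value found.
For every value that is neither a dict nor a list, it places the value into the object currently being built.
If it finds a dict or a list instead, it recurses and does the same thing again, finding more scalars, lists, or dicts.
It also uses collections.defaultdict so that an unknown number of lists can be filled in under each key of a dictionary, which gives us the 4 top-level objects we want.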
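As a small illustration of the collections.defaultdict point, here is a minimal sketch. The rows are hand-written samples in the shape the question asks for, not output from the answer's code. It shows that a list is created for each key the first time that key is used, so the set of result names does not have to be known in advance.

```python
from collections import defaultdict

# Each missing key starts out as an empty list, so we can append right away.
collected = defaultdict(list)

# Hand-written sample rows (illustrative only).
collected["persons_address"].append({"id": "f4d322fa8f552", "building": "710"})
collected["persons_grades"].append({"id": "f4d322fa8f552", "__index": 0, "grade": "B"})
collected["persons_grades"].append({"id": "f4d322fa8f552", "__index": 1, "grade": "C"})

# Two result lists now exist without any of them being declared up front.
print(sorted(collected))
print(len(collected["persons_grades"]))
```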
from collections import defaultdict


class DictFlattener:
    def __init__(self, object_id_key, object_name):
        """Constructor.

        :param object_id_key: String key that identifies each base object
        :param object_name: String name given to the base object in data.
        """
        self._object_id_key = object_id_key
        self._object_name = object_name
        # Store each of the top-level results lists.
        self._collected_results = None

    def parse(self, data):
        """Parse the given nested dictionary data into separate lists.

        Each nested dictionary is transformed into its own list of objects,
        associated with the original object via the object id.

        :param data: Dictionary of data to parse.
        :returns: Single dictionary containing the resulting lists of
            objects, where each key is the object name combined with the
            list name via an underscore.
        """
        self._collected_results = defaultdict(list)
        for value_to_parse in data[self._object_name]:
            object_id = value_to_parse[self._object_id_key]
            parsed_object = {}
            for key, value in value_to_parse.items():
                sub_object_name = self._object_name + "_" + key
                parsed_value = self._parse_value(
                    value,
                    object_id,
                    sub_object_name,
                )
                # Compare against None so falsy scalars (0, "") are kept.
                if parsed_value is not None:
                    parsed_object[key] = parsed_value
            self._collected_results[self._object_name].append(parsed_object)
        return self._collected_results

    def _parse_value(self, value_to_parse, object_id, current_object_name, index=None):
        """Parse some value of an unknown type.

        If it's a list or a dict, keep parsing, otherwise return it as-is.

        :param value_to_parse: Value to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position in the enclosing list, if there is one.
        :returns: None if value_to_parse is a dict or a list, otherwise returns
            value_to_parse.
        """
        if isinstance(value_to_parse, dict):
            self._parse_dict(
                value_to_parse,
                object_id,
                current_object_name,
                index=index,
            )
        elif isinstance(value_to_parse, list):
            self._parse_list(
                value_to_parse,
                object_id,
                current_object_name,
            )
        else:
            return value_to_parse

    def _parse_dict(self, dict_to_parse, object_id, current_object_name,
                    index=None):
        """Parse some value of a dict type and store it in self._collected_results.

        :param dict_to_parse: Dict to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        :param index: Position in the enclosing list, if there is one.
        """
        parsed_dict = {
            self._object_id_key: object_id,
        }
        if index is not None:
            parsed_dict["__index"] = index
        for key, value in dict_to_parse.items():
            sub_object_name = current_object_name + "_" + key
            parsed_value = self._parse_value(
                value,
                object_id,
                sub_object_name,
                index=index,
            )
            # Compare against None so falsy scalars (0, "") are kept.
            if parsed_value is not None:
                parsed_dict[key] = parsed_value
        self._collected_results[current_object_name].append(parsed_dict)

    def _parse_list(self, list_to_parse, object_id, current_object_name):
        """Parse some value of a list type and store it in self._collected_results.

        Only dict (or list) items are stored; scalar items are not collected.

        :param list_to_parse: List to parse
        :param object_id: String id of the current top object being parsed.
        :param current_object_name: Name of the current level being parsed.
        """
        for index, sub_dict in enumerate(list_to_parse):
            self._parse_value(
                sub_dict,
                object_id,
                current_object_name,
                index=index,
            )
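Then use it, where test_data is the nested dictionary from the question, already parsed:
parser = DictFlattener("id", "persons")
results = parser.parse(test_data)
# Result keys follow the input key names:
# persons, persons_address, persons_grades, persons_grades_score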
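The question's end goal is to load the new objects into a database. As a rough sketch of that step, assuming SQLite through the standard sqlite3 module and a results dictionary shaped like the one returned by parse, something along these lines might work. The load_tables helper and the sample rows are illustrative assumptions, not part of the answer.

```python
import sqlite3


def load_tables(conn, results):
    """Create one table per result list and insert its rows.

    Hypothetical helper: assumes every row holds only scalar values and
    that table and column names come from trusted data.
    """
    for table_name, rows in results.items():
        # Collect the union of keys across rows, keeping first-seen order.
        columns = []
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)
        if not columns:
            continue
        column_sql = ", ".join('"{}"'.format(c) for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute('CREATE TABLE "{}" ({})'.format(table_name, column_sql))
        conn.executemany(
            'INSERT INTO "{}" ({}) VALUES ({})'.format(
                table_name, column_sql, placeholders),
            [tuple(row.get(c) for c in columns) for row in rows],
        )
    conn.commit()


# Hand-written sample rows in the flattened shape (illustrative only).
sample_results = {
    "persons": [
        {"id": "f4d322fa8f552", "cuisine": "Chinese", "name": "Shash"},
    ],
    "persons_grades": [
        {"id": "f4d322fa8f552", "__index": 0,
         "date": "2013-03-03T00:00:00.000Z", "grade": "B"},
        {"id": "f4d322fa8f552", "__index": 1,
         "date": "2012-11-23T00:00:00.000Z", "grade": "C"},
    ],
}

conn = sqlite3.connect(":memory:")
load_tables(conn, sample_results)
grade_rows = conn.execute(
    'SELECT "__index", grade FROM persons_grades ORDER BY "__index"'
).fetchall()
print(grade_rows)
```

The shared id column is what lets the child tables be joined back to persons. A real schema would probably declare column types and keys explicitly instead of relying on untyped columns.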
Answer 1 (score: 3)
Here is a rough sketch that may help once you have parsed the JSON file, as described in "Parsing values from a JSON file?".
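For reference, the parsing step this answer assumes can be sketched with the standard json module. The inline string below is a shortened stand-in for the question's file; with a real file, json.load on an open file object would take the place of json.loads. The variable name data matches the name this answer uses.

```python
import json

# Shortened stand-in for the file contents (illustrative only).
raw = """
{"persons": [{"id": "f4d322fa8f552",
              "address": {"building": "710", "zipcode": "12345"},
              "cuisine": "Chinese",
              "grades": [{"grade": "B", "score": {"x": 3, "y": 2}}],
              "name": "Shash"}]}
"""

data = json.loads(raw)

# "persons" is a list, so each element is one person dict to loop over.
for person in data["persons"]:
    print(person["id"], sorted(person))
```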
# data is the parsed JSON; data['persons'] is a list of person dicts
top_level = []
all_second_level = []
all_third_level = []
for person in data['persons']:
    # scalar fields (including id) stay on the top-level entity
    top_level.append({key: val for key, val in person.items()
                      if not isinstance(val, (dict, list))})
    for key, val in person.items():
        if isinstance(val, dict):
            # a nested dict becomes one second-level entity, tagged with the id
            second_level = {'id': person['id']}
            second_level.update(val)
            all_second_level.append((key, second_level))
        elif isinstance(val, list):
            for index, item in enumerate(val):
                # each list item becomes a second-level entity with its index
                second_level_entity = {'id': person['id'], '__index': index}
                for key1, val1 in item.items():
                    if not isinstance(val1, dict):
                        second_level_entity[key1] = val1
                    else:
                        # a dict inside a list item becomes a third-level entity
                        third_level_entity = {'id': person['id'], '__index': index}
                        third_level_entity.update(val1)
                        all_third_level.append((key + '_' + key1, third_level_entity))
                all_second_level.append((key, second_level_entity))
Answer 2 (score: 2)
# create 4 empty lists
persons = []
persons_address = []
persons_grade = []
persons_grade_score = []

# go through all your data and put the correct information in each list
# (yourdict is the parsed JSON from the question)
for data in yourdict['persons']:
    persons.append({
        'id': data['id'],
        'cuisine': data['cuisine'],
        'name': data['name'],
    })

    # copy the address so the original data is not modified
    _address = data['address'].copy()
    _address['id'] = data['id']
    persons_address.append(_address)

    persons_grade.extend({
        'id': data['id'],
        '__index': n,
        'date': g['date'],
        'grade': g['grade'],
    } for n, g in enumerate(data['grades']))

    # x and y live under each grade's nested "score" dict
    persons_grade_score.extend({
        'id': data['id'],
        '__index': n,
        'x': g['score']['x'],
        'y': g['score']['y'],
    } for n, g in enumerate(data['grades']))