I have two types of data structure:
data = {'name':class_1_name, 'type':'directory', 'children': [{'name':class_2_name, 'type':'directory', 'children': [{'name':class_3_name, 'type':'directory', 'children': []}]}]}
data = {'name':class_1_name, 'type':'directory', 'children': [{'name':class_2_name, 'type':'directory', 'children': []}]}
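My problem appears when I merge multiple versions of these dictionaries together in a loop. Because the children are always different, every attempt I make only merges the first level of the dictionary. For example: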
{
    "name": "class_1_1",
    "type": "directory",
    "children": [
        {
            "name": "class_2_1",
            "type": "directory",
            "children": []
        },
        {
            "name": "class_2_2",
            "type": "directory",
            "children": [
                {
                    "name": "class_3_1",
                    "type": "directory",
                    "children": []
                }
            ]
        },
        {
            "name": "class_2_2",
            "type": "directory",
            "children": [
                {
                    "name": "class_3_2",
                    "type": "directory",
                    "children": []
                }
            ]
        }
    ]
}
The result should be:
{
    "name": "class_1_1",
    "type": "directory",
    "children": [
        {
            "name": "class_2_1",
            "type": "directory",
            "children": []
        },
        {
            "name": "class_2_2",
            "type": "directory",
            "children": [
                {
                    "name": "class_3_1",
                    "type": "directory",
                    "children": []
                },
                {
                    "name": "class_3_2",
                    "type": "directory",
                    "children": []
                }
            ]
        }
    ]
}
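I am currently using avian2's jsonmerge from https://github.com/avian2/jsonmerge because I really don't know where to start with deep-merging two dictionaries by value.
Every time I try to work through this I run into logic errors, and I really don't know how to approach it. Any help or hints pointing me in the right direction would be much appreciated.
Cheers.
Edit (code):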
import os
import io
import json
import bs4 as bs
from jsonmerge import Merger

list = [ '' ]
g_dict = {}

def getJsonInfo( eggs ):
    if (eggs == 3):
        data = {'name':class_1_name, 'type':'directory', 'children': [{'name':class_2_name, 'type':'directory', 'children': [{'name':class_3_name, 'type':'directory', 'children': []}]}]}
    else:
        data = {'name':class_1_name, 'type':'directory', 'children': [{'name':class_2_name, 'type':'directory', 'children': []}]}
    schema = {
        "properties": {
            "children": {
                "type": "array",
                "mergeStrategy": "append"
            }
        }
    }
    global g_dict
    merger = Merger(schema)
    g_dict = merger.merge(data, g_dict)

with open('catalogue.html') as html_file:
    tree = bs.BeautifulSoup( html_file,'lxml' )
    for class_1 in tree.find_all('div',class_="class_1"):
        class_1_name = class_1['name']
        for class_2 in class_1.find_all('div',class_="class_2"):
            class_2_name = class_2['name']
            class_3 = class_2.find_all('div',class_="class_3")
            if len(class_3) != 0:
                for class_3 in class_2.find_all('div',class_="class_3"):
                    class_3_name = class_3['name']
                    print(class_1['name'] + ' -> ' + class_2['name'] + ' -> ' + class_3['name'])
                    getJsonInfo(3)
            else:
                print(class_1['name'] + ' -> ' + class_2['name'] )
                getJsonInfo(2)

print('Creating JSON Tree')
with io.open('database.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(g_dict, ensure_ascii=False, indent=4))
print('Done!')
catalogue.html:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="ja">
<body>
    <div class="class_1" name="A">
        <div class="class_2" name="A2">
            <div class="class_3" name="a31"></div>
            <div class="class_3" name="a32"></div>
        </div>
    </div>
    <div class="class_1" name="B">
        <div class="class_2" name="b1"></div>
    </div>
</body>
</html>
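One possible way to sidestep the merge step altogether, offered only as a sketch: the loop in the script already walks class_1 -> class_2 -> class_3, so each chain of names could be inserted directly into a single tree, reusing a node whenever a child with that name already exists. The helper insert_path and the variable tree are hypothetical names introduced for illustration; they are not part of the original script or of jsonmerge. The sample names mirror catalogue.html.

```python
def insert_path(children, names):
    """Insert a chain of directory names, reusing any node that already has that name."""
    for name in names:
        # Look for an existing child with this name at the current level.
        node = next((c for c in children if c['name'] == name), None)
        if node is None:
            # Hypothetical node layout, copied from the question's data structure.
            node = {'name': name, 'type': 'directory', 'children': []}
            children.append(node)
        # Descend one level for the next name in the chain.
        children = node['children']


tree = []  # top-level list of class_1 nodes
insert_path(tree, ['A', 'A2', 'a31'])
insert_path(tree, ['A', 'A2', 'a32'])
insert_path(tree, ['B', 'b1'])
```

With this approach no duplicate siblings are ever created, so nothing has to be merged afterwards. The lookup is a linear scan per level, which is probably fine for small catalogues.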
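Answer 0 (score: 1)
You can use a dict, seen, to keep track of the first child dict for each distinct name, keep extending its children with the children of any other child dict that has the same name, and then recurse down into the children's children: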
def deep_merge(d):
    seen = {}
    for c in d['children']:
        if c['name'] in seen:
            # Same name seen before: extend the first occurrence's children.
            seen[c['name']]['children'] += c['children']
        else:
            seen[c['name']] = c
        deep_merge(c)

deep_merge(d)

d will then become:
{'children': [{'children': [],
               'name': 'class_2_1',
               'type': 'directory'},
              {'children': [{'children': [],
                             'name': 'class_3_1',
                             'type': 'directory'},
                            {'children': [],
                             'name': 'class_3_2',
                             'type': 'directory'}],
               'name': 'class_2_2',
               'type': 'directory'},
              {'children': [{'children': [],
                             'name': 'class_3_2',
                             'type': 'directory'}],
               'name': 'class_2_2',
               'type': 'directory'}],
 'name': 'class_1_1',
 'type': 'directory'}
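Note that the printed result still lists the second class_2_2 entry, whereas the expected result in the question has only one. A possible refinement, sketched here under the assumption that only the first occurrence of each name should survive, is to replace the children list with the values of the lookup dict and to recurse only after all same-named siblings have been combined. The name merge_children is hypothetical and not taken from the answer; the sketch also relies on dicts preserving insertion order (Python 3.7+).

```python
def merge_children(node):
    """Merge same-named siblings in place, then recurse into the merged children."""
    merged = {}  # name -> first child dict seen with that name
    for child in node['children']:
        first = merged.get(child['name'])
        if first is None:
            merged[child['name']] = child
        else:
            # Fold the duplicate's children into the first occurrence.
            first['children'].extend(child['children'])
    # Keep only the first occurrence of each name, in original order.
    node['children'] = list(merged.values())
    # Recurse after merging so grandchildren contributed by duplicates are deduplicated too.
    for child in node['children']:
        merge_children(child)
    return node


d = {
    'name': 'class_1_1', 'type': 'directory', 'children': [
        {'name': 'class_2_1', 'type': 'directory', 'children': []},
        {'name': 'class_2_2', 'type': 'directory', 'children': [
            {'name': 'class_3_1', 'type': 'directory', 'children': []}]},
        {'name': 'class_2_2', 'type': 'directory', 'children': [
            {'name': 'class_3_2', 'type': 'directory', 'children': []}]},
    ],
}
merge_children(d)
```

After this call, d should contain class_2_1 and a single class_2_2 whose children are class_3_1 and class_3_2. The function mutates its argument, so copy the input first (for example with copy.deepcopy) if the original needs to be kept.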