I am trying to build a custom nested dict in Python from data read from a file, without using the collections module. My dictionary data structure is shown below.
d = {
    'employee': {
        'developer1': {
            'id1': {
                'language': ('c', 'java'),
                'worked_area': ('delhi', 'kolkata'),
            },
            'id2': {
                'language': ('python', 'c++'),
                'worked_area': ('kolkata',),  # trailing comma makes this a one-element tuple
            },
        },
        'devloper2': {
            'id1': {
                'language': ('c', 'java'),
                'worked_area': ('delhi', 'kolkata'),
            },
        },
    },
}
The data structure is read with the following code:
for k1, v1 in d.items():
    for k2, v2 in v1.items():
        for k3, v3 in v2.items():
            for k4, v5 in v3.items():
                print(k1, k2, k3, k4, v5)
File: text1.txt
employee developer1 id1 language c
employee developer1 id1 language java
employee developer1 id1 worked_area delhi
employee developer1 id1 worked_area kolkata
employee developer1 id2 language python
employee developer1 id2 language c++
employee developer1 id2 worked_area kolkata
employee devloper2 id1 language c
employee devloper2 id1 language java
employee devloper2 id1 worked_area delhi
employee devloper2 id1 worked_area kolkata
Now I am trying to build the dictionary data structure above from this text file, and then print its contents with the code above.
import re
d = {}
fh = open('text1.txt', 'r')
for i, line in enumerate(fh):
    line = line.strip()
    tmp = re.split(r'\t+', line)
    d[tmp[0]][tmp[1]][tmp[2]][tmp[3]].append(tmp[4])
But I get the error below when I run the code.
Error
KeyError: 'employee'
So I need help writing the code that builds this data structure.
Answer 0 (score: 1)
Your problem is that you initialize an empty dictionary. It has no 'employee' key, so you get a KeyError:
>>> d = {}
>>> d['employee']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'employee'
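The next problem is that the value stored under the 'employee' key should itself be a dict, and the same holds for every level below it. To solve this, you can use nested defaultdicts.
Since the nesting depth is constant and known, you only need to initialize the tree once. It is a defaultdict of defaultdicts of defaultdicts of defaultdicts of lists :)
Once the tree is initialized, it is easy to append information to the leaves. Note that you should use lists rather than tuples: the length of languages is not known until the end, and you cannot append values to a tuple.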
data = """employee developer1 id1 language c
employee developer1 id1 language java
employee developer1 id1 worked_area delhi
employee developer1 id1 worked_area kolkata
employee developer1 id2 language python
employee developer1 id2 language c++
employee developer1 id2 worked_area kolkata
employee devloper2 id1 language c
employee devloper2 id1 language java
employee devloper2 id1 worked_area delhi
employee devloper2 id1 worked_area kolkata"""
from collections import defaultdict
tree = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list))))
for line in data.splitlines():
    k1, k2, k3, k4, v = line.split()
    tree[k1][k2][k3][k4].append(v)
print(tree)
# defaultdict(<function <lambda> at 0x7f2e771cd7d0>, {'employee': defaultdict(<function <lambda> at 0x7f2e771cdf50>, {'developer1': defaultdict(<function <lambda> at 0x7f2e771cf050>, {'id2': defaultdict(<type 'list'>, {'worked_area': ['kolkata'], 'language': ['python', 'c++']}), 'id1': defaultdict(<type 'list'>, {'worked_area': ['delhi', 'kolkata'], 'language': ['c', 'java']})}), 'devloper2': defaultdict(<function <lambda> at 0x7f2e771cf0c8>, {'id1': defaultdict(<type 'list'>, {'worked_area': ['delhi', 'kolkata'], 'language': ['c', 'java']})})})})
print(tree['employee']['developer1']['id2']['language'])
# ['python', 'c++']
print(tree['employee']['developerX']['idX']['language'])
# []
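One thing to keep in mind about the last lookup: indexing a defaultdict with a missing key inserts that key into the tree. The following is a minimal sketch of that side effect and of a read-only lookup that avoids it. The helper name lookup is hypothetical and not part of the answer above.

```python
from collections import defaultdict

tree = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list))))
tree['employee']['developer1']['id1']['language'].append('c')

def lookup(node, *keys):
    """Follow keys without creating missing entries; return None if any key is absent."""
    for key in keys:
        if key not in node:   # membership tests never trigger the default factory
            return None
        node = node[key]
    return node

missing = lookup(tree, 'employee', 'developerX', 'idX', 'language')
print(missing)                             # None
print('developerX' in tree['employee'])    # False

tree['employee']['developerX']['idX']['language']   # plain indexing creates the keys
print('developerX' in tree['employee'])    # True
```

If such empty entries are unwanted, for example before dumping the tree, checking with in or a helper like this one keeps the tree unchanged.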
To see the structure of the tree, you can use json.dumps:
import json
print(json.dumps(tree, indent=4))
Output:
{
    "employee": {
        "developer1": {
            "id1": {
                "language": [
                    "c",
                    "java"
                ],
                "worked_area": [
                    "delhi",
                    "kolkata"
                ]
            },
            "id2": {
                "language": [
                    "python",
                    "c++"
                ],
                "worked_area": [
                    "kolkata"
                ]
            }
        },
        "devloper2": {
            "id1": {
                "language": [
                    "c",
                    "java"
                ],
                "worked_area": [
                    "delhi",
                    "kolkata"
                ]
            }
        }
    }
}
Since a defaultdict is also a dict, you can iterate over its values just as you proposed.
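As a rough illustration of that iteration, the sketch below reuses the four nested loops from the question and adds one more loop over each leaf list, so that it prints one value per line like text1.txt. It also shows one possible way to turn the tree back into plain dicts with tuple leaves. The three sample lines and the helper name to_plain are assumptions made for this example only.

```python
from collections import defaultdict

tree = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list))))
for line in ["employee developer1 id1 language c",
             "employee developer1 id1 language java",
             "employee developer1 id2 worked_area kolkata"]:
    k1, k2, k3, k4, v = line.split()
    tree[k1][k2][k3][k4].append(v)

# Same four nested loops as in the question, plus one loop over the leaf list
rows = []
for k1, v1 in tree.items():
    for k2, v2 in v1.items():
        for k3, v3 in v2.items():
            for k4, values in v3.items():
                for value in values:
                    rows.append((k1, k2, k3, k4, value))
                    print(k1, k2, k3, k4, value)

def to_plain(node):
    """Recursively convert defaultdicts to dicts and leaf lists to tuples."""
    if isinstance(node, dict):
        return {key: to_plain(child) for key, child in node.items()}
    return tuple(node)

plain = to_plain(tree)
print(plain['employee']['developer1']['id1']['language'])   # ('c', 'java')
```

After the conversion, looking up a missing key raises KeyError again, as it would with the dict from the question.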
Answer 1 (score: 0)
As requested, using only the built-in dict:
import re

d = {}
with open('text1.txt', 'r') as fh:  # the file is closed automatically
    for line in fh:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tmp = re.split(r'\t+', line)
        # create each missing level before descending into it
        if tmp[0] not in d:
            d[tmp[0]] = {}
        if tmp[1] not in d[tmp[0]]:
            d[tmp[0]][tmp[1]] = {}
        if tmp[2] not in d[tmp[0]][tmp[1]]:
            d[tmp[0]][tmp[1]][tmp[2]] = {}
        if tmp[3] not in d[tmp[0]][tmp[1]][tmp[2]]:
            d[tmp[0]][tmp[1]][tmp[2]][tmp[3]] = []
        d[tmp[0]][tmp[1]][tmp[2]][tmp[3]].append(tmp[4])
With more thought, there is probably a more elegant solution. People must have considered this before, for example anyone who works with JSON files.
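One candidate for a shorter version, still without the collections module, is dict.setdefault, which returns the existing value for a key or stores and returns a default. This is only a sketch: the three tab-separated sample lines stand in for the contents of text1.txt and are assumptions for the example.

```python
d = {}
lines = [
    "employee\tdeveloper1\tid1\tlanguage\tc",
    "employee\tdeveloper1\tid1\tlanguage\tjava",
    "employee\tdevloper2\tid1\tworked_area\tdelhi",
]
for line in lines:
    *keys, value = line.split()          # first four fields are keys, last one is the value
    node = d
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # descend, creating the level if it is missing
    node.setdefault(keys[-1], []).append(value)

print(d)
```

Because the loop walks over however many keys a line has, the same code would also handle a different nesting depth.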