I have a list of sample data, shown below as res. What I want to do is loop over this data and build a nested dictionary keyed by year, month, and day:
res = { 'results': [
{'consumption': 0.025, 'interval_start': '2021-06-27T00:00:00+01:00', 'interval_end': '2021-06-27T00:30:00+01:00'},
{'consumption': 0.043, 'interval_start': '2021-06-26T23:30:00+01:00', 'interval_end': '2021-06-27T00:00:00+01:00'},
{'consumption': 0.053, 'interval_start': '2021-06-26T23:00:00+01:00', 'interval_end': '2021-06-26T23:30:00+01:00'},
{'consumption': 0.056, 'interval_start': '2021-06-26T22:30:00+01:00', 'interval_end': '2021-06-26T23:00:00+01:00'},
{'consumption': 0.031, 'interval_start': '2021-06-26T22:00:00+01:00', 'interval_end': '2021-06-26T22:30:00+01:00'},
{'consumption': 0.129, 'interval_start': '2021-06-26T21:30:00+01:00', 'interval_end': '2021-06-26T22:00:00+01:00'},
{'consumption': 0.19, 'interval_start': '2021-06-26T21:00:00+01:00', 'interval_end': '2021-06-26T21:30:00+01:00'},
{'consumption': 0.164, 'interval_start': '2021-06-26T20:30:00+01:00', 'interval_end': '2021-06-26T21:00:00+01:00'},
{'consumption': 0.145, 'interval_start': '2021-06-26T20:00:00+01:00', 'interval_end': '2021-06-26T20:30:00+01:00'},
{'consumption': 0.213, 'interval_start': '2021-06-26T19:30:00+01:00', 'interval_end': '2021-06-26T20:00:00+01:00'},
{'consumption': 0.167, 'interval_start': '2021-06-26T19:00:00+01:00', 'interval_end': '2021-06-26T19:30:00+01:00'},
{'consumption': 0.333, 'interval_start': '2021-06-26T18:30:00+01:00', 'interval_end': '2021-06-26T19:00:00+01:00'},
{'consumption': 0.133, 'interval_start': '2021-06-26T18:00:00+01:00', 'interval_end': '2021-06-26T18:30:00+01:00'},
{'consumption': 0.211, 'interval_start': '2021-06-26T17:30:00+01:00', 'interval_end': '2021-06-26T18:00:00+01:00'},
{'consumption': 0.135, 'interval_start': '2021-06-26T17:00:00+01:00', 'interval_end': '2021-06-26T17:30:00+01:00'},
{'consumption': 0.158, 'interval_start': '2021-06-26T16:30:00+01:00', 'interval_end': '2021-06-26T17:00:00+01:00'},
{'consumption': 0.073, 'interval_start': '2021-06-26T16:00:00+01:00', 'interval_end': '2021-06-26T16:30:00+01:00'},
{'consumption': 0.077, 'interval_start': '2021-06-26T15:30:00+01:00', 'interval_end': '2021-06-26T16:00:00+01:00'},
{'consumption': 0.125, 'interval_start': '2021-06-26T15:00:00+01:00', 'interval_end': '2021-06-26T15:30:00+01:00'},
{'consumption': 0.201, 'interval_start': '2021-06-26T14:30:00+01:00', 'interval_end': '2021-06-26T15:00:00+01:00'},
{'consumption': 0.043, 'interval_start': '2021-06-26T14:00:00+01:00', 'interval_end': '2021-06-26T14:30:00+01:00'},
] }
The structure I am trying to create looks like this:
{
    "2021": {
        "06": {
            "01": [
                {
                    "interval_start": "23:00",
                    "interval_end": "23:30",
                    "consumption": "0.021"
                },
                {
                    "interval_start": "22:30",
                    "interval_end": "23:00",
                    "consumption": "0.021"
                }
            ],
            "02": [
                {
                    "interval_start": "23:00",
                    "interval_end": "23:30",
                    "consumption": "0.021"
                },
                {
                    "interval_start": "22:30",
                    "interval_end": "23:00",
                    "consumption": "0.021"
                }
            ]
        }
    }
}
The code I wrote for this is:
main_obj = {}
for i in res['results']:
    date = i["interval_start"].split("T")[0].split("-")
    insert_obj = {
        "interval_start": i['interval_start'],
        "interval_end": i["interval_end"],
        "consumption": i["consumption"]
    }
    main_obj[date[0]] = {}
    main_obj[date[0]][date[1]] = {}
    main_obj[date[0]][date[1]][date[2]] = []
    main_obj[date[0]][date[1]][date[2]].append(insert_obj)
print(main_obj)
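Here res['results'] is the list of dicts above. When I print main_obj, I get:
{
    '2021': {
        '06': {
            '26': [{
                'interval_start': '2021-06-26T14:00:00+01:00',
                'interval_end': '2021-06-26T14:30:00+01:00',
                'consumption': 0.043
            }]
        }
    }
}
My question is: why isn't each dict appended to the list at main_obj[date[0]][date[1]][date[2]] as I loop over them? Also, since dict keys are unique, why do I only see an entry for 26 and not for 27, which is at index 0?
Any help would be appreciated, as I've been scratching my head over this for a while!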
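Answer 0 (score: 2)
You are overwriting any existing dict or list with unconditional assignments such as main_obj[date[0]] = {}; if date[0] has already been seen, you wipe out all the data stored under it.
Use the setdefault method instead. (I'm not sure what the PEP-8-approved line breaking looks like.)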
(main_obj
    .setdefault(date[0], {})
    .setdefault(date[1], {})
    .setdefault(date[2], [])
).append(insert_obj)
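To see the difference concretely, here is a minimal sketch on a trimmed-down sample. The three records and the names sample, overwritten, and grouped are made up for illustration and are not from the question; only the consumption value is stored, to keep the output short.

```python
# Hypothetical, trimmed-down sample: one reading on the 27th, two on the 26th.
sample = [
    {"consumption": 0.025, "interval_start": "2021-06-27T00:00:00+01:00"},
    {"consumption": 0.043, "interval_start": "2021-06-26T23:30:00+01:00"},
    {"consumption": 0.053, "interval_start": "2021-06-26T23:00:00+01:00"},
]

# Unconditional assignment: every iteration replaces the whole year dict,
# so only the last record survives.
overwritten = {}
for rec in sample:
    year, month, day = rec["interval_start"].split("T")[0].split("-")
    overwritten[year] = {}
    overwritten[year][month] = {}
    overwritten[year][month][day] = []
    overwritten[year][month][day].append(rec["consumption"])

# setdefault: create each level only if it is missing, then append.
grouped = {}
for rec in sample:
    year, month, day = rec["interval_start"].split("T")[0].split("-")
    (grouped
        .setdefault(year, {})
        .setdefault(month, {})
        .setdefault(day, [])
    ).append(rec["consumption"])

print(overwritten)  # {'2021': {'06': {'26': [0.053]}}}
print(grouped)      # {'2021': {'06': {'27': [0.025], '26': [0.043, 0.053]}}}
```

This also explains why 27 disappears in the question's output: it is written first, then erased when the next record resets main_obj['2021'].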
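Answer 1 (score: 0)
As @chepner says, the problem is that on each iteration of the loop you overwrite whatever value is already associated with a given dictionary key.
Here is a funky solution that uses functools.partial and collections.defaultdict instead of the regular dict's setdefault method.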
from collections import defaultdict
from functools import partial
from pprint import pprint
results_list = [
{'consumption': 0.025, 'interval_start': '2021-06-27T00:00:00+01:00', 'interval_end': '2021-06-27T00:30:00+01:00'},
{'consumption': 0.043, 'interval_start': '2021-06-26T23:30:00+01:00', 'interval_end': '2021-06-27T00:00:00+01:00'},
{'consumption': 0.053, 'interval_start': '2021-06-26T23:00:00+01:00', 'interval_end': '2021-06-26T23:30:00+01:00'},
{'consumption': 0.056, 'interval_start': '2021-06-26T22:30:00+01:00', 'interval_end': '2021-06-26T23:00:00+01:00'},
{'consumption': 0.031, 'interval_start': '2021-06-26T22:00:00+01:00', 'interval_end': '2021-06-26T22:30:00+01:00'},
{'consumption': 0.129, 'interval_start': '2021-06-26T21:30:00+01:00', 'interval_end': '2021-06-26T22:00:00+01:00'},
{'consumption': 0.19, 'interval_start': '2021-06-26T21:00:00+01:00', 'interval_end': '2021-06-26T21:30:00+01:00'},
{'consumption': 0.164, 'interval_start': '2021-06-26T20:30:00+01:00', 'interval_end': '2021-06-26T21:00:00+01:00'},
{'consumption': 0.145, 'interval_start': '2021-06-26T20:00:00+01:00', 'interval_end': '2021-06-26T20:30:00+01:00'},
{'consumption': 0.213, 'interval_start': '2021-06-26T19:30:00+01:00', 'interval_end': '2021-06-26T20:00:00+01:00'},
{'consumption': 0.167, 'interval_start': '2021-06-26T19:00:00+01:00', 'interval_end': '2021-06-26T19:30:00+01:00'},
{'consumption': 0.333, 'interval_start': '2021-06-26T18:30:00+01:00', 'interval_end': '2021-06-26T19:00:00+01:00'},
{'consumption': 0.133, 'interval_start': '2021-06-26T18:00:00+01:00', 'interval_end': '2021-06-26T18:30:00+01:00'},
{'consumption': 0.211, 'interval_start': '2021-06-26T17:30:00+01:00', 'interval_end': '2021-06-26T18:00:00+01:00'},
{'consumption': 0.135, 'interval_start': '2021-06-26T17:00:00+01:00', 'interval_end': '2021-06-26T17:30:00+01:00'},
{'consumption': 0.158, 'interval_start': '2021-06-26T16:30:00+01:00', 'interval_end': '2021-06-26T17:00:00+01:00'},
{'consumption': 0.073, 'interval_start': '2021-06-26T16:00:00+01:00', 'interval_end': '2021-06-26T16:30:00+01:00'},
{'consumption': 0.077, 'interval_start': '2021-06-26T15:30:00+01:00', 'interval_end': '2021-06-26T16:00:00+01:00'},
{'consumption': 0.125, 'interval_start': '2021-06-26T15:00:00+01:00', 'interval_end': '2021-06-26T15:30:00+01:00'},
{'consumption': 0.201, 'interval_start': '2021-06-26T14:30:00+01:00', 'interval_end': '2021-06-26T15:00:00+01:00'},
{'consumption': 0.043, 'interval_start': '2021-06-26T14:00:00+01:00', 'interval_end': '2021-06-26T14:30:00+01:00'},
]
main_obj = defaultdict(partial(defaultdict, partial(defaultdict, list)))
for i in results_list:
    date = i["interval_start"].split("T")[0].split("-")
    insert_obj = {
        "interval_start": i['interval_start'],
        "interval_end": i["interval_end"],
        "consumption": i["consumption"]
    }
    main_obj[date[0]][date[1]][date[2]].append(insert_obj)
pprint(main_obj)  # Expected result (I think!)
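defaultdict is covered in the official Python documentation, and you can read more about how it works here: https://stackoverflow.com/a/5900634/13990016. The factory function passed to a defaultdict is called with no arguments, so functools.partial is used here to pre-bind the argument of each inner defaultdict and produce a zero-argument callable. functools.partial is also covered in the official documentation, and the Stack Overflow question "How does functools partial do what it does?" explains how it works.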
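As a rough illustration of the zero-argument factory point, the sketch below builds the same three-level structure one level at a time. The names day_level, month_level, nested, and to_plain are hypothetical, and the to_plain helper is an optional extra for getting regular dicts back, not something the answer above requires.

```python
from collections import defaultdict
from functools import partial

# partial(defaultdict, list) is a zero-argument callable;
# calling it is the same as calling defaultdict(list).
day_level = partial(defaultdict, list)
month_level = partial(defaultdict, day_level)
nested = defaultdict(month_level)

# Missing keys are created on first access at every level.
nested["2021"]["06"]["26"].append(0.043)
nested["2021"]["06"]["27"].append(0.025)


def to_plain(obj):
    """Recursively convert nested defaultdicts to regular dicts (hypothetical helper)."""
    if isinstance(obj, defaultdict):
        return {key: to_plain(value) for key, value in obj.items()}
    return obj


plain = to_plain(nested)
print(plain)  # {'2021': {'06': {'26': [0.043], '27': [0.025]}}}
```

Writing lambda: defaultdict(list) in place of partial(defaultdict, list) behaves the same way; which one reads better is a matter of taste.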