I have data in the following format:
d = [
{'key': '2018-05-10', 'vals': {'Clicks': 229, 'Link Clicks': 210}},
{'key': '2018-05-11', 'vals': {'Clicks': 365, 'Link Clicks': 379}},
{'key': '2018-05-10', 'vals': {'Clicks': 139, 'Link Clicks': 11}},
{'key': '2018-05-11', 'vals': {'Clicks': 1348, 'Link Clicks': 73}},
]
That is, it has several entries with the same key. I want to group them so that Clicks and Link Clicks are summed for each shared date.
So the output should look like this:
d = [
{'key': '2018-05-10', 'vals': {'Clicks': 368, 'Link Clicks': 221}},
{'key': '2018-05-11', 'vals': {'Clicks': 1713, 'Link Clicks': 452}},
]
I thought of first grouping the values together with a defaultdict:
from collections import defaultdict
dd = defaultdict(list)
for i in d:
    dd[i['key']].append(i['vals'])
which gives the following output:
{'2018-05-10': [
{'Clicks': 229, 'Link Clicks': 210},
{'Clicks': 139, 'Link Clicks': 11}
],
'2018-05-11': [
{'Clicks': 365, 'Link Clicks': 379},
{'Clicks': 1348, 'Link Clicks': 73}
]}
Now I think I could use Counter to sum the values, but I'm not sure how. Also, the key names (i.e. Clicks and Link Clicks) may change, and vals may contain more than 2 entries.
Can this also be done without defaultdict? Is there a better way?
Note: I don't think this defaultdict approach is ideal, because I always want the data sorted by date, and as soon as I use a dict I lose the ordering.
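A minimal sketch of the Counter idea from the question, using a plain dict with setdefault instead of defaultdict. It assumes Python 3.7+, where plain dicts preserve insertion order, and that the dates are ISO strings ('YYYY-MM-DD'), which sort chronologically as plain strings. The helper name sum_by_date is hypothetical. Because Counter.update adds whatever keys it is given, the metric names do not need to be known in advance.

```python
from collections import Counter

def sum_by_date(records):
    """Sum the 'vals' dicts of records sharing the same 'key' (hypothetical helper)."""
    totals = {}  # plain dict: date -> Counter of summed metrics
    for rec in records:
        # setdefault creates a fresh Counter only the first time a date is seen
        totals.setdefault(rec['key'], Counter()).update(rec['vals'])
    # ISO 'YYYY-MM-DD' strings sort chronologically, so a plain sort orders by date
    return [{'key': k, 'vals': dict(totals[k])} for k in sorted(totals)]

d = [
    {'key': '2018-05-10', 'vals': {'Clicks': 229, 'Link Clicks': 210}},
    {'key': '2018-05-11', 'vals': {'Clicks': 365, 'Link Clicks': 379}},
    {'key': '2018-05-10', 'vals': {'Clicks': 139, 'Link Clicks': 11}},
    {'key': '2018-05-11', 'vals': {'Clicks': 1348, 'Link Clicks': 73}},
]
print(sum_by_date(d))
```

This returns the list-of-dicts shape the question asks for, with 368/221 for 2018-05-10 and 1713/452 for 2018-05-11.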
Answer 0 (score: 3)
from pprint import pprint
from collections import Counter, OrderedDict
# d here is the grouped dict produced by the question's defaultdict step
d = {
'2018-05-10': [
{'Clicks': 229, 'Link Clicks': 210},
{'Clicks': 139, 'Link Clicks': 11}
],
'2018-05-11': [
{'Clicks': 365, 'Link Clicks': 379},
{'Clicks': 1348, 'Link Clicks': 73}
],
}
m = OrderedDict()
for k, v in d.items():
    m[k] = Counter()
    for i in v:
        m[k].update(i)
    m[k] = dict(m[k])
    # or if you want to keep the 'vals' key and list:
    # m[k] = [{"vals": dict(m[k])}]
pprint(m)
Output:
OrderedDict([('2018-05-11', {'Clicks': 1713, 'Link Clicks': 452}),
('2018-05-10', {'Clicks': 368, 'Link Clicks': 221})])
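The loop above accumulates with Counter.update. A shorter-looking alternative is to add Counters with sum(..., Counter()), but the two are not equivalent: Counter's + operator discards totals that are zero or negative, while update keeps them. A small sketch of the difference; the zero-valued rows are made up for illustration and are not from the question's data.

```python
from collections import Counter

rows = [{'Clicks': 5, 'Link Clicks': 0}, {'Clicks': 3, 'Link Clicks': 0}]  # illustrative data

# Accumulating with update() keeps every key, including zero totals
acc = Counter()
for row in rows:
    acc.update(row)

# Adding Counters with + (via sum) keeps only keys whose total is positive
added = sum(map(Counter, rows), Counter())

print(dict(acc))    # {'Clicks': 8, 'Link Clicks': 0}
print(dict(added))  # {'Clicks': 8}
```

If a metric can legitimately be 0 (or negative), the update-based loop is the safer choice.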
Answer 1 (score: 2)
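You can use a nested dictionary comprehension. The relevant c_type keys, i.e. Clicks and Link Clicks, are taken from the first dictionary in each date's list. Apart from that, the approach naturally handles any number of categories.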
res = {k: {'vals': {c_type: sum(item[c_type] for item in v) for c_type in v[0]}}
       for k, v in dd.items()}
{'2018-05-10': {'vals': {'Clicks': 368, 'Link Clicks': 221}},
'2018-05-11': {'vals': {'Clicks': 1713, 'Link Clicks': 452}}}
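Because the comprehension reads the category names from v[0] only, a category that first appears in a later dict for the same date is silently skipped, and one missing from a later dict raises KeyError. A sketch of a variant that takes the union of keys (in first-seen order) and treats missing ones as 0; the grouped input below is hypothetical, built only to show that case.

```python
# Hypothetical input: the second dict lacks 'Link Clicks' and adds 'Impressions'
grouped = {
    '2018-05-10': [{'Clicks': 229, 'Link Clicks': 210}, {'Clicks': 139, 'Impressions': 50}],
}

res = {
    k: {'vals': {
        # missing categories count as 0 for that dict
        c_type: sum(item.get(c_type, 0) for item in v)
        # dict.fromkeys de-duplicates the category names while keeping first-seen order
        for c_type in dict.fromkeys(c for item in v for c in item)
    }}
    for k, v in grouped.items()
}
print(res)  # {'2018-05-10': {'vals': {'Clicks': 368, 'Link Clicks': 210, 'Impressions': 50}}}
```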
Answer 2 (score: 2)
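I'd suggest not making the output a list of dictionaries in which each dictionary only holds a key and its vals (key: vals); you should just use an actual {key: vals} dictionary!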
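This makes the code cleaner and more readable, and it makes accessing a particular date much neater: instead of looping through a list (O(n)), you can look the date up directly and get its clicks.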
For example:
dates = {}
for dd in d:
    dates.setdefault(dd['key'], []).append(dd['vals'])
dates = {k: {kk: sum(dd[kk] for dd in v) for kk in v[0].keys()}
         for k, v in dates.items()}
gives:
{
"2018-05-10": {
"Clicks": 368,
"Link Clicks": 221
},
"2018-05-11": {
"Clicks": 1713,
"Link Clicks": 452
}
}
Now you can get the data for a specific date directly with something like:
dates['2018-05-11']['Clicks']
#1713
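Indexing with dates[...] raises KeyError if a date (or metric) is absent. If that can happen, a sketch using chained dict.get with defaults; the helper clicks_on and its default of 0 are illustrative assumptions, not part of the answer.

```python
dates = {'2018-05-10': {'Clicks': 368, 'Link Clicks': 221},
         '2018-05-11': {'Clicks': 1713, 'Link Clicks': 452}}

def clicks_on(date, metric='Clicks'):
    """Return the metric for a date, or 0 if the date or metric is missing (hypothetical helper)."""
    return dates.get(date, {}).get(metric, 0)

print(clicks_on('2018-05-11'))                 # 1713
print(clicks_on('2018-05-10', 'Link Clicks'))  # 221
print(clicks_on('2018-05-12'))                 # 0
```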
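If you do need a list of dictionaries sorted by date, we can take the current dictionary and order it by each date's position in the original data, since that already appears to be in date order: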
order = [dd['key'] for dd in d]
date_list = sorted([{'key': k, 'vals': v} for k, v in dates.items()],
                   key=lambda dd: order.index(dd['key']))
which gives date_list as a list sorted by date:
[
{
"key": "2018-05-10",
"vals": {
"Clicks": 368,
"Link Clicks": 221
}
},
{
"key": "2018-05-11",
"vals": {
"Clicks": 1713,
"Link Clicks": 452
}
}
]
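order.index() scans the order list from the start on every call, so each sort-key computation costs O(n). A sketch that records each date's first position in a dict once, under the same assumption as the answer (the original data lists dates in the desired order); the name first_seen is illustrative. With ISO 'YYYY-MM-DD' dates you could also simply sort on item['key'] itself.

```python
d = [
    {'key': '2018-05-10', 'vals': {'Clicks': 229, 'Link Clicks': 210}},
    {'key': '2018-05-11', 'vals': {'Clicks': 365, 'Link Clicks': 379}},
    {'key': '2018-05-10', 'vals': {'Clicks': 139, 'Link Clicks': 11}},
    {'key': '2018-05-11', 'vals': {'Clicks': 1348, 'Link Clicks': 73}},
]
# Summed totals, deliberately listed out of date order for the demonstration
dates = {'2018-05-11': {'Clicks': 1713, 'Link Clicks': 452},
         '2018-05-10': {'Clicks': 368, 'Link Clicks': 221}}

# Map each date to the index of its first appearance (setdefault keeps the earliest)
first_seen = {}
for i, row in enumerate(d):
    first_seen.setdefault(row['key'], i)

# Each sort-key lookup is now a dict access instead of a list scan
date_list = sorted(({'key': k, 'vals': v} for k, v in dates.items()),
                   key=lambda item: first_seen[item['key']])
print([item['key'] for item in date_list])  # ['2018-05-10', '2018-05-11']
```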
Answer 3 (score: 1)
We can generalize this into a basic "group fold" approach:
from operator import add, itemgetter
def group_fold(data, fold=add, key=itemgetter('key'), vals=itemgetter('vals')):
    result = {}
    for entry in data:
        ky = key(entry)
        vlb = vals(entry)
        vla = result.get(ky, None)
        if vla:
            # merge this entry's values into the accumulator for its key
            for subk, subv in vlb.items():
                if subk in vla:
                    vla[subk] = fold(vla[subk], subv)
                else:
                    vla[subk] = subv
        else:
            # first entry for this key: copy it so the input dict is not mutated
            result[ky] = dict(vlb)
    return result
So we can now call it as group_fold(d), but we can also customize the fold function, for example using mul instead of add:
from operator import mul
group_fold(d, fold=mul)
Answer 4 (score: 1)
from collections import defaultdict, Counter, OrderedDict
ld = [{'key': '2018-05-10', 'vals': {'Clicks': 229, 'Link Clicks': 210}}, {'key': '2018-05-11', 'vals': {'Clicks': 365, 'Link Clicks': 379}}, {'key': '2018-05-10', 'vals': {'Clicks': 139, 'Link Clicks': 11}}, {'key': '2018-05-11', 'vals': {'Clicks': 1348, 'Link Clicks': 73}}]
# default_factory must be callable: pass the Counter class, not a Counter() instance
out = defaultdict(Counter)
for d in ld:
    out[d['key']].update(d['vals'])
new = OrderedDict(sorted(out.items()))
print(new)
# OrderedDict([('2018-05-10', Counter({'Clicks': 368, 'Link Clicks': 221})), ('2018-05-11', Counter({'Clicks': 1713, 'Link Clicks': 452}))])
Answer 5 (score: 1)
Try this solution:
d = [
{'key': '2018-05-10', 'vals': {'Clicks': 229, 'Link Clicks': 210}},
{'key': '2018-06-01', 'vals': {'Clicks': 365, 'Link Clicks': 379}},
{'key': '2018-05-10', 'vals': {'Clicks': 139, 'Link Clicks': 11}},
{'key': '2018-06-01', 'vals': {'Clicks': 1348, 'Link Clicks': 73}},
]
final_dict = {}
for doc in d:
    date = doc['key']
    if date not in final_dict:
        final_dict[date] = {}
        for key in doc['vals']:
            final_dict[date][key] = doc['vals'][key]
    else:
        for key in doc['vals']:
            final_dict[date][key] += doc['vals'][key]
resp_dict = [{date: final_dict[date]} for date in sorted(final_dict)]
print(resp_dict)
Answer 6 (score: 0)
Use a nested defaultdict:
from collections import defaultdict

result = defaultdict(lambda: defaultdict(int))
for entry in d:
    for key, val in entry['vals'].items():
        result[entry['key']][key] += val
This gives you the following result:
{"2018-05-10": {"Clicks": 368, "Link Clicks": 221}, "2018-05-11": {"Clicks": 1713, "Link Clicks": 452}}
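Printing result directly shows nested defaultdict(...) reprs rather than the plain dict above, and any later lookup of a missing date through result silently inserts an empty entry. A sketch of converting it to plain dicts once aggregation is finished; it repeats the aggregation loop so that it runs on its own.

```python
from collections import defaultdict

d = [
    {'key': '2018-05-10', 'vals': {'Clicks': 229, 'Link Clicks': 210}},
    {'key': '2018-05-11', 'vals': {'Clicks': 365, 'Link Clicks': 379}},
    {'key': '2018-05-10', 'vals': {'Clicks': 139, 'Link Clicks': 11}},
    {'key': '2018-05-11', 'vals': {'Clicks': 1348, 'Link Clicks': 73}},
]

result = defaultdict(lambda: defaultdict(int))
for entry in d:
    for key, val in entry['vals'].items():
        result[entry['key']][key] += val

# Freeze both levels into plain dicts: missing keys now raise KeyError instead of being created
plain = {date: dict(metrics) for date, metrics in result.items()}
print(plain)
```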
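Answer 7 (score: 0)
from collections import Counter
from itertools import groupby
from operator import itemgetter

d = [
    {'key': '2018-05-10', 'vals': {'Clicks': 229, 'Link Clicks': 210}},
    {'key': '2018-05-11', 'vals': {'Clicks': 365, 'Link Clicks': 379}},
    {'key': '2018-05-10', 'vals': {'Clicks': 139, 'Link Clicks': 11}},
    {'key': '2018-05-11', 'vals': {'Clicks': 1348, 'Link Clicks': 73}},
]
newdict = {}
# groupby only groups adjacent items, so sort by date first
for dt, group in groupby(sorted(d, key=itemgetter('key')), key=itemgetter('key')):
    total = Counter()
    for item in group:
        total.update(item['vals'])  # sum the metrics of every entry for this date
    newdict[dt] = dict(total)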
Output:
{'2018-05-10': {'Clicks': 368, 'Link Clicks': 221},
'2018-05-11': {'Clicks': 1713, 'Link Clicks': 452}}
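itertools.groupby only merges consecutive items that share a key, which is why the data is sorted before grouping. A small sketch of the difference, using the question's date order with the vals omitted for brevity:

```python
from itertools import groupby
from operator import itemgetter

rows = [{'key': '2018-05-10'}, {'key': '2018-05-11'},
        {'key': '2018-05-10'}, {'key': '2018-05-11'}]

# Without sorting, each run of equal keys becomes its own group
unsorted_groups = [k for k, _ in groupby(rows, key=itemgetter('key'))]
# After sorting, all entries for a date are adjacent and form a single group
sorted_groups = [k for k, _ in groupby(sorted(rows, key=itemgetter('key')), key=itemgetter('key'))]

print(unsorted_groups)  # ['2018-05-10', '2018-05-11', '2018-05-10', '2018-05-11']
print(sorted_groups)    # ['2018-05-10', '2018-05-11']
```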