So I have a dictionary that looks like this, with 4-element tuples as keys and lists of lists as the corresponding values. (yay indexing)
{('A002', 'R051', '02-00-00', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
750],
[datetime.datetime(2015, 6, 21, 0, 0),
576],
[datetime.datetime(2015, 6, 22, 0, 0),
1486],
[datetime.datetime(2015, 6, 23, 0, 0),
595],
[datetime.datetime(2015, 6, 24, 0, 0),
841],
[datetime.datetime(2015, 6, 25, 0, 0),
1072],
[datetime.datetime(2015, 6, 26, 0, 0),
1049]],
('A002', 'R051', '02-00-01', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
670],
[datetime.datetime(2015, 6, 21, 0, 0),
457],
[datetime.datetime(2015, 6, 22, 0, 0),
1189],
[datetime.datetime(2015, 6, 23, 0, 0),
505],
[datetime.datetime(2015, 6, 24, 0, 0),
665],
[datetime.datetime(2015, 6, 25, 0, 0),
354],
[datetime.datetime(2015, 6, 26, 0, 0),
651]]}
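I want to modify this dictionary so that the values are combined for all keys that share the same 1st, 2nd and 4th tuple elements (as the two keys there do). I want to merge those two key tuples into a single key tuple (so that my combined key is just ('A002', 'R051', 'LEXINGTON AVE')) and combine the values. Is this possible in Python?
So, for example, the first value would be [datetime.datetime(2015, 6, 20, 0, 0), 1420], which is 670 + 750 in this case.
Thanks in advance.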
Answer 0 (score: 3)
Yes, just go ahead and build another dictionary. Assuming your data above is stored in data, we'll make a dictionary called short_data:
short_data = {}
for key, value in data.items():
    short_key = (key[0], key[1], key[3])
    if short_key in short_data:
        short_data[short_key].extend(value)
    else:
        # Copy the list so that extend() above never mutates data's own lists.
        short_data[short_key] = list(value)
Or, if you don't mind using defaultdict, you can shorten it:
import collections
short_data = collections.defaultdict(list)
for key, value in data.items():
    short_key = (key[0], key[1], key[3])
    short_data[short_key].extend(value)
If you want to merge the values by adding them together, I suggest using Counter:
import collections
short_data = collections.defaultdict(collections.Counter)
for key, value in data.items():
    short_key = (key[0], key[1], key[3])
    # dict(value) maps each datetime to its count; += sums counts per datetime.
    short_data[short_key] += collections.Counter(dict(value))
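One detail of the Counter approach is worth checking against your data: adding Counters with + or += discards any key whose total is zero or negative, whereas update() keeps it. The sketch below is only an illustration of that difference on a made-up day with a count of 0; the variable names are hypothetical and not part of the answer above.

```python
import collections
import datetime

day = datetime.datetime(2015, 6, 20)

# += keeps only positive totals, so a day whose counts sum to 0 disappears.
added = collections.Counter({day: 0})
added += collections.Counter({day: 0})

# update() adds counts in place and keeps every key, even with a total of 0.
updated = collections.Counter({day: 0})
updated.update({day: 0})

print(day in added)    # False
print(day in updated)  # True
```

If days with a total of zero can occur and must stay in the result, calling update() on the Counter instead of using += is probably the safer choice.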
Answer 1 (score: 2)
Yes, this is very doable with groupby and a dictionary comprehension (dictionary comprehensions are available from Python 2.7 onward).
Example code -
>>> from itertools import groupby
>>> import datetime
>>> d = {('A002', 'R051', '02-00-00', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
... 750],
... [datetime.datetime(2015, 6, 21, 0, 0),
... 576],
... [datetime.datetime(2015, 6, 22, 0, 0),
... 1486],
... [datetime.datetime(2015, 6, 23, 0, 0),
... 595],
... [datetime.datetime(2015, 6, 24, 0, 0),
... 841],
... [datetime.datetime(2015, 6, 25, 0, 0),
... 1072],
... [datetime.datetime(2015, 6, 26, 0, 0),
... 1049]],
... ('A002', 'R051', '02-00-01', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
... 670],
... [datetime.datetime(2015, 6, 21, 0, 0),
... 457],
... [datetime.datetime(2015, 6, 22, 0, 0),
... 1189],
... [datetime.datetime(2015, 6, 23, 0, 0),
... 505],
... [datetime.datetime(2015, 6, 24, 0, 0),
... 665],
... [datetime.datetime(2015, 6, 25, 0, 0),
... 354],
... [datetime.datetime(2015, 6, 26, 0, 0),
... 651]]}
>>>
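>>> keyfunc = lambda x: (x[0][0], x[0][1], x[0][3])
>>> # groupby only merges adjacent items, so sort by the same key first
>>> newd = {k: [z for a in g for z in a[1]] for k, g in groupby(sorted(d.items(), key=keyfunc), key=keyfunc)}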
>>> newd
{('A002', 'R051', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0), 750], [datetime.datetime(2015, 6, 21, 0, 0), 576], [datetime.datetime(2015, 6, 22, 0, 0), 1486], [datetime.datetime(2015, 6, 23, 0, 0), 595], [datetime.datetime(2015, 6, 24, 0, 0), 841], [datetime.datetime(2015, 6, 25, 0, 0), 1072], [datetime.datetime(2015, 6, 26, 0, 0), 1049], [datetime.datetime(2015, 6, 20, 0, 0), 670],
[datetime.datetime(2015, 6, 21, 0, 0), 457], [datetime.datetime(2015, 6, 22, 0, 0), 1189], [datetime.datetime(2015, 6, 23, 0, 0), 505], [datetime.datetime(2015, 6, 24, 0, 0), 665], [datetime.datetime(2015, 6, 25, 0, 0), 354], [datetime.datetime(2015, 6, 26, 0, 0), 651]]}
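A caveat with itertools.groupby: it only merges items that sit next to each other, so the input order matters. The sketch below uses three made-up keys in a deliberately awkward order (the helper name short is an assumption for illustration) to show the difference between grouping unsorted and sorted input.

```python
from itertools import groupby

keys = [
    ('A002', 'R051', '02-00-00', 'LEXINGTON AVE'),
    ('A002', 'R051', '02-00-01', 'LEXINGTON LANE'),
    ('A002', 'R051', '02-00-01', 'LEXINGTON AVE'),
]

def short(key):
    # Keep the 1st, 2nd and 4th elements of the 4-tuple.
    return (key[0], key[1], key[3])

# Unsorted input: the two LEXINGTON AVE keys are not adjacent, so they
# come out as two separate groups.
unsorted_groups = [k for k, _ in groupby(keys, key=short)]

# Sorted by the grouping key: each short key appears exactly once.
sorted_groups = [k for k, _ in groupby(sorted(keys, key=short), key=short)]

print(len(unsorted_groups))  # 3
print(len(sorted_groups))    # 2
```

Inside a dictionary comprehension, a short key that appears in two groups would silently overwrite the earlier group, so sorting by the same key function first avoids losing data.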
Answer 2 (score: 1)
I added an extra key to your dictionary just to make the solution clearer. Here is my input.
t = {('A002', 'R051', '02-00-00', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
750],
[datetime.datetime(2015, 6, 21, 0, 0),
576],
[datetime.datetime(2015, 6, 22, 0, 0),
1486],
[datetime.datetime(2015, 6, 23, 0, 0),
595],
[datetime.datetime(2015, 6, 24, 0, 0),
841],
[datetime.datetime(2015, 6, 25, 0, 0),
1072],
[datetime.datetime(2015, 6, 26, 0, 0),
1049]],
('A002', 'R051', '02-00-01', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
670],
[datetime.datetime(2015, 6, 21, 0, 0),
457],
[datetime.datetime(2015, 6, 22, 0, 0),
1189],
[datetime.datetime(2015, 6, 23, 0, 0),
505],
[datetime.datetime(2015, 6, 24, 0, 0),
665],
[datetime.datetime(2015, 6, 25, 0, 0),
354],
[datetime.datetime(2015, 6, 26, 0, 0),
651]],
('A002', 'R051', '02-00-01', 'LEXINGTON LANE'): [[datetime.datetime(2015, 6, 20, 0, 0),
670],
[datetime.datetime(2015, 6, 21, 0, 0),
457],
[datetime.datetime(2015, 6, 22, 0, 0),
1189],
[datetime.datetime(2015, 6, 23, 0, 0),
505],
[datetime.datetime(2015, 6, 24, 0, 0),
665],
[datetime.datetime(2015, 6, 25, 0, 0),
354],
[datetime.datetime(2015, 6, 26, 0, 0),
651]]}
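Now, you can do this.
import itertools
short_key = lambda x: (x[0], x[1], x[3])
groups = itertools.groupby(sorted(t, key=short_key), short_key)
This sorts the dictionary's keys and returns an iterator of pairs. The first item of each pair is the new unique key (a 3-tuple), and the second is an iterator that yields all the original keys that fall into this "group". Note that groups can be consumed only once. Now you can "compress" the dictionary like this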
compressed = {k1:sum((t[k2] for k2 in v),[])
for k1,v in groups}
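This takes each pair from groups in turn. For each pair it uses the first element as the key (k1) and uses sum to combine all the entries of t whose keys map to k1 into a single list. That is what the t[k2] for k2 in v part selects; sum, starting from the empty list [], simply concatenates them all into one list.
The result looks like this.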
{('A002', 'R051', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
750],
[datetime.datetime(2015, 6, 21, 0, 0),
576],
[datetime.datetime(2015, 6, 22, 0, 0),
1486],
[datetime.datetime(2015, 6, 23, 0, 0),
595],
[datetime.datetime(2015, 6, 24, 0, 0),
841],
[datetime.datetime(2015, 6, 25, 0, 0),
1072],
[datetime.datetime(2015, 6, 26, 0, 0),
1049],
[datetime.datetime(2015, 6, 20, 0, 0),
670],
[datetime.datetime(2015, 6, 21, 0, 0),
457],
[datetime.datetime(2015, 6, 22, 0, 0),
1189],
[datetime.datetime(2015, 6, 23, 0, 0),
505],
[datetime.datetime(2015, 6, 24, 0, 0),
665],
[datetime.datetime(2015, 6, 25, 0, 0),
354],
[datetime.datetime(2015, 6, 26, 0, 0),
651]],
('A002', 'R051', 'LEXINGTON LANE'): [[datetime.datetime(2015, 6, 20, 0, 0),
670],
[datetime.datetime(2015, 6, 21, 0, 0),
457],
[datetime.datetime(2015, 6, 22, 0, 0),
1189],
[datetime.datetime(2015, 6, 23, 0, 0),
505],
[datetime.datetime(2015, 6, 24, 0, 0),
665],
[datetime.datetime(2015, 6, 25, 0, 0),
354],
[datetime.datetime(2015, 6, 26, 0, 0),
651]]}
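As a side note on the sum(..., []) idiom used to build compressed: it copies the accumulated list on every addition, which can get slow when many lists are merged. A possible alternative, shown here only as a sketch on made-up data, is itertools.chain.from_iterable, which yields the same flattened list.

```python
import itertools

# Stand-in for the per-key lists that get merged into one group.
parts = [[1, 2], [3], [4, 5]]

# Repeated list addition: simple, but re-copies the running result each time.
flat_sum = sum(parts, [])

# Lazy chaining: walks every element once and builds the list at the end.
flat_chain = list(itertools.chain.from_iterable(parts))

print(flat_sum == flat_chain)  # True
```

For a handful of keys the difference is negligible, so this is mostly a matter of taste.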
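Now we need to combine the values by date. We can write a simple function, combine, like this
def combine(l):
    t = itertools.groupby(sorted(l, key=lambda v:v[0]), lambda v:v[0])
    return [[k,sum(m[1] for m in v)] for k,v in t]
This repeats the process above on the list of 2-element [date, count] pairs. It groups by the first element, then sums the second elements of each subgroup into a single total.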
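To see what this date-wise step does in isolation, here is a small self-contained sketch of the same idea applied to four pairs taken from the question's data. The function is restated with more descriptive local names (pairs, by_date, grouped are my own choices), and the input is deliberately out of date order to show that the sort handles it.

```python
import datetime
import itertools

def combine(pairs):
    # Sort by date so that equal dates are adjacent, then sum each group.
    by_date = lambda pair: pair[0]
    grouped = itertools.groupby(sorted(pairs, key=by_date), by_date)
    return [[day, sum(p[1] for p in group)] for day, group in grouped]

sample = [
    [datetime.datetime(2015, 6, 21), 576],
    [datetime.datetime(2015, 6, 20), 750],
    [datetime.datetime(2015, 6, 20), 670],
    [datetime.datetime(2015, 6, 21), 457],
]

result = combine(sample)
# result holds one [date, total] pair per date: 1420 for June 20, 1033 for June 21
```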
Finally, to get our final dictionary, you just apply combine to every value of our compressed dictionary
final = {k:combine(v) for k,v in compressed.items()}
The result looks like this
pprint.pprint(final)
{('A002', 'R051', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
1420],
[datetime.datetime(2015, 6, 21, 0, 0),
1033],
[datetime.datetime(2015, 6, 22, 0, 0),
2675],
[datetime.datetime(2015, 6, 23, 0, 0),
1100],
[datetime.datetime(2015, 6, 24, 0, 0),
1506],
[datetime.datetime(2015, 6, 25, 0, 0),
1426],
[datetime.datetime(2015, 6, 26, 0, 0),
1700]],
('A002', 'R051', 'LEXINGTON LANE'): [[datetime.datetime(2015, 6, 20, 0, 0),
670],
[datetime.datetime(2015, 6, 21, 0, 0),
457],
[datetime.datetime(2015, 6, 22, 0, 0),
1189],
[datetime.datetime(2015, 6, 23, 0, 0),
505],
[datetime.datetime(2015, 6, 24, 0, 0),
665],
[datetime.datetime(2015, 6, 25, 0, 0),
354],
[datetime.datetime(2015, 6, 26, 0, 0),
651]]}
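Much as I like brevity, non-trivial expressions often exceed the limits of my limited brain. I usually break things like this into several expressions so that they are easier to read, understand, and debug.
So, finally, you can do the whole thing with the following code.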
import itertools
def combine(l):
    t = itertools.groupby(sorted(l, key=lambda v:v[0]), lambda v:v[0])
    return [[k,sum(m[1] for m in v)] for k,v in t]
short_key = lambda x: (x[0], x[1], x[3])
# Sort by the same function we group by, so matching keys are adjacent.
groups = itertools.groupby(sorted(t, key=short_key), short_key)
compressed = {k1:sum((t[k2] for k2 in v), [])
              for k1,v in groups}
final = {k:combine(v) for k,v in compressed.items()}
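From an efficiency standpoint, I don't like this solution. It iterates over the keys and then iterates again several more times. Perhaps you could keep the various elements in more suitable data structures. For example, the list of datetime objects and values could be a collections.Counter, with the datetimes as keys and the numbers as values.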
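As a rough illustration of that suggestion, the sketch below accumulates the totals in one pass over the dictionary, using a defaultdict of Counter objects keyed by the shortened tuple, and only converts back to [date, total] lists at the end. The function name merge_counts and the trimmed-down sample data are assumptions made for this example, not part of the answers above.

```python
import collections
import datetime

def merge_counts(data):
    """Sum counts per (key[0], key[1], key[3]) and per date in a single pass."""
    totals = collections.defaultdict(collections.Counter)
    for key, pairs in data.items():
        short_key = (key[0], key[1], key[3])
        for day, count in pairs:
            totals[short_key][day] += count
    # Convert back to the question's shape: date-ordered [date, total] lists.
    return {k: [[d, n] for d, n in sorted(c.items())] for k, c in totals.items()}

t = {
    ('A002', 'R051', '02-00-00', 'LEXINGTON AVE'): [
        [datetime.datetime(2015, 6, 20), 750],
        [datetime.datetime(2015, 6, 21), 576],
    ],
    ('A002', 'R051', '02-00-01', 'LEXINGTON AVE'): [
        [datetime.datetime(2015, 6, 20), 670],
        [datetime.datetime(2015, 6, 21), 457],
    ],
    ('A002', 'R051', '02-00-01', 'LEXINGTON LANE'): [
        [datetime.datetime(2015, 6, 20), 670],
    ],
}

final = merge_counts(t)
```

This avoids sorting the keys and regrouping; the only sort left is the per-key ordering of dates when the lists are rebuilt.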