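Python Dictionary Trouble: grouping by elements in tuple keys

Time: 2015-07-01 04:02:19

Tags: python dictionary

So I have a dictionary that looks like this, with 4-element tuples as keys and lists of lists as the corresponding values. (Yay, indexing.)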

{('A002', 'R051', '02-00-00', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
                                                  750],
                                                 [datetime.datetime(2015, 6, 21, 0, 0),
                                                  576],
                                                 [datetime.datetime(2015, 6, 22, 0, 0),
                                                  1486],
                                                 [datetime.datetime(2015, 6, 23, 0, 0),
                                                  595],
                                                 [datetime.datetime(2015, 6, 24, 0, 0),
                                                  841],
                                                 [datetime.datetime(2015, 6, 25, 0, 0),
                                                  1072],
                                                 [datetime.datetime(2015, 6, 26, 0, 0),
                                                  1049]],
 ('A002', 'R051', '02-00-01', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
                                                  670],
                                                 [datetime.datetime(2015, 6, 21, 0, 0),
                                                  457],
                                                 [datetime.datetime(2015, 6, 22, 0, 0),
                                                  1189],
                                                 [datetime.datetime(2015, 6, 23, 0, 0),
                                                  505],
                                                 [datetime.datetime(2015, 6, 24, 0, 0),
                                                  665],
                                                 [datetime.datetime(2015, 6, 25, 0, 0),
                                                  354],
                                                 [datetime.datetime(2015, 6, 26, 0, 0),
                                                  651]]}

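I want to modify this dictionary so that the values are combined for all keys that share the same 1st, 2nd, and 4th tuple elements (as the two keys here do). I want to merge those two key tuples into a single key tuple (so that my combined key is just ('A002', 'R051', 'LEXINGTON AVE')) and combine the values. Is this possible in Python?

So, for example, the first value would be [datetime.datetime(2015, 6, 20, 0, 0), 1420], which is 670 + 750 in this case.

Thanks in advance.

3 Answers:

Answer 0 (score: 3)

Yes, just build another dictionary. Assuming your data above is stored in data, we'll make a dictionary called short_data: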

short_data = {}
for key, value in data.items():
    short_key = (key[0], key[1], key[3])
    if short_key in short_data:
        short_data[short_key].extend(value)
    else:
        # Copy the list so that later extend() calls do not mutate the lists in data.
        short_data[short_key] = list(value)

Or, if you don't mind using defaultdict, you can shorten it:

import collections

short_data = collections.defaultdict(list)
for key, value in data.items():
    short_key = (key[0], key[1], key[3])
    short_data[short_key].extend(value)

If you want to merge the values by adding them together, I'd suggest using Counter:

import collections
short_data = collections.defaultdict(collections.Counter)
for key, value in data.items():
    short_key = (key[0], key[1], key[3])
    short_data[short_key] += collections.Counter(dict(data[key]))
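
The Counter version leaves each value as a Counter keyed by date rather than the list of [date, count] pairs from the question. The following is a minimal sketch of converting back, assuming a hypothetical two-day sample in the same shape as the question's data; the name as_lists is made up for illustration. It uses update() instead of +=, because += on a Counter discards entries whose total is zero or negative.

```python
import collections
import datetime

# Hypothetical two-day sample in the same shape as the question's data.
data = {
    ('A002', 'R051', '02-00-00', 'LEXINGTON AVE'): [
        [datetime.datetime(2015, 6, 20), 750],
        [datetime.datetime(2015, 6, 21), 576],
    ],
    ('A002', 'R051', '02-00-01', 'LEXINGTON AVE'): [
        [datetime.datetime(2015, 6, 20), 670],
        [datetime.datetime(2015, 6, 21), 457],
    ],
}

short_data = collections.defaultdict(collections.Counter)
for key, value in data.items():
    short_key = (key[0], key[1], key[3])
    # dict(value) maps each date to its count; update() adds the counts per date.
    short_data[short_key].update(dict(value))

# Turn each Counter back into a date-sorted list of [date, total] pairs.
as_lists = {
    key: [[day, total] for day, total in sorted(counter.items())]
    for key, counter in short_data.items()
}
```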

Answer 1 (score: 2)

Yes, it is quite possible, using groupby and a dictionary comprehension (available since Python 2.7).

Example code:

>>> from itertools import groupby
>>> import datetime
>>> d = {('A002', 'R051', '02-00-00', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
...                                                   750],
...                                                  [datetime.datetime(2015, 6, 21, 0, 0),
...                                                   576],
...                                                  [datetime.datetime(2015, 6, 22, 0, 0),
...                                                   1486],
...                                                  [datetime.datetime(2015, 6, 23, 0, 0),
...                                                   595],
...                                                  [datetime.datetime(2015, 6, 24, 0, 0),
...                                                   841],
...                                                  [datetime.datetime(2015, 6, 25, 0, 0),
...                                                   1072],
...                                                  [datetime.datetime(2015, 6, 26, 0, 0),
...                                                   1049]],
...  ('A002', 'R051', '02-00-01', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
...                                                   670],
...                                                  [datetime.datetime(2015, 6, 21, 0, 0),
...                                                   457],
...                                                  [datetime.datetime(2015, 6, 22, 0, 0),
...                                                   1189],
...                                                  [datetime.datetime(2015, 6, 23, 0, 0),
...                                                   505],
...                                                  [datetime.datetime(2015, 6, 24, 0, 0),
...                                                   665],
...                                                  [datetime.datetime(2015, 6, 25, 0, 0),
...                                                   354],
...                                                  [datetime.datetime(2015, 6, 26, 0, 0),
...                                                   651]]}
>>>
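>>> # groupby only merges adjacent items, so sort by the same key function first
>>> keyfunc = lambda x: (x[0][0], x[0][1], x[0][3])
>>> newd = {k: [z for a in g for z in a[1]] for k, g in groupby(sorted(d.items(), key=keyfunc), key=keyfunc)}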
>>> newd
{('A002', 'R051', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0), 750], [datetime.datetime(2015, 6, 21, 0, 0), 576], [datetime.datetime(2015, 6, 22, 0, 0), 1486], [datetime.datetime(2015, 6, 23, 0, 0), 595], [datetime.datetime(2015, 6, 24, 0, 0), 841], [datetime.datetime(2015, 6, 25, 0, 0), 1072], [datetime.datetime(2015, 6, 26, 0, 0), 1049], [datetime.datetime(2015, 6, 20, 0, 0), 670],
[datetime.datetime(2015, 6, 21, 0, 0), 457], [datetime.datetime(2015, 6, 22, 0, 0), 1189], [datetime.datetime(2015, 6, 23, 0, 0), 505], [datetime.datetime(2015, 6, 24, 0, 0), 665], [datetime.datetime(2015, 6, 25, 0, 0), 354], [datetime.datetime(2015, 6, 26, 0, 0), 651]]}
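
One thing to keep in mind with groupby is that it only groups consecutive items with equal keys, so the input generally has to be sorted by the same key function first. A minimal sketch of the difference, assuming a hypothetical list of keys that is deliberately interleaved (short_key is an illustrative helper, not part of the answer):

```python
from itertools import groupby

# Hypothetical keys, interleaved so that equal short keys are not adjacent.
keys = [
    ('A002', 'R051', '02-00-00', 'LEXINGTON AVE'),
    ('A002', 'R051', '02-00-01', 'LEXINGTON LANE'),
    ('A002', 'R051', '02-00-01', 'LEXINGTON AVE'),
]

def short_key(key):
    # Keep the 1st, 2nd and 4th elements of the original key.
    return (key[0], key[1], key[3])

# Without sorting, the two LEXINGTON AVE keys end up in separate groups.
unsorted_groups = [k for k, _ in groupby(keys, key=short_key)]

# Sorting by the grouping key first makes equal short keys adjacent.
sorted_groups = [k for k, _ in groupby(sorted(keys, key=short_key), key=short_key)]
```

In a dictionary comprehension, a short key that shows up in two separate groups would silently overwrite the earlier group, which is why the sort matters.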

Answer 2 (score: 1)

I added an extra key to your dictionary just to make the solution clearer. Here is my input.

t = {('A002', 'R051', '02-00-00', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
                                                      750],
                                                     [datetime.datetime(2015, 6, 21, 0, 0),
                                                      576],
                                                     [datetime.datetime(2015, 6, 22, 0, 0),
                                                      1486],
                                                     [datetime.datetime(2015, 6, 23, 0, 0),
                                                      595],
                                                     [datetime.datetime(2015, 6, 24, 0, 0),
                                                      841],
                                                     [datetime.datetime(2015, 6, 25, 0, 0),
                                                      1072],
                                                     [datetime.datetime(2015, 6, 26, 0, 0),
                                                      1049]],
     ('A002', 'R051', '02-00-01', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
                                                      670],
                                                     [datetime.datetime(2015, 6, 21, 0, 0),
                                                      457],
                                                     [datetime.datetime(2015, 6, 22, 0, 0),
                                                      1189],
                                                     [datetime.datetime(2015, 6, 23, 0, 0),
                                                      505],
                                                     [datetime.datetime(2015, 6, 24, 0, 0),
                                                      665],
                                                     [datetime.datetime(2015, 6, 25, 0, 0),
                                                      354],
                                                     [datetime.datetime(2015, 6, 26, 0, 0),
                                                      651]],
     ('A002', 'R051', '02-00-01', 'LEXINGTON LANE'): [[datetime.datetime(2015, 6, 20, 0, 0),
                                                      670],
                                                     [datetime.datetime(2015, 6, 21, 0, 0),
                                                      457],
                                                     [datetime.datetime(2015, 6, 22, 0, 0),
                                                      1189],
                                                     [datetime.datetime(2015, 6, 23, 0, 0),
                                                      505],
                                                     [datetime.datetime(2015, 6, 24, 0, 0),
                                                      665],
                                                     [datetime.datetime(2015, 6, 25, 0, 0),
                                                      354],
                                                     [datetime.datetime(2015, 6, 26, 0, 0),
                                                      651]]}

Now, you can do this.

import itertools
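# Sort by the same function used for grouping so that matching keys are adjacent.
short_key = lambda x: (x[0], x[1], x[3])
groups = itertools.groupby(sorted(t, key=short_key), short_key)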

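This sorts the dictionary's keys and returns an iterator of pairs. The first item in each pair is the new unique key (a 3-tuple), and the second is an iterator that yields all the original keys that belong to this "group". Now you can "compress" the dictionary like this: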

compressed = {k1:sum((t[k2] for k2 in v),[])
          for k1,v in groups}

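This basically takes each pair from groups. For each pair, it uses the first element as the key (k1) and uses sum to combine all the entries of t whose keys map to k1 into a single list. The generator t[k2] for k2 in v yields those entries, and sum simply concatenates them into one list.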
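
As a small illustration of the sum trick on its own, here is a sketch with made-up lists. It also shows itertools.chain.from_iterable, which is a swapped-in alternative (not what the answer uses) that gives the same result without building an intermediate list at every step:

```python
import itertools

# Hypothetical sample: three lists to be joined into one.
parts = [[1, 2], [3], [4, 5]]

# sum() starts from the empty list and applies + to each list in turn.
flat_sum = sum(parts, [])

# Same result, built in a single pass over the inner lists.
flat_chain = list(itertools.chain.from_iterable(parts))
```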

The result is as follows.

{('A002', 'R051', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
                                      750],
                                     [datetime.datetime(2015, 6, 21, 0, 0),
                                      576],
                                     [datetime.datetime(2015, 6, 22, 0, 0),
                                      1486],
                                     [datetime.datetime(2015, 6, 23, 0, 0),
                                      595],
                                     [datetime.datetime(2015, 6, 24, 0, 0),
                                      841],
                                     [datetime.datetime(2015, 6, 25, 0, 0),
                                      1072],
                                     [datetime.datetime(2015, 6, 26, 0, 0),
                                      1049],
                                     [datetime.datetime(2015, 6, 20, 0, 0),
                                      670],
                                     [datetime.datetime(2015, 6, 21, 0, 0),
                                      457],
                                     [datetime.datetime(2015, 6, 22, 0, 0),
                                      1189],
                                     [datetime.datetime(2015, 6, 23, 0, 0),
                                      505],
                                     [datetime.datetime(2015, 6, 24, 0, 0),
                                      665],
                                     [datetime.datetime(2015, 6, 25, 0, 0),
                                      354],
                                     [datetime.datetime(2015, 6, 26, 0, 0),
                                      651]],
 ('A002', 'R051', 'LEXINGTON LANE'): [[datetime.datetime(2015, 6, 20, 0, 0),
                                       670],
                                      [datetime.datetime(2015, 6, 21, 0, 0),
                                       457],
                                      [datetime.datetime(2015, 6, 22, 0, 0),
                                       1189],
                                      [datetime.datetime(2015, 6, 23, 0, 0),
                                       505],
                                      [datetime.datetime(2015, 6, 24, 0, 0),
                                       665],
                                      [datetime.datetime(2015, 6, 25, 0, 0),
                                       354],
                                      [datetime.datetime(2015, 6, 26, 0, 0),
                                       651]]}

Now we need to combine the values by date. We can write a simple function, combine, like this:

def combine(l):
    t = itertools.groupby(sorted(l, key=lambda v:v[0]), lambda v:v[0])
    return [[k,sum(m[1] for m in v)] for k,v in t]

This repeats the process above on a list of 2-element pairs. It groups by the first element and then sums the second elements of each subgroup.
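
To see combine on its own, here is a minimal sketch run on a hypothetical three-entry list in which one date appears twice. The parameter and variable names are changed for readability, but the logic is meant to match the function above:

```python
import datetime
import itertools

def combine(pairs):
    # Sort by date so equal dates are adjacent, then sum the counts per date.
    by_date = lambda pair: pair[0]
    grouped = itertools.groupby(sorted(pairs, key=by_date), by_date)
    return [[day, sum(p[1] for p in group)] for day, group in grouped]

# Hypothetical sample: 20 June appears twice, 21 June once.
sample = [
    [datetime.datetime(2015, 6, 20), 750],
    [datetime.datetime(2015, 6, 21), 576],
    [datetime.datetime(2015, 6, 20), 670],
]
result = combine(sample)
```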

Finally, to get our final dictionary, you just map combine over all the values of our compressed dictionary:

final = {k:combine(v) for k,v in compressed.items()}

The result is as follows:

import pprint
pprint.pprint(final)

{('A002', 'R051', 'LEXINGTON AVE'): [[datetime.datetime(2015, 6, 20, 0, 0),
                                      1420],
                                     [datetime.datetime(2015, 6, 21, 0, 0),
                                      1033],
                                     [datetime.datetime(2015, 6, 22, 0, 0),
                                      2675],
                                     [datetime.datetime(2015, 6, 23, 0, 0),
                                      1100],
                                     [datetime.datetime(2015, 6, 24, 0, 0),
                                      1506],
                                     [datetime.datetime(2015, 6, 25, 0, 0),
                                      1426],
                                     [datetime.datetime(2015, 6, 26, 0, 0),
                                      1700]],
 ('A002', 'R051', 'LEXINGTON LANE'): [[datetime.datetime(2015, 6, 20, 0, 0),
                                       670],
                                      [datetime.datetime(2015, 6, 21, 0, 0),
                                       457],
                                      [datetime.datetime(2015, 6, 22, 0, 0),
                                       1189],
                                      [datetime.datetime(2015, 6, 23, 0, 0),
                                       505],
                                      [datetime.datetime(2015, 6, 24, 0, 0),
                                       665],
                                      [datetime.datetime(2015, 6, 25, 0, 0),
                                       354],
                                      [datetime.datetime(2015, 6, 26, 0, 0),
                                       651]]}

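Much as I like brevity, non-trivial expressions usually escape the confines of my limited brain. I often break them up into several expressions like these so that they are easier to read, understand, and debug.

So, finally, you can do the whole thing with the following code.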

import itertools

def combine(l):
    t = itertools.groupby(sorted(l, key=lambda v:v[0]), lambda v:v[0])
    return [[k,sum(m[1] for m in v)] for k,v in t]


# Sort by the grouping function itself so keys that share a short key are adjacent.
short_key = lambda x: (x[0], x[1], x[3])
groups = itertools.groupby(sorted(t, key=short_key), short_key)
compressed = {k1:sum((t[k2] for k2 in v), [])
              for k1,v in groups}
final = {k:combine(v) for k,v in compressed.items()}

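From an efficiency standpoint, I don't like this solution. It iterates over the keys and then repeats the iteration several more times. Perhaps you could maintain the elements in a more suitable data structure. For example, the list of datetime objects and values could be a collections.Counter, with the datetimes as keys and the numbers as values.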
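
One way that suggestion might look is sketched below: a single pass over every entry, with no sorting and no groupby. This is only an illustration under stated assumptions. The function name group_totals and the shortened two-day sample are hypothetical, and the result maps each short key to a Counter of date totals rather than to a list of pairs.

```python
import collections
import datetime

def group_totals(t):
    """Sum the counts per (short key, date) in one pass over every entry."""
    totals = collections.defaultdict(collections.Counter)
    for key, entries in t.items():
        short_key = (key[0], key[1], key[3])
        for day, count in entries:
            # A missing date counts as 0 in a Counter, so += works directly.
            totals[short_key][day] += count
    return totals

# Hypothetical, shortened version of the three-key dictionary t.
t = {
    ('A002', 'R051', '02-00-00', 'LEXINGTON AVE'): [
        [datetime.datetime(2015, 6, 20), 750],
        [datetime.datetime(2015, 6, 21), 576],
    ],
    ('A002', 'R051', '02-00-01', 'LEXINGTON AVE'): [
        [datetime.datetime(2015, 6, 20), 670],
        [datetime.datetime(2015, 6, 21), 457],
    ],
    ('A002', 'R051', '02-00-01', 'LEXINGTON LANE'): [
        [datetime.datetime(2015, 6, 20), 670],
    ],
}
totals = group_totals(t)
```

If the list-of-pairs shape is needed at the end, sorted(totals[k].items()) gives the (date, total) pairs for a key k in date order.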