Summarizing a list of dictionaries by common key values

Time: 2014-06-02 22:10:13

Tags: python list dictionary set

I have a list of dictionaries like this:

dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 4, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]

I want to summarize the days that share the same 'start' and 'end' times.

For example,

summarylist = [([0, 2, 4], '8:00am', '5:00pm'),
               ([1, 3], '10:00am', '7:00pm'),
               ([5], '11:00am', '1:00pm')]

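I have tried adapting a few other StackOverflow solutions based on sets and intersections, without any luck. I also tried to reuse the solution to this question, to no avail. I hope someone can point me in the right direction.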

3 answers:

Answer 0 (score: 2)

Use itertools.groupby:

In [1]: %paste
dictlist = [{'day': 0, 'start': '8:00am',  'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am',  'end': '5:00pm'},
            {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 4, 'start': '8:00am',  'end': '5:00pm'},
            {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]

## -- End pasted text --

In [2]: from itertools import groupby

In [3]: tuplist = [(d['day'], (d['start'], d['end'])) for d in dictlist]

In [4]: key = lambda x: x[1]

In [5]: summarylist = [(sorted(e[0] for e in g),) + k
   ...:        for k, g in groupby(sorted(tuplist, key=key), key=key)]

In [6]: summarylist
Out[6]:
[([1, 3], '10:00am', '7:00pm'),
 ([5], '11:00am', '1:00pm'),
 ([0, 2, 4], '8:00am', '5:00pm')]
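
The sorted() call in the session above matters because groupby only merges items that are next to each other. The following Python 3 sketch illustrates this on the same data; time_key is a hypothetical helper name introduced here for illustration and is not part of the original answer.

```python
from itertools import groupby

dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 4, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]


def time_key(d):
    # Hypothetical helper: the (start, end) pair used for sorting and grouping.
    return (d['start'], d['end'])


# Without sorting, no two neighbouring entries share a key,
# so every day lands in its own group.
unsorted_groups = [(key, [d['day'] for d in grp])
                   for key, grp in groupby(dictlist, key=time_key)]

# Sorting by the same key first makes equal keys adjacent,
# so each (start, end) pair yields exactly one group.
sorted_groups = [(key, [d['day'] for d in grp])
                 for key, grp in groupby(sorted(dictlist, key=time_key),
                                         key=time_key)]

print(len(unsorted_groups))  # 6 groups, one per day
print(sorted_groups)
```

Note that the times are compared as plain strings, so the resulting group order is lexicographic ('10:00am' before '8:00am'), not chronological.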

Answer 1 (score: 2)

If you don't need the exact format you gave, you can use a defaultdict:

dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 4, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]

from collections import defaultdict

dd = defaultdict(list)

for d in dictlist:
    dd[(d['start'],d['end'])].append(d['day'])

Result:

>>> dd
defaultdict(<type 'list'>, {('11:00am', '1:00pm'): [5], ('10:00am', '7:00pm'): [1, 3], ('8:00am', '5:00pm'): [0, 2, 4]})

If the format matters, you can do this:

>>> my_list = [(v, k[0], k[1]) for k,v in dd.iteritems()]
>>> my_list
[([5], '11:00am', '1:00pm'), ([1, 3], '10:00am', '7:00pm'), ([0, 2, 4], '8:00am', '5:00pm')]
>>> # If you need the output sorted:  
>>> sorted_my_list = sorted(my_list, key = lambda k : len(k[0]), reverse=True)
>>> sorted_my_list
[([0, 2, 4], '8:00am', '5:00pm'), ([1, 3], '10:00am', '7:00pm'), ([5], '11:00am', '1:00pm')]
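
The session above is Python 2 (iteritems, the <type 'list'> repr). As a rough Python 3 equivalent, a minimal sketch might look like the following; the function name summarize is an assumption made here for illustration, and the result order relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
from collections import defaultdict


def summarize(dictlist):
    # Hypothetical helper: collect the days for each (start, end) pair
    # in a single pass over the input.
    days_by_times = defaultdict(list)
    for d in dictlist:
        days_by_times[(d['start'], d['end'])].append(d['day'])
    # items() replaces the Python 2 iteritems(); groups come out in the
    # order their (start, end) pair was first seen.
    return [(days, start, end)
            for (start, end), days in days_by_times.items()]


dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 4, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]

summarylist = summarize(dictlist)
print(summarylist)
```

With this input, the groups happen to appear in the same order as the summarylist requested in the question, because each time slot is first seen in that order; for other inputs an explicit sort may still be needed.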

Answer 2 (score: 0)

You can use itertools.groupby like this:

Source code:

from itertools import groupby
for k, grp in groupby(sorted(dictlist, key=lambda x:(x['end'], x['start'])), key=lambda x:(x['start'], x['end'])):
    print [i['day'] for i in grp], k

Output:

[5] ('11:00am', '1:00pm')
[0, 2, 4] ('8:00am', '5:00pm')
[1, 3] ('10:00am', '7:00pm')

But I think using defaultdict (@Akavall's answer) is the right approach in this particular case.