How do I combine hashes in a list of hashes?

Time: 2013-06-12 06:52:46

Tags: python iterator

I have a list of hashes (dicts), as shown below:

   [{'campaign_id': 'cid2504649263',
  'country': 'AU',
  'impressions': 3000,
  'region': 'Cairns',
  'utcdt': datetime.datetime(2013, 6, 4, 6, 0)},
 {'campaign_id': 'cid2504649263',
  'country': 'AU',
  'count': 9000,
  'region': 'Cairns',
  'utcdt': datetime.datetime(2013, 6, 4, 6, 0)},
 {'campaign_id': 'cid2504649263',
  'country': 'AU',
  'count': 3000,
  'region': 'Cairns',
  'utcdt': datetime.datetime(2013, 6, 4, 7, 0)}]

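Two of the hashes need to be rolled up into one, because all of their dimensions are the same, and I need the summed count. So... how would I use groupby from itertools to accomplish this? Is there another way?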

   rolled_up = [{'campaign_id': 'cid2504649263',
  'count': 12000,
  'region': 'Cairns',
  'utcdt': datetime.datetime(2013, 6, 4, 6, 0)},
 {'campaign_id': 'cid2504649263',
  'country': 'AU',
  'count': 3000,
  'region': 'Cairns',
  'utcdt': datetime.datetime(2013, 6, 4, 7, 0)}]

2 answers:

Answer 0 (score: 2)

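If the items that need to be rolled up together are consecutive, groupby will do the job. Otherwise you need to sort them first. I think collections.Counter would suit you better: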

>>> import datetime
>>> from collections import Counter
>>> C = Counter()
>>> L =     [{'campaign_id': 'cid2504649263',
...   'country': 'AU',
...   'count': 3000,            # <== changed this to "count"
...   'region': 'Cairns',
...   'utcdt': datetime.datetime(2013, 6, 4, 6, 0)},
...  {'campaign_id': 'cid2504649263',
...   'country': 'AU',
...   'count': 3000,
...   'region': 'Cairns',
...   'utcdt': datetime.datetime(2013, 6, 4, 6, 0)},
...  {'campaign_id': 'cid2504649263',
...   'country': 'AU',
...   'count': 3000,
...   'region': 'Cairns',
...   'utcdt': datetime.datetime(2013, 6, 4, 7, 0)}]
>>> for item in L:                        # The ... represents the rest of the key
...     C[item['campaign_id'], item['country'], ...,  item['utcdt']] += item['count']
...
>>> C
Counter({('cid2504649263', 'AU', datetime.datetime(2013, 6, 4, 6, 0)): 6000, ('cid2504649263', 'AU', datetime.datetime(2013, 6, 4, 7, 0)): 3000})

Then convert the Counter back into the list format.
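One way that conversion might look, as a minimal sketch: it assumes the key tuple is always built from a fixed list of dimension names, so each key can be zipped back into a dict. The names `DIMENSIONS` and `roll_up` are illustrative and do not come from the answer, and the sample rows use the counts from the question.

```python
import datetime
from collections import Counter

# Assumed dimension names; adjust to the fields that define a group.
DIMENSIONS = ('campaign_id', 'country', 'region', 'utcdt')

def roll_up(rows, dimensions=DIMENSIONS, measure='count'):
    """Sum `measure` over all rows that share the same dimension values."""
    totals = Counter()
    for row in rows:
        key = tuple(row[name] for name in dimensions)
        totals[key] += row[measure]
    # Rebuild one dict per distinct key, with the summed measure attached.
    result = []
    for key, total in totals.items():
        combined = dict(zip(dimensions, key))
        combined[measure] = total
        result.append(combined)
    return result

rows = [
    {'campaign_id': 'cid2504649263', 'country': 'AU', 'count': 3000,
     'region': 'Cairns', 'utcdt': datetime.datetime(2013, 6, 4, 6, 0)},
    {'campaign_id': 'cid2504649263', 'country': 'AU', 'count': 9000,
     'region': 'Cairns', 'utcdt': datetime.datetime(2013, 6, 4, 6, 0)},
    {'campaign_id': 'cid2504649263', 'country': 'AU', 'count': 3000,
     'region': 'Cairns', 'utcdt': datetime.datetime(2013, 6, 4, 7, 0)},
]
rolled_up = roll_up(rows)
```

Because the dimension names are kept in one tuple, the same tuple drives both building the key and rebuilding the dict, so the two cannot drift apart.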

Answer 1 (score: 0)

  

"Two of the hashes need to be rolled up, because all of their dimensions are the same, and I need the total count."

If that is all you want, then:

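from collections import defaultdict

# `hashes` is the list of dicts from the question; each dict needs a 'count' key
d = defaultdict(int)

for i in hashes:
    # Groups by campaign and region only; add i['country'] and i['utcdt']
    # to the key if rows from different hours must stay separate
    d[i['campaign_id'], i['region']] += i['count']

for k in d:
    print(k[0], d[k])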
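Since the question asks about itertools.groupby specifically, here is a rough sketch of the sort-then-group route. It assumes every row has all four grouping fields and a `count` key; `get_key` and `roll_up_sorted` are hypothetical names chosen for illustration. The sample rows are deliberately not in consecutive order, which is why the sort is needed.

```python
import datetime
from itertools import groupby
from operator import itemgetter

# Assumed grouping fields; every row must contain all of them.
get_key = itemgetter('campaign_id', 'country', 'region', 'utcdt')

def roll_up_sorted(rows, measure='count'):
    """Sort by the grouping key, then sum `measure` within each run."""
    result = []
    # groupby only merges adjacent items, so sort by the same key first.
    for key, group in groupby(sorted(rows, key=get_key), key=get_key):
        campaign_id, country, region, utcdt = key
        result.append({
            'campaign_id': campaign_id,
            'country': country,
            'region': region,
            'utcdt': utcdt,
            measure: sum(row[measure] for row in group),
        })
    return result

rows = [
    {'campaign_id': 'cid2504649263', 'country': 'AU', 'count': 3000,
     'region': 'Cairns', 'utcdt': datetime.datetime(2013, 6, 4, 6, 0)},
    {'campaign_id': 'cid2504649263', 'country': 'AU', 'count': 3000,
     'region': 'Cairns', 'utcdt': datetime.datetime(2013, 6, 4, 7, 0)},
    {'campaign_id': 'cid2504649263', 'country': 'AU', 'count': 9000,
     'region': 'Cairns', 'utcdt': datetime.datetime(2013, 6, 4, 6, 0)},
]
rolled_up = roll_up_sorted(rows)
```

The sort makes this O(n log n), whereas the dictionary-based approaches above make a single pass, so groupby is mainly attractive when the data already arrives ordered by the grouping key.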