Counting matching dictionaries

Date: 2016-06-10 07:56:32

Tags: python

I have a list of dictionaries:

[{'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb4340', 'y': u'osgb4000'},
 {'x': u'osgb4020', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'}]

I want to count how many times each dictionary occurs and add that number as a new field, count.

The desired result is:

[{'x': u'osgb32', 'y': u'osgb4000', 'count': 3},
 {'x': u'osgb4340', 'y': u'osgb4000', 'count': 1},
 {'x': u'osgb4020', 'y': u'osgb4000', 'count': 1}]

I'm not sure how to match the dicts.

4 answers:

Answer 0 (score: 3)

You can do this easily with the following code:

items = [{'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb4340', 'y': u'osgb4000'},
 {'x': u'osgb4020', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'}]

result = {}
counted_items = []
for item in items:
    key = item['x'] + '_' + item['y']
    result[key] = result.get(key, 0) + 1

for key, value in result.items():
    x, y = key.split('_')  # same order the key was built in: x first, then y
    counted_items.append({'x': x, 'y': y, 'count': value})

print(counted_items)  # e.g. [{'x': u'osgb32', 'y': u'osgb4000', 'count': 3}, {'x': u'osgb4340', 'y': u'osgb4000', 'count': 1}, {'x': u'osgb4020', 'y': u'osgb4000', 'count': 1}] (Python 2 repr; order may vary)
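Joining the values with '_' and splitting them again breaks as soon as a value itself contains an underscore. A minimal sketch of the same counting loop with tuple keys instead, assuming every dict has exactly the keys 'x' and 'y' (the function name count_pairs is made up for illustration):

```python
def count_pairs(items):
    """Count identical (x, y) pairs; assumes each dict has keys 'x' and 'y'."""
    counts = {}
    order = []  # first-seen order, so the output order is deterministic
    for item in items:
        key = (item['x'], item['y'])  # tuples are hashable, no delimiter needed
        if key not in counts:
            order.append(key)
        counts[key] = counts.get(key, 0) + 1
    return [{'x': x, 'y': y, 'count': counts[(x, y)]} for x, y in order]


items = [{'x': 'osgb32', 'y': 'osgb4000'},
         {'x': 'osgb4340', 'y': 'osgb4000'},
         {'x': 'osgb4020', 'y': 'osgb4000'},
         {'x': 'osgb32', 'y': 'osgb4000'},
         {'x': 'osgb32', 'y': 'osgb4000'}]

counted = count_pairs(items)
```

Here counted holds one dict per distinct pair, in the order each pair first appeared.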

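Another option is to use a Counter. There are plenty of answers out there on how to use collections.Counter :)

Good luck!

Answer 1 (score: 3)

This is a job for collections.Counter. But first you have to convert your dicts to actual tuples, because dicts are not hashable and therefore cannot be used as keys in a Counter object: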

>>> import collections
>>> dicts = [{'x': u'osgb32', 'y': u'osgb4000'},
...          {'x': u'osgb4340', 'y': u'osgb4000'},
...          {'x': u'osgb4020', 'y': u'osgb4000'},
...          {'x': u'osgb32', 'y': u'osgb4000'},
...          {'x': u'osgb32', 'y': u'osgb4000'}]
>>> collections.Counter(tuple(d.items()) for d in dicts)
Counter({(('y', u'osgb4000'), ('x', u'osgb32')): 3, 
         (('y', u'osgb4000'), ('x', u'osgb4020')): 1, 
         (('y', u'osgb4000'), ('x', u'osgb4340')): 1})

Then you can convert them back to dicts with an added "count" key:

>>> c = collections.Counter(tuple(d.items()) for d in dicts)
>>> [dict(list(k) + [("count", c[k])]) for k in c]
[{'count': 1, 'x': u'osgb4020', 'y': u'osgb4000'},
 {'count': 3, 'x': u'osgb32', 'y': u'osgb4000'},
 {'count': 1, 'x': u'osgb4340', 'y': u'osgb4000'}]
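One caveat: tuple(d.items()) follows each dict's own item order, so on Python 3.7+ (where dicts keep insertion order) two dicts with equal contents can yield different tuples if their keys were inserted in a different order. A hedged sketch that sorts the items first, assuming the keys are mutually comparable and the values hashable; the name count_dicts is hypothetical:

```python
import collections


def count_dicts(dicts):
    """Count equal dicts; assumes sortable keys and hashable values."""
    # Sorting the items makes the tuple independent of insertion order.
    counter = collections.Counter(tuple(sorted(d.items())) for d in dicts)
    # Each key is a tuple of (key, value) pairs, which dict() accepts directly.
    return [dict(key, count=n) for key, n in counter.items()]


dicts = [{'x': 'osgb32', 'y': 'osgb4000'},
         {'y': 'osgb4000', 'x': 'osgb32'},   # same content, different key order
         {'x': 'osgb4340', 'y': 'osgb4000'}]

result = count_dicts(dicts)
```

With the sort in place, the first two dicts are counted as the same entry.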

答案 2 :(得分:3)

您可以使用Counterfrozenset

from collections import Counter

l = [{'x': u'osgb32', 'y': u'osgb4000'},
    {'x': u'osgb4340', 'y': u'osgb4000'},
    {'x': u'osgb4020', 'y': u'osgb4000'},
    {'x': u'osgb32', 'y': u'osgb4000'},
    {'x': u'osgb32', 'y': u'osgb4000'}]

c = Counter(frozenset(d.items()) for d in l)
[dict(k, count=v) for k, v in c.items()] # [{'y': u'osgb4000', 'x': u'osgb4340', 'count': 1}, {'y': u'osgb4000', 'x': u'osgb32', 'count': 3}, {'y': u'osgb4000', 'x': u'osgb4020', 'count': 1}]
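This works because a frozenset is hashable and compares without regard to order, provided the dict values are themselves hashable. A small check of that property, plus one possible way (not part of the answer above) to get the result ordered by count with Counter.most_common:

```python
from collections import Counter

a = {'x': 'osgb32', 'y': 'osgb4000'}
b = {'y': 'osgb4000', 'x': 'osgb32'}  # same content, keys inserted in reverse

# Equal dicts give equal frozensets, so they share one Counter key.
same_key = frozenset(a.items()) == frozenset(b.items())

c = Counter(frozenset(d.items())
            for d in [a, b, {'x': 'osgb4340', 'y': 'osgb4000'}])

# most_common() yields (key, count) pairs, highest count first.
by_count = [dict(k, count=v) for k, v in c.most_common()]
```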

Answer 3 (score: 2)

You can pass your list of dicts as the data arg to the DataFrame ctor:

In [74]:
import pandas as pd
data = [{'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb4340', 'y': u'osgb4000'},
 {'x': u'osgb4020', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'}]
df = pd.DataFrame(data)
df

Out[74]:
          x         y
0    osgb32  osgb4000
1  osgb4340  osgb4000
2  osgb4020  osgb4000
3    osgb32  osgb4000
4    osgb32  osgb4000

Then you can groupby on the columns and call size to get the counts:

In [76]:    
df.groupby(['x','y']).size()

Out[76]:
x         y       
osgb32    osgb4000    3
osgb4020  osgb4000    1
osgb4340  osgb4000    1
dtype: int64

Then call to_dict:

In [77]:    
df.groupby(['x','y']).size().to_dict()

Out[77]:
{('osgb32', 'osgb4000'): 3,
 ('osgb4020', 'osgb4000'): 1,
 ('osgb4340', 'osgb4000'): 1}

You can wrap the above in a list:

In [79]:
[df.groupby(['x','y']).size().to_dict()]

Out[79]:
[{('osgb32', 'osgb4000'): 3,
  ('osgb4020', 'osgb4000'): 1,
  ('osgb4340', 'osgb4000'): 1}]

Or you can call reset_index, rename the column, and pass the arg orient='records' to get a list of dicts:

In [94]:
df.groupby(['x','y']).size().reset_index().rename(columns={0:'count'}).to_dict(orient='records')

Out[94]:
[{'count': 3, 'x': 'osgb32', 'y': 'osgb4000'},
 {'count': 1, 'x': 'osgb4020', 'y': 'osgb4000'},
 {'count': 1, 'x': 'osgb4340', 'y': 'osgb4000'}]