Aggregating and computing over OrderedDicts in Python

Time: 2018-03-16 16:42:00

Tags: python dictionary itertools ordereddictionary

I have a list of OrderedDicts, "d", that looks like this:

[OrderedDict([
              ('id', '1'),
              ('date', '20170101'),
              ('quantity', '10')]),
 OrderedDict([
              ('id', '2'),
              ('date', '20170102'),
              ('quantity', '3')]),
 OrderedDict([
              ('id', '3'),
              ('date', '20170102'),
              ('quantity', '1')])]

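I am trying to group by 'date', calculate the sum of the quantities, and display the two columns 'date' and 'sum_quantity'. How can I do this without using the pandas groupby option?

Thanks!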

3 answers:

Answer 0: (score: 4)

  

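I am trying to group by 'date', calculate the sum of the quantities, and display the two columns 'date' and 'sum_quantity'

This code uses each date as a key, with the sum of that date's quantities as the value. Until you show an example of the desired output, the output format is a bit of a guess.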

In[2]: from collections import OrderedDict, defaultdict
  ...: 
  ...: 
  ...: def solution(data):
  ...:     result = defaultdict(int)
  ...:     for od in data:
  ...:         result[od['date']] += int(od['quantity'])
  ...:     return result
  ...: 
In[3]: data = [
  ...:     OrderedDict([
  ...:         ('id', '1'),
  ...:         ('date', '20170101'),
  ...:         ('quantity', '10')]),
  ...:     OrderedDict([
  ...:         ('id', '2'),
  ...:         ('date', '20170102'),
  ...:         ('quantity', '3')]),
  ...:     OrderedDict([
  ...:         ('id', '3'),
  ...:         ('date', '20170102'),
  ...:         ('quantity', '1')])
  ...: ]
In[4]: grouped = solution(data)
In[5]: grouped
Out[5]: defaultdict(int, {'20170101': 10, '20170102': 4})
In[6]: print('{:>8}\tSum Quantity'.format('Date'))
  ...: for k, v in grouped.items():
  ...:     print('{}\t{:>12}'.format(k, v))
  ...: 
    Date    Sum Quantity
20170101              10
20170102               4
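
If the result should instead be a list of records with the two requested columns, the same loop can be adapted. The following is only a sketch under that assumption: the helper name `sum_by_date` and the variable `rows` are hypothetical, and an OrderedDict is used for the totals so that dates come out in first-seen order.

```python
from collections import OrderedDict


def sum_by_date(rows):
    """Sum 'quantity' per 'date', keeping dates in first-seen order."""
    totals = OrderedDict()
    for row in rows:
        # Quantities are stored as strings, so convert before adding.
        totals[row['date']] = totals.get(row['date'], 0) + int(row['quantity'])
    # Emit one record per date with the two requested columns.
    return [{'date': d, 'sum_quantity': q} for d, q in totals.items()]


rows = [
    OrderedDict([('id', '1'), ('date', '20170101'), ('quantity', '10')]),
    OrderedDict([('id', '2'), ('date', '20170102'), ('quantity', '3')]),
    OrderedDict([('id', '3'), ('date', '20170102'), ('quantity', '1')]),
]
report = sum_by_date(rows)
print(report)
```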

Answer 1: (score: 0)

Here is a pure-Python approach. It is only an example to give you a hint.

from collections import OrderedDict
import itertools
data=[OrderedDict([
              ('id', '1'),
              ('date', '20170101'),
              ('quantity', '10')]),
 OrderedDict([
              ('id', '2'),
              ('date', '20170102'),
              ('quantity', '3')]),
 OrderedDict([
              ('id', '3'),
              ('date', '20170102'),
              ('quantity', '1')])]



def get_quantity(ord_dicts):
    by_date = lambda x: x['date']
    new_ = []
    # itertools.groupby only groups consecutive items, so sort by date first.
    for date, group in itertools.groupby(sorted(ord_dicts, key=by_date), key=by_date):
        sub_dict = {}
        sub_dict['date'] = date
        # Convert every quantity to int so single-item groups are numeric too.
        sub_dict['sum_quantity'] = sum(int(i['quantity']) for i in group)
        new_.append(sub_dict)

    return new_
print(get_quantity(data))

Output:

[{'date': '20170101', 'sum_quantity': 10}, {'date': '20170102', 'sum_quantity': 4}]
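
One caveat worth keeping in mind with this approach: itertools.groupby only groups consecutive items that share a key, so the input has to be ordered by date for the sums to be complete. The sketch below illustrates this with hypothetical rows in which equal dates are not adjacent; the names `rows`, `by_date` and `sum_groups` are made up for the example.

```python
import itertools

# Hypothetical rows where the two '20170102' entries are not adjacent.
rows = [
    {'date': '20170102', 'quantity': '3'},
    {'date': '20170101', 'quantity': '10'},
    {'date': '20170102', 'quantity': '1'},
]


def by_date(row):
    return row['date']


def sum_groups(items):
    # One (date, total) pair per run of consecutive rows with the same date.
    return [
        (date, sum(int(r['quantity']) for r in group))
        for date, group in itertools.groupby(items, key=by_date)
    ]


unsorted_result = sum_groups(rows)                      # '20170102' is split in two
sorted_result = sum_groups(sorted(rows, key=by_date))   # one group per date
print(unsorted_result)
print(sorted_result)
```

Sorting first costs O(n log n); when that matters, a dict-based accumulation avoids the sort entirely.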

Answer 2: (score: 0)

Given

from collections import OrderedDict, defaultdict


lst = [
    OrderedDict([
              ("id", "1"),
              ("date", "20170101"),
              ("quantity", "10")]),
    OrderedDict([
              ("id", "2"),
              ("date", "20170102"),
              ("quantity", "3")]),
    OrderedDict([
              ("id", "3"),
              ("date", "20170102"),
              ("quantity", "1")])
]

Borrowing the more_itertools.map_reduce recipe:

def map_reduce(iterable, keyfunc, valuefunc=None, reducefunc=None):
    valuefunc = (lambda x: x) if (valuefunc is None) else valuefunc

    ret = defaultdict(list)
    for item in iterable:
        key = keyfunc(item)
        value = valuefunc(item)
        ret[key].append(value)

    if reducefunc is not None:
        for key, value_list in ret.items():
            ret[key] = reducefunc(value_list)

    ret.default_factory = None
    return ret

Code

map_reduce builds a defaultdict from customizable key and value functions. The reduce function is then applied to the final list of values for each key.

kfunc = lambda d: d["date"]
vfunc = lambda d: int(d["quantity"])
rfunc = lambda lst_: sum(lst_) 
agg = map_reduce(lst, keyfunc=kfunc, valuefunc=vfunc, reducefunc=rfunc)
agg
# defaultdict(None, {'20170101': 10, '20170102': 4})

We use a list comprehension to get the final result.

[{"date": k, "sum_quantity": v} for k, v in agg.items()]
# [{'date': '20170101', 'sum_quantity': 10}, {'date': '20170102', 'sum_quantity': 4}]
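
For the specific case of summing, a similar result can probably be reached with collections.Counter from the standard library instead of a general map/reduce helper. This is a swapped-in alternative rather than part of the recipe above, and the names `rows`, `totals` and `result` are chosen only for this sketch.

```python
from collections import Counter

# Same shape of data as in the question; plain dicts are enough here.
rows = [
    {"id": "1", "date": "20170101", "quantity": "10"},
    {"id": "2", "date": "20170102", "quantity": "3"},
    {"id": "3", "date": "20170102", "quantity": "1"},
]

totals = Counter()
for row in rows:
    # Missing keys start at 0, so each date accumulates its quantities.
    totals[row["date"]] += int(row["quantity"])

# Sort by date so the output order does not depend on insertion order.
result = [{"date": k, "sum_quantity": v} for k, v in sorted(totals.items())]
print(result)
```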