Using itertools groupby to sort a list and merge dictionaries

Time: 2017-02-15 20:08:48

Tags: python list dictionary group-by itertools

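I want to use Python's itertools groupby to write a function that groups small lists into larger lists. I start with a list of data points (called sortedData) that has the following structure: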

[
  [location, date, {item:quantity}],
  [location2, date, {item2:quantity2}],
  ...
]

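I am trying to group them so that each location/date combination has a single dictionary containing all of its items and quantities, and so that these lists are grouped by location. Here is an example: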

[
  [
    [Maine, 01062016, {apple:5, orange:2}],
    [Maine, 02042016,{apple:3, peach:2}]
  ],
  [
    [Vermont, 01032016, {peach:3}]
  ]
]

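This is the code I have so far, but I can't work out how to use the group that groupby creates, since it does not seem to be an iterable item. Right now it returns an empty list, even though it looks like it should be appending things: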

def compileData(sortedData):    
    from itertools import groupby
    for key, locationGroup in groupby(sortedData, lambda x: x[0]):
        locationList=[]
        bigList=[]
        for date in locationGroup:
            locationList.append(date)
        locationList.append(locationGroup)
        for key, bigList in groupby(locationGroup, lambda x: x[1]):
            datePlace=[key[0],key[1],{}]
            for date in locationGroup:
                datePlace[2]=dict(list(date[2].items())+list(datePlace[2].items()))
                bigList.append(datePlace)
        return bigList  

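Let me know what you think, and if you have a better idea for how to solve this, please tell me. I wrote it recursively, but the files I run it on are so long that it was too slow.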

1 Answer:

Answer 0: (score: 1)

I think this does what you need:

from itertools import groupby
from operator import itemgetter

def update_with_ignore(a, b):
    '''Copy only new entries from B to A'''
    for k,v in b.items():
        a.setdefault(k,v)

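def compileData(sortedData):
    result = []
    # groupby only merges adjacent items, so sort by (location, date) first
    sortedData = sorted(sortedData, key=itemgetter(0,1))
    for location, group in groupby(sortedData, key=itemgetter(0)):
        l = []  # all [location, date, items] entries for this location
        # the inner loop rebinds `group`; the outer group was already passed to groupby
        for date, group in groupby(group, key=itemgetter(1)):
            d = {}
            for datum in group:
                # on duplicate items, the first quantity seen is kept
                update_with_ignore(d, datum[2])
            l.append([location, date, dict(d)])
        result.append(l)
    return result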


in_data = [
    ["Maine", "01062016", {"apple":5}],
    ["Maine", "02042016", {"apple":3}],
    ["Maine", "01062016", {"orange":2}],
    ["Vermont", "01032016", {"peach":3}],
    ["Maine", "02042016", {"peach":2}],
]
out_data = compileData(in_data)
assert out_data == [
 [['Maine', '01062016', {'apple': 5, 'orange': 2}],
  ['Maine', '02042016', {'apple': 3, 'peach': 2}]],
 [['Vermont', '01032016', {'peach': 3}]]]

in_data = [
    ["Maine", "01062016", {"apple":5}],
    ["Maine", "01062016", {"apple":4}],
    ["Maine", "02042016", {"apple":3}],
]
out_data = compileData(in_data)
assert out_data == [
 [['Maine', '01062016', {'apple': 5}],
  ['Maine', '02042016', {'apple': 3}]]]
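
As for why the original attempt comes back empty: my best guess is that it is mostly down to two properties of groupby. It only merges adjacent items with equal keys, and each group it yields is a one-shot iterator. In the question's code the first `for date in locationGroup` loop uses the group up, so the later `groupby(locationGroup, ...)` has nothing left to iterate over. The `return` also sits inside the outer loop, so the function exits after the first location. Below is a small sketch of both properties; the names `rows`, `unsorted_keys` and `passes` are made up for the illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows in the same [location, date, {item: quantity}] shape.
rows = [
    ["Maine", "01062016", {"apple": 5}],
    ["Vermont", "01032016", {"peach": 3}],
    ["Maine", "02042016", {"apple": 3}],
]

# Without sorting, groupby starts a new group every time the key changes,
# so "Maine" appears as two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=itemgetter(0))]

# Each group is a one-shot iterator: a second pass over it yields nothing.
passes = []
ordered = sorted(rows, key=itemgetter(0, 1))
for key, group in groupby(ordered, key=itemgetter(0)):
    first_pass = list(group)   # consumes the group
    second_pass = list(group)  # already exhausted, so this is empty
    passes.append((key, len(first_pass), len(second_pass)))

print(unsorted_keys)
print(passes)
```

If you need to walk a group more than once, copy it into a list first (`items = list(group)`) and work with that list.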