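I want to use Python's itertools.groupby to write a function that groups small lists into larger lists. I start with a list of separate data points (called sortedData) that has the following structure: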
[
[location, date, {item:quantity}],
[location2, date, {item2:quantity2}],
...
]
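I'm trying to group them so that each location/date combination has a single dictionary containing all of its items and quantities, and so that these lists are grouped by location. Here is an example: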
[
[
[Maine, 01062016, {apple:5, orange:2}],
[Maine, 02042016,{apple:3, peach:2}]
],
[
[Vermont, 01032016, {peach:3}]
]
]
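What I have so far is the code below, but I can't figure out how to use the groups it creates, because they don't seem to behave like iterables. Right now it returns an empty list, even though it looks like it should be appending to it.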
def compileData(sortedData):
    from itertools import groupby
    for key, locationGroup in groupby(sortedData, lambda x: x[0]):
        locationList=[]
        bigList=[]
        for date in locationGroup:
            locationList.append(date)
        locationList.append(locationGroup)
        for key, bigList in groupby(locationGroup, lambda x: x[1]):
            datePlace=[key[0],key[1],{}]
            for date in locationGroup:
                datePlace[2]=dict(list(date[2].items())+list(datePlace[2].items()))
            bigList.append(datePlace)
    return bigList
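Let me know what you think, and if you have a better idea for how to solve this, please tell me. I originally wrote it recursively, but the file I'm running it on is so long that the recursive version was too slow.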
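For context on the empty result: each group yielded by groupby is an iterator over the shared underlying data, so it can be read only once. In the question's code, the first `for date in locationGroup:` loop consumes each location group, so the later `groupby(locationGroup, lambda x: x[1])` likely has nothing left to iterate. A minimal demonstration with made-up rows:

```python
from itertools import groupby

# Illustrative rows, already sorted by location
rows = [
    ["Maine", "01062016", {"apple": 5}],
    ["Maine", "02042016", {"apple": 3}],
    ["Vermont", "01032016", {"peach": 3}],
]

counts = []
for location, group in groupby(rows, key=lambda r: r[0]):
    first_pass = list(group)   # reads every row in this group
    second_pass = list(group)  # the same iterator is now exhausted
    counts.append((location, len(first_pass), len(second_pass)))

print(counts)  # [('Maine', 2, 0), ('Vermont', 1, 0)]
```

If a group is needed more than once, materializing it first with `list(group)` avoids the problem.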
Answer (score: 1)
I think this does what you need:
from itertools import groupby
from operator import itemgetter

def update_with_ignore(a, b):
    '''Copy only new entries from B to A'''
    for k, v in b.items():
        a.setdefault(k, v)

def compileData(sortedData):
    result = []
    # groupby only merges consecutive items, so sort by (location, date) first
    sortedData = sorted(sortedData, key=itemgetter(0, 1))
    for location, location_group in groupby(sortedData, key=itemgetter(0)):
        l = []
        # Within one location, group the rows again by date
        for date, date_group in groupby(location_group, key=itemgetter(1)):
            d = {}
            for datum in date_group:
                update_with_ignore(d, datum[2])
            l.append([location, date, dict(d)])
        result.append(l)
    return result
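The `sorted(...)` call in compileData matters because groupby only merges runs of consecutive items with equal keys; it does not collect equal keys across the whole input. A small sketch with made-up rows shows the difference:

```python
from itertools import groupby
from operator import itemgetter

# Illustrative rows where "Maine" appears in two separate runs
rows = [
    ["Maine", "01062016"],
    ["Vermont", "01032016"],
    ["Maine", "02042016"],
]

# Without sorting, each run of equal keys becomes its own group
unsorted_keys = [key for key, _ in groupby(rows, key=itemgetter(0))]

# After sorting on the same key, all "Maine" rows are adjacent
sorted_keys = [key for key, _ in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0))]

print(unsorted_keys)  # ['Maine', 'Vermont', 'Maine']
print(sorted_keys)    # ['Maine', 'Vermont']
```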
in_data = [
["Maine", "01062016", {"apple":5}],
["Maine", "02042016", {"apple":3}],
["Maine", "01062016", {"orange":2}],
["Vermont", "01032016", {"peach":3}],
["Maine", "02042016", {"peach":2}],
]
out_data = compileData(in_data)
assert out_data == [
[['Maine', '01062016', {'apple': 5, 'orange': 2}],
['Maine', '02042016', {'apple': 3, 'peach': 2}]],
[['Vermont', '01032016', {'peach': 3}]]]
in_data = [
["Maine", "01062016", {"apple":5}],
["Maine", "01062016", {"apple":4}],
["Maine", "02042016", {"apple":3}],
]
out_data = compileData(in_data)
assert out_data == [
[['Maine', '01062016', {'apple': 5}],
['Maine', '02042016', {'apple': 3}]]]
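As the second test shows, when the same item appears twice for one location/date, update_with_ignore keeps the first quantity (apple stays 5). If quantities should be added instead, which is an assumption about the desired behavior, a variant can merge each date group with collections.Counter. The function name compile_data_summed is hypothetical:

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

def compile_data_summed(rows):
    """Hypothetical variant of compileData that adds up repeated items."""
    result = []
    # Sort so that equal (location, date) pairs are adjacent for groupby
    rows = sorted(rows, key=itemgetter(0, 1))
    for location, loc_group in groupby(rows, key=itemgetter(0)):
        per_location = []
        for date, date_group in groupby(loc_group, key=itemgetter(1)):
            totals = Counter()
            for _, _, items in date_group:
                totals.update(items)  # Counter.update adds counts rather than replacing them
            per_location.append([location, date, dict(totals)])
        result.append(per_location)
    return result

print(compile_data_summed([
    ["Maine", "01062016", {"apple": 5}],
    ["Maine", "01062016", {"apple": 4}],
    ["Maine", "02042016", {"apple": 3}],
]))
# [[['Maine', '01062016', {'apple': 9}], ['Maine', '02042016', {'apple': 3}]]]
```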