I'm trying to create a list of lists. I have the following dataset:
ID date product
A 01/01/2018 1
A 01/01/2018 2
A 02/01/2018 2
B 01/01/2018 3
B 01/01/2018 4
B 02/01/2018 2
B 04/01/2018 1
B 04/01/2018 2
B 04/01/2018 3
The goal is to create a list like this:
[[[1,2], [2]], [[3,4],[2],[1,2,3]]]
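The outermost list corresponds to the customer ID, the middle lists to the purchase dates, and the innermost lists to the products bought on each date.
Answer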
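You can do this with two applications of itertools.groupby: one to group by ID, and another to group by date.
The code below uses a triple-nested list comprehension, which is compact but not very easy to read. I'll post a longer version shortly.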
from itertools import groupby
from operator import itemgetter
data = '''\
ID date product
A 01/01/2018 1
A 01/01/2018 2
A 02/01/2018 2
B 01/01/2018 3
B 01/01/2018 4
B 02/01/2018 2
B 04/01/2018 1
B 04/01/2018 2
B 04/01/2018 3
'''
data = (row.split() for row in data.splitlines())
# Skip the header row
next(data)
result = [[[u[-1] for u in group]
           for _, group in groupby(row, itemgetter(1))]
          for _, row in groupby(data, itemgetter(0))]
print(result)
Output:
[[['1', '2'], ['2']], [['3', '4'], ['2'], ['1', '2', '3']]]
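Both groupby versions rely on the rows already being ordered by ID and then by date: itertools.groupby only merges consecutive items with equal keys, so an ID that reappears later in the input starts a new group. If the real data might arrive unordered, one option is to sort it first. The sketch below uses a hypothetical shuffled subset of the question's rows and assumes the dates are day-first (DD/MM/YYYY), which the question does not actually specify.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

# Hypothetical input: a few of the question's rows, deliberately shuffled
rows = [
    ("B", "04/01/2018", "1"),
    ("A", "01/01/2018", "1"),
    ("B", "01/01/2018", "3"),
    ("A", "02/01/2018", "2"),
    ("B", "04/01/2018", "2"),
    ("A", "01/01/2018", "2"),
]

# groupby only merges *consecutive* equal keys, so unsorted input repeats IDs
keys = [key for key, _ in groupby(rows, itemgetter(0))]
print(keys)  # ['B', 'A', 'B', 'A', 'B', 'A']


def sort_key(row):
    # Assumption: day-first dates; use "%m/%d/%Y" if they are month-first
    return row[0], datetime.strptime(row[1], "%d/%m/%Y")


# list.sort is stable, so products keep their input order within each date
rows.sort(key=sort_key)

result = [[[int(product) for _, _, product in day]
           for _, day in groupby(customer, itemgetter(1))]
          for _, customer in groupby(rows, itemgetter(0))]
print(result)  # [[[1, 2], [2]], [[3], [1, 2]]]
```

Sorting on the parsed date rather than the raw string matters once dates span several months or years, because DD/MM/YYYY strings do not sort chronologically.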
Here is a version that (mostly) uses traditional for loops. It also converts the product numbers from strings to integers.
from itertools import groupby
from operator import itemgetter
data = '''\
ID date product
A 01/01/2018 1
A 01/01/2018 2
A 02/01/2018 2
B 01/01/2018 3
B 01/01/2018 4
B 02/01/2018 2
B 04/01/2018 1
B 04/01/2018 2
B 04/01/2018 3
'''
data = (row.split() for row in data.splitlines())
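# Skip the header row
next(data)
ig1 = itemgetter(1)
result = []
# Outer groupby: one group of rows per customer ID (column 0)
for _, row in groupby(data, itemgetter(0)):
    sublist = []
    # Inner groupby: split that customer's rows by date (column 1)
    for _, group in groupby(row, ig1):
        # Keep only the product (last column), converted to int
        sublist.append([int(u[-1]) for u in group])
    result.append(sublist)
print(result)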
Output:
[[[1, 2], [2]], [[3, 4], [2], [1, 2, 3]]]
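If you would rather not depend on the input order at all, a plain dictionary can do the grouping instead of groupby. This is a different technique from the answer above, sketched under the assumption of Python 3.7+, where dicts preserve insertion order, so customers and dates come out in the order they are first seen rather than in sorted order. group_products is a hypothetical helper name.

```python
def group_products(rows):
    """Group (id, date, product) rows into [[[products per date] per id]].

    Hypothetical helper; relies on dicts keeping insertion order (Python 3.7+).
    """
    by_id = {}
    for cust_id, date, product in rows:
        # One inner dict per customer, mapping date -> list of products
        by_date = by_id.setdefault(cust_id, {})
        by_date.setdefault(date, []).append(int(product))
    return [list(by_date.values()) for by_date in by_id.values()]


data = '''\
ID date product
A 01/01/2018 1
A 01/01/2018 2
A 02/01/2018 2
B 01/01/2018 3
B 01/01/2018 4
B 02/01/2018 2
B 04/01/2018 1
B 04/01/2018 2
B 04/01/2018 3
'''
# Skip the header line and split each row on whitespace
rows = [line.split() for line in data.splitlines()[1:]]
print(group_products(rows))  # [[[1, 2], [2]], [[3, 4], [2], [1, 2, 3]]]
```

Unlike the groupby versions, this still groups correctly when rows for the same customer or date are interleaved; sort the rows first if you need customers or dates in sorted order.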