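How do I create a new column with aggregated values using a multi-index?
For example, in the following DataFrame, how can I create a new column holding the aggregated list of products for each (region, market) index pair?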
import pandas as pd
df = pd.DataFrame({'product' : ['Alpha', 'Alpha', 'Beta', 'Beta', 'Omega', 'Omega', 'Delta', 'Delta'],
'region' : [1, 2, 1, 1, 3, 1, 2, 1],
'market' : ['small', 'large', 'small', 'small', 'large', 'small', 'small', 'medium']})
I want to go from:
+---------+--------+--------+
| product | region | market |
+---------+--------+--------+
| Alpha   | 1      | small  |
| Alpha   | 2      | large  |
| Beta    | 1      | small  |
| Beta    | 1      | small  |
| Omega   | 3      | large  |
| Omega   | 1      | small  |
| Delta   | 2      | small  |
| Delta   | 1      | medium |
+---------+--------+--------+
to:
+---------+--------+--------+----------------------------+
| product | region | market | product_list               |
+---------+--------+--------+----------------------------+
| Alpha   | 1      | small  | ['Alpha', 'Beta', 'Omega'] |
| Alpha   | 2      | large  | ['Alpha']                  |
| Beta    | 1      | small  | ['Alpha', 'Beta', 'Omega'] |
| Beta    | 1      | small  | ['Alpha', 'Beta', 'Omega'] |
| Omega   | 3      | large  | ['Omega']                  |
| Omega   | 1      | small  | ['Alpha', 'Beta', 'Omega'] |
| Delta   | 2      | small  | ['Delta']                  |
| Delta   | 1      | medium | ['Delta']                  |
+---------+--------+--------+----------------------------+
Duplicates should be dropped (e.g. Beta, 1, small occurs twice in the data but is not repeated in product_list: ['Alpha', 'Beta', 'Omega']).
Answer 0 (score: 0)
IIUC, you can use groupby with transform on the product column: convert each group to a set to remove duplicates, turn it back into a sorted list, and repeat that list once per row so that transform can align it with the group:
df['product_list'] = df.groupby(['region', 'market'])['product'].transform(
    lambda x: [sorted(set(x))] * len(x))  # same de-duplicated list for every row in the group
print(df)
  product  region  market          product_list
0   Alpha       1   small  [Alpha, Beta, Omega]
1   Alpha       2   large               [Alpha]
2    Beta       1   small  [Alpha, Beta, Omega]
3    Beta       1   small  [Alpha, Beta, Omega]
4   Omega       3   large               [Omega]
5   Omega       1   small  [Alpha, Beta, Omega]
6   Delta       2   small               [Delta]
7   Delta       1  medium               [Delta]
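As an alternative that is not part of the original answer, the same column can probably be built by aggregating once per (region, market) pair and joining the result back onto the rows. The sketch below assumes the question's DataFrame; the names product_lists and result are made up for illustration, and sorting the set is an added choice to make the list order deterministic.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Alpha', 'Alpha', 'Beta', 'Beta', 'Omega', 'Omega', 'Delta', 'Delta'],
    'region': [1, 2, 1, 1, 3, 1, 2, 1],
    'market': ['small', 'large', 'small', 'small', 'large', 'small', 'small', 'medium'],
})

# One sorted, de-duplicated list of products per (region, market) pair.
# The result is a Series indexed by a (region, market) MultiIndex.
product_lists = (
    df.groupby(['region', 'market'])['product']
      .apply(lambda s: sorted(set(s)))
      .rename('product_list')  # hypothetical name; becomes the new column name
)

# Left-join on the two key columns: each row looks up its own
# (region, market) pair in the MultiIndex of product_lists.
result = df.join(product_lists, on=['region', 'market'])
print(result)
```

Computing the lists once per group and joining keeps the aggregation step separate from the broadcast back to rows, which can be easier to inspect than a single transform call.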