I have this test table in a pandas DataFrame:
   Leaf_category_id  session_id  product_id
0               111           1         987
3               111           4         987
4               111           1         741
1               222           2         654
2               333           3         321
What I want is this: for Leaf_category_id 111, the result should be:
session_id  product_id
         1     987,741
         4         987
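Can a function also be defined that performs the same operation for every Leaf_category_id? My table contains many more rows; this is only a snapshot of it.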
Answer 0 (score: 1)
You can use boolean indexing first, then groupby with apply and join:
import pandas as pd

df = pd.DataFrame({'Leaf_category_id': [111, 111, 111, 222, 333],
                   'session_id': [1, 4, 1, 2, 3],
                   'product_id': [987, 987, 741, 654, 321]},
                  columns=['Leaf_category_id', 'session_id', 'product_id'])
print(df)
   Leaf_category_id  session_id  product_id
0               111           1         987
1               111           4         987
2               111           1         741
3               222           2         654
4               333           3         321
print(df[df.Leaf_category_id == 111]
      .groupby('session_id')['product_id']
      .apply(lambda x: ','.join(x.astype(str))))
session_id
1    987,741
4        987
Name: product_id, dtype: object
To get this for every Leaf_category_id at once, group by both Leaf_category_id and session_id:
print(df.groupby(['Leaf_category_id', 'session_id'])['product_id']
        .apply(lambda x: ','.join(x.astype(str)))
        .reset_index())
   Leaf_category_id  session_id product_id
0               111           1    987,741
1               111           4        987
2               222           2        654
3               333           3        321
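As a possible variation on the grouped version above, the product_id column can be converted to strings once up front and then joined with agg, which avoids the lambda. This is only a sketch on the same sample data; the name `joined` is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Leaf_category_id': [111, 111, 111, 222, 333],
                   'session_id': [1, 4, 1, 2, 3],
                   'product_id': [987, 987, 741, 654, 321]})

# Cast product_id to str once, then join the values of each group with commas.
joined = (df.assign(product_id=df['product_id'].astype(str))
            .groupby(['Leaf_category_id', 'session_id'])['product_id']
            .agg(','.join)
            .reset_index())

print(joined)
```

The result should have the same shape as the apply-based version: one row per (Leaf_category_id, session_id) pair.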
Edit (following a comment): or, if you need a separate DataFrame for each unique value in Leaf_category_id:
for i in df.Leaf_category_id.unique():
    # Filter one category, then join its product_ids per session
    print(df[df.Leaf_category_id == i]
          .groupby('session_id')['product_id']
          .apply(lambda x: ','.join(x.astype(str)))
          .reset_index())
   session_id product_id
0           1    987,741
1           4        987
   session_id product_id
0           2        654
   session_id product_id
0           3        321
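Since the question asks for a function, the filter-and-group step could be wrapped roughly as follows. This is a minimal sketch, not part of the original answer: the names `products_per_session` and `tables` are hypothetical, and the column names are assumed to match the sample data.

```python
import pandas as pd

def products_per_session(df, leaf_id):
    """Return one row per session_id with comma-joined product_ids for leaf_id."""
    subset = df[df['Leaf_category_id'] == leaf_id]
    return (subset.groupby('session_id')['product_id']
                  .apply(lambda x: ','.join(x.astype(str)))
                  .reset_index())

df = pd.DataFrame({'Leaf_category_id': [111, 111, 111, 222, 333],
                   'session_id': [1, 4, 1, 2, 3],
                   'product_id': [987, 987, 741, 654, 321]})

# Build one result table per category, keyed by Leaf_category_id.
# tolist() converts the NumPy integers to plain Python ints for the dict keys.
tables = {leaf_id: products_per_session(df, leaf_id)
          for leaf_id in df['Leaf_category_id'].unique().tolist()}

print(tables[111])
```

Keeping the results in a dict makes each per-category table available by its id afterwards, instead of only printing it inside the loop.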