Function to select from pandas DataFrame columns

Time: 2016-11-14 11:41:33

Tags: python pandas numpy

I have this test table in a pandas DataFrame:

   Leaf_category_id  session_id  product_id
0               111           1         987
3               111           4         987
4               111           1         741
1               222           2         654
2               333           3         321


For Leaf_category_id 111, the result I want is:

 session_id   product_id
 1            987,741
 4            987

Is it also possible to define a function that does the same for every Leaf_category_id? My table contains many more rows; this is only a snapshot of it.

1 answer:

Answer 0 (score: 1)

You can use boolean indexing first, then groupby with apply and join:

import pandas as pd

df = pd.DataFrame({'Leaf_category_id':[111,111,111,222,333],
                   'session_id':[1,4,1,2,3],
                   'product_id':[987,987,741,654,321]},
                   columns =['Leaf_category_id','session_id','product_id'])
print (df)
   Leaf_category_id  session_id  product_id
0               111           1         987
1               111           4         987
2               111           1         741
3               222           2         654
4               333           3         321

# keep only category 111, then join the product ids of each session
print (df[df.Leaf_category_id == 111]
         .groupby('session_id')['product_id']
         .apply(lambda x: ','.join(x.astype(str))))
session_id
1    987,741
4        987
Name: product_id, dtype: object

To cover every Leaf_category_id at once, group by both columns:

print (df.groupby(['Leaf_category_id','session_id'])['product_id']
         .apply(lambda x: ','.join(x.astype(str)))
         .reset_index())
   Leaf_category_id  session_id product_id
0               111           1    987,741
1               111           4        987
2               222           2        654
3               333           3        321
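
Since the question asks for a function, the two calls above could be wrapped as follows. This is only a minimal sketch: the name `products_by_session` and its optional `leaf_category_id` parameter are hypothetical and not part of the original answer, and the column names are assumed to match the test table.

```python
import pandas as pd


def products_by_session(df, leaf_category_id=None):
    """Join product_id values per session into comma-separated strings.

    With leaf_category_id given, only that category's rows are used;
    otherwise rows are grouped by Leaf_category_id and session_id.
    (Hypothetical helper, assumed column names as in the test table.)
    """
    if leaf_category_id is not None:
        # boolean indexing: keep rows of the requested category only
        subset = df[df['Leaf_category_id'] == leaf_category_id]
        keys = ['session_id']
    else:
        subset = df
        keys = ['Leaf_category_id', 'session_id']
    return (subset.groupby(keys)['product_id']
                  .apply(lambda x: ','.join(x.astype(str)))
                  .reset_index())


df = pd.DataFrame({'Leaf_category_id': [111, 111, 111, 222, 333],
                   'session_id': [1, 4, 1, 2, 3],
                   'product_id': [987, 987, 741, 654, 321]})

result_111 = products_by_session(df, 111)   # one category
result_all = products_by_session(df)        # all categories
```

Calling it without a category id returns the same four-row table as the two-column groupby shown above.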

Edit based on a comment:

To do this for each unique value in Leaf_category_id, loop over the values; if you need a DataFrame each time, add reset_index():

for i in df.Leaf_category_id.unique():
    print (df[df.Leaf_category_id == i]
             .groupby('session_id')['product_id']
             .apply(lambda x: ','.join(x.astype(str)))
             .reset_index())
   session_id product_id
0           1    987,741
1           4        987
   session_id product_id
0           2        654
   session_id product_id
0           3        321
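
If the per-category tables need to be kept rather than printed, one possible variation is to collect them in a dict keyed by category id. This is a sketch under assumptions, not the original answer's method: the name `per_category` is hypothetical, and it iterates over `df.groupby('Leaf_category_id')` instead of filtering once per unique value.

```python
import pandas as pd

df = pd.DataFrame({'Leaf_category_id': [111, 111, 111, 222, 333],
                   'session_id': [1, 4, 1, 2, 3],
                   'product_id': [987, 987, 741, 654, 321]})

# per_category (hypothetical name): category id -> DataFrame of joined product ids
per_category = {
    int(leaf_id): (group.groupby('session_id')['product_id']
                        .apply(lambda x: ','.join(x.astype(str)))
                        .reset_index())
    for leaf_id, group in df.groupby('Leaf_category_id')
}

print(per_category[111])
```

Each value is an ordinary DataFrame with the columns session_id and product_id, so it can be processed further or written out separately.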