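Grouping in pandas while preserving the tuples

Time: 2016-01-20 17:14:14

Tags: python pandas

I have a dataframe that looks like this (it actually has 35 columns and many more tuples, but the relevant columns are shown below):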

     leg_side  leg_quantity expiration product  change_type  
0        None          None       None      ZQ     inserted  
1        None          None       None      HG     inserted  
2        None          None       None      PL     inserted  
3        None          None       None      SI     inserted  
4        None          None       None      ZQ     inserted  
5        None          None       None      PL     inserted  
6        None          None       None      ZW     inserted  
7        None          None       None      SI     inserted  
8        None          None       None      ZQ     updated  
9        None          None       None      SI     inserted  
10       None          None       None      ZC     updated
..        ...           ...        ...     ...          ...  
970      None          None       None      OZ     inserted  
971      None          None       None      OZ     deleted  
972      None          None       None      OZ     updated  
973      None          None       None      ZC     inserted  
974      None          None       None      OZ     inserted  
975      None          None       None      ZC     inserted  
976      None          None       None      OZ     inserted

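What I want to do is group by product, though not necessarily in the SQL sense. I want to bring all tuples for the same product together and sub-group them by change_type, giving a df like this: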

     leg_side  leg_quantity expiration product  change_type  
0        None          None       None      ZQ     inserted
4        None          None       None      ZQ     inserted
8        None          None       None      ZQ     updated 
1        None          None       None      HG     inserted
2        None          None       None      PL     inserted
5        None          None       None      PL     inserted
3        None          None       None      SI     inserted
7        None          None       None      SI     inserted
9        None          None       None      SI     inserted
6        None          None       None      ZW     inserted
...
973      None          None       None      ZC     inserted
975      None          None       None      ZC     inserted
10       None          None       None      ZC     updated
970      None          None       None      OZ     inserted
974      None          None       None      OZ     inserted
976      None          None       None      OZ     inserted
972      None          None       None      OZ     updated
971      None          None       None      OZ     deleted

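That is, organize the dataframe above so that all tuples with the same product name sit together, and then, within each of those groups, all tuples with the same change type sit together (preferably in the order inserted, updated, deleted). If I do a pandas groupby(), the individual tuples are collapsed away. I only want a sense of grouping. How can I do this?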

1 answer:

Answer 0: (score: 1)

You can convert change_type to an ordered Categorical to set a custom order, then groupby product and sort within each group:

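# Parentheses are required so the method chain can span several lines
df['change_type'] = (df['change_type']
                     .astype('category')
                     .cat
                     .set_categories(["inserted", "updated", "deleted"], ordered=True))

# Sort each product group by change_type, then flatten back to a single frame
df = (df.groupby('product')
        .apply(lambda x: x.sort_values('change_type'))
        .reset_index(drop=True))
print(df)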

   leg_side leg_quantity expiration product change_type
0      None         None       None      HG    inserted
1      None         None       None      OZ    inserted
2      None         None       None      OZ    inserted
3      None         None       None      OZ    inserted
4      None         None       None      OZ     updated
5      None         None       None      OZ     deleted
6      None         None       None      PL    inserted
7      None         None       None      PL    inserted
8      None         None       None      SI    inserted
9      None         None       None      SI    inserted
10     None         None       None      SI    inserted
11     None         None       None      ZC    inserted
12     None         None       None      ZC    inserted
13     None         None       None      ZC     updated
14     None         None       None      ZQ    inserted
15     None         None       None      ZQ    inserted
16     None         None       None      ZQ     updated
17     None         None       None      ZW    inserted
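
As a possible alternative, here is a minimal sketch that gets the same grouping with a plain sort_values on both columns instead of groupby().apply(), whose handling of the grouping column may differ between pandas versions. The sample frame is hypothetical (only the two relevant columns, 14 rows), and the helper column name _product_order is an assumption made up for illustration. The second variant keeps products in order of first appearance and keeps the original index labels, which is closer to the output shown in the question than the alphabetical result above.

```python
import pandas as pd

# Hypothetical sample standing in for the asker's 35-column frame
df = pd.DataFrame({
    "product": ["ZQ", "HG", "PL", "SI", "ZQ", "PL", "ZW",
                "SI", "ZQ", "SI", "ZC", "OZ", "OZ", "OZ"],
    "change_type": ["inserted", "inserted", "inserted", "inserted",
                    "inserted", "inserted", "inserted", "inserted",
                    "updated", "inserted", "updated", "inserted",
                    "deleted", "updated"],
})

# Ordered categorical: sorting follows the listed order, not alphabetical order
df["change_type"] = pd.Categorical(
    df["change_type"],
    categories=["inserted", "updated", "deleted"],
    ordered=True,
)

# Variant 1: products in alphabetical order, original index labels kept
by_name = df.sort_values(["product", "change_type"])

# Variant 2: products in order of first appearance.
# "_product_order" is a throwaway helper column (hypothetical name).
first_seen = pd.Categorical(
    df["product"], categories=df["product"].unique(), ordered=True
)
by_first_seen = (
    df.assign(_product_order=first_seen)
      .sort_values(["_product_order", "change_type"])
      .drop(columns="_product_order")
)

print(by_first_seen)
```

No rows are dropped in either variant; only their order changes. Append .reset_index(drop=True) if a fresh 0..n-1 index is wanted, as in the output above.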