I have a DataFrame that looks like this (it actually has 35 columns and many more rows, but these are the relevant columns):
leg_side leg_quantity expiration product change_type
0 None None None ZQ inserted
1 None None None HG inserted
2 None None None PL inserted
3 None None None SI inserted
4 None None None ZQ inserted
5 None None None PL inserted
6 None None None ZW inserted
7 None None None SI inserted
8 None None None ZQ updated
9 None None None SI inserted
10 None None None ZC updated
.. ... ... ... ... ...
970 None None None OZ inserted
971 None None None OZ deleted
972 None None None OZ updated
973 None None None ZC inserted
974 None None None OZ inserted
975 None None None ZC inserted
976 None None None OZ inserted
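Now what I want to do is group by product, though not necessarily in the SQL sense. I want all rows for a product gathered together and then sub-grouped by change_type, giving a df like this: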
leg_side leg_quantity expiration product change_type
0 None None None ZQ inserted
4 None None None ZQ inserted
8 None None None ZQ updated
1 None None None HG inserted
2 None None None PL inserted
5 None None None PL inserted
3 None None None SI inserted
7 None None None SI inserted
9 None None None SI inserted
6 None None None ZW inserted
...
973 None None None ZC inserted
975 None None None ZC inserted
10 None None None ZC updated
970 None None None OZ inserted
974 None None None OZ inserted
976 None None None OZ inserted
972 None None None OZ updated
971 None None None OZ deleted
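That is, organize the DataFrame above so that all rows with the same product name sit together, and within each of those groups all rows with the same change type sit together (preferably in the order inserted, updated, deleted). If I use pandas groupby(), the individual rows are lost. I only want the rows arranged as if grouped. How can I do this?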
Answer 0 (score: 1)
You can convert change_type to a Categorical and set a custom order. Then groupby product and sort the data within each group:
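# Make change_type an ordered categorical so it sorts as inserted < updated < deleted
df['change_type'] = (df['change_type'].astype('category')
                                      .cat
                                      .set_categories(["inserted", "updated", "deleted"], ordered=True))
# Sort within each product group; groupby orders the groups by product name
df = (df.groupby('product')
        .apply(lambda x: x.sort_values('change_type'))
        .reset_index(drop=True))
print(df)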
leg_side leg_quantity expiration product change_type
0 None None None HG inserted
1 None None None OZ inserted
2 None None None OZ inserted
3 None None None OZ inserted
4 None None None OZ updated
5 None None None OZ deleted
6 None None None PL inserted
7 None None None PL inserted
8 None None None SI inserted
9 None None None SI inserted
10 None None None SI inserted
11 None None None ZC inserted
12 None None None ZC inserted
13 None None None ZC updated
14 None None None ZQ inserted
15 None None None ZQ inserted
16 None None None ZQ updated
17 None None None ZW inserted
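
Note that the output above lists the products alphabetically and renumbers the index, whereas the desired output in the question lists products in order of first appearance (ZQ, HG, PL, ...) and keeps the original index. If that matters, one possible variation is to make the product order categorical as well and sort on both columns. This is only a sketch: the small frame and the helper column name `_product_order` are made up for illustration, and it assumes a reasonably recent pandas.

```python
import pandas as pd

# Hypothetical miniature of the question's frame (only the two relevant columns).
df = pd.DataFrame({
    "product": ["ZQ", "HG", "PL", "SI", "ZQ", "PL", "ZW", "SI", "ZQ", "OZ", "OZ", "OZ"],
    "change_type": ["inserted", "inserted", "inserted", "inserted", "inserted",
                    "inserted", "inserted", "inserted", "updated",
                    "inserted", "deleted", "updated"],
})

# Order change_type as inserted < updated < deleted.
df["change_type"] = pd.Categorical(
    df["change_type"], categories=["inserted", "updated", "deleted"], ordered=True
)

# Order products by first appearance rather than alphabetically.
product_order = pd.Categorical(
    df["product"], categories=df["product"].unique(), ordered=True
)

# Sort on the temporary ordered column, then on change_type; the original
# index labels are kept because the index is not reset.
result = (
    df.assign(_product_order=product_order)
      .sort_values(["_product_order", "change_type"])
      .drop(columns="_product_order")
)
print(result)
```

Because no groupby is involved, every row survives; the rows are only rearranged.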