How to add columns to a DataFrame inside a groupby

Date: 2016-04-28 02:12:55

Tags: python pandas

Here is my DataFrame:

client_uuid supplier_uuid order_uuid
1           a             1
1           b             2
2           a             3
1           a             4
2           b             5
2           a             6
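To make the example reproducible, the table above can be built as follows. This is a sketch that assumes the rows are already sorted in chronological order, since both new columns depend on "earlier" orders.

```python
import pandas as pd

# Sample data from the question; rows assumed to be in chronological order
df = pd.DataFrame({
    "client_uuid":   [1, 1, 2, 1, 2, 2],
    "supplier_uuid": ["a", "b", "a", "a", "b", "a"],
    "order_uuid":    [1, 2, 3, 4, 5, 6],
})
print(df)
```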

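My goal is to add 2 columns to the DataFrame:

  • order number: how many orders this client has placed, up to and including this one
  • repeat order: whether this order is to a supplier the client has already ordered from in the past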

The result I want:

client_uuid supplier_uuid order_uuid order_n is_repeat
1           a             1          1       f
1           b             2          2       f  
2           a             3          1       f
1           a             4          3       t
2           b             5          2       f
2           a             6          3       t

I have some basic pseudocode:

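import pandas as pd

def set_new_columns(client_df):
    # client_df holds one client's rows, assumed to be in chronological order
    seen = set()                            # suppliers this client has ordered from so far
    order_n, is_repeat = [], []
    for n, supplier in enumerate(client_df["supplier_uuid"], start=1):
        order_n.append(n)                   # increment order count for client
        is_repeat.append(supplier in seen)  # has the client used this supplier before?
        seen.add(supplier)
    return client_df.assign(order_n=order_n, is_repeat=is_repeat)

# Process each client's rows, then restore the original row order
df = pd.concat([set_new_columns(g) for _, g in df.groupby("client_uuid")]).sort_index()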

1 answer:

Answer (score: 1)

This should work for the first part of your question:

df['order_n'] = (df
                 .groupby('client_uuid')
                 .order_uuid
                 .transform(lambda group: group.notnull().cumsum()))
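A possibly more direct alternative is `GroupBy.cumcount`, which numbers the rows of each group 0, 1, 2, and so on. This sketch assumes the rows are in chronological order; note one difference: the `notnull().cumsum()` version skips rows whose `order_uuid` is missing, while `cumcount` counts every row.

```python
import pandas as pd

df = pd.DataFrame({
    "client_uuid":   [1, 1, 2, 1, 2, 2],
    "supplier_uuid": ["a", "b", "a", "a", "b", "a"],
    "order_uuid":    [1, 2, 3, 4, 5, 6],
})

# cumcount() numbers each client's rows starting at 0; add 1 so counting starts at 1
df["order_n"] = df.groupby("client_uuid").cumcount() + 1
print(df["order_n"].tolist())  # [1, 2, 1, 3, 2, 3]
```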

And this should answer the second part:

df['first_order'] = (df
                     .groupby(['client_uuid', 'supplier_uuid'])
                     .order_uuid
                     .transform('first'))
df['is_repeat'] = df.order_uuid != df.first_order
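The `first_order` approach relies on `order_uuid` being unique. Under the same chronological-order assumption, `DataFrame.duplicated` may be a shorter route: with its default `keep="first"`, it flags every occurrence of a `(client_uuid, supplier_uuid)` pair after the first one, without needing a helper column. The sketch below also checks that both approaches agree on the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "client_uuid":   [1, 1, 2, 1, 2, 2],
    "supplier_uuid": ["a", "b", "a", "a", "b", "a"],
    "order_uuid":    [1, 2, 3, 4, 5, 6],
})

# True for every (client, supplier) pair that already appeared earlier in the frame
df["is_repeat"] = df.duplicated(subset=["client_uuid", "supplier_uuid"])

# For comparison: the first_order approach from the answer
first_order = df.groupby(["client_uuid", "supplier_uuid"]).order_uuid.transform("first")
print((df["is_repeat"] == (df.order_uuid != first_order)).all())  # True
```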

>>> df
   client_uuid supplier_uuid  order_uuid  first_order  order_n is_repeat
0            1             a           1            1        1     False
1            1             b           2            2        2     False
2            2             a           3            3        1     False
3            1             a           4            1        3      True
4            2             b           5            5        2     False
5            2             a           6            3        3      True
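The frame above still contains the helper column `first_order`, and `is_repeat` holds booleans rather than the `t`/`f` values shown in the question. If the exact layout from the question is wanted, a sketch of the cleanup could look like this (the `t`/`f` mapping is only there to match the question's desired output):

```python
import pandas as pd

df = pd.DataFrame({
    "client_uuid":   [1, 1, 2, 1, 2, 2],
    "supplier_uuid": ["a", "b", "a", "a", "b", "a"],
    "order_uuid":    [1, 2, 3, 4, 5, 6],
})

# The two steps from the answer
df["order_n"] = df.groupby("client_uuid").order_uuid.transform(lambda g: g.notnull().cumsum())
df["first_order"] = df.groupby(["client_uuid", "supplier_uuid"]).order_uuid.transform("first")
df["is_repeat"] = df.order_uuid != df.first_order

# Drop the helper column and render the flag as 't'/'f'
result = df.drop(columns="first_order")
result["is_repeat"] = result["is_repeat"].map({True: "t", False: "f"})
print(result)
```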