This is my DataFrame:
client_uuid supplier_uuid order_uuid
1 a 1
1 b 2
2 a 3
1 a 4
2 b 5
2 a 6
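My goal is to add 2 columns to the DataFrame: order_n, the running count of orders for each client, and is_repeat, whether the client has ordered from that supplier before.
The result I want: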
client_uuid supplier_uuid order_uuid order_n is_repeat
1 a 1 1 f
1 b 2 2 f
2 a 3 1 f
1 a 4 3 t
2 b 5 2 f
2 a 6 3 t
I have some basic pseudocode:
def set_new_columns(client_df):
    for row in client_df:
        increment order count for client
        check if this order is to a supplier the client has been to before
        set the new columns to the row

df.groupby("client_uuid").apply(set_new_columns)
Answer 0 (score: 1)
This should work for the first part of your question.
# Running count of non-null order_uuid values within each client
df['order_n'] = (df
                 .groupby('client_uuid')
                 .order_uuid
                 .transform(lambda group: group.notnull().cumsum()))
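As a possible alternative for the same column, here is a minimal sketch using `cumcount()`, which numbers the rows within each group starting from 0. It assumes every row is an order: unlike the `notnull().cumsum()` version, it also counts rows whose `order_uuid` is missing. The DataFrame is rebuilt from the question's sample data so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "client_uuid": [1, 1, 2, 1, 2, 2],
    "supplier_uuid": ["a", "b", "a", "a", "b", "a"],
    "order_uuid": [1, 2, 3, 4, 5, 6],
})

# cumcount() numbers rows 0, 1, 2, ... within each client, in row order;
# adding 1 turns that into a 1-based order number.
df["order_n"] = df.groupby("client_uuid").cumcount() + 1

print(df)
```

Both versions depend on the rows already being in order sequence, since they count in the order the rows appear.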
This should answer the second part:
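# First order_uuid seen for each (client, supplier) pair
df['first_order'] = (df
                     .groupby(['client_uuid', 'supplier_uuid'])
                     .order_uuid
                     .transform('first'))
# A repeat is any order that is not the pair's first one (assumes order_uuid is unique)
df['is_repeat'] = df.order_uuid != df.first_order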
>>> df
   client_uuid  supplier_uuid  order_uuid  first_order  order_n  is_repeat
0            1              a           1            1        1      False
1            1              b           2            2        2      False
2            2              a           3            3        1      False
3            1              a           4            1        3       True
4            2              b           5            5        2      False
5            2              a           6            3        3       True
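If the helper column `first_order` is not needed, one option is `DataFrame.duplicated`, which flags every row whose key columns already appeared in an earlier row. The sketch below is a swapped-in technique, not the approach above, and it assumes the rows are already in order sequence. It rebuilds the question's sample data so it can be run as-is.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "client_uuid": [1, 1, 2, 1, 2, 2],
    "supplier_uuid": ["a", "b", "a", "a", "b", "a"],
    "order_uuid": [1, 2, 3, 4, 5, 6],
})

# keep="first" leaves the first occurrence of each (client, supplier) pair
# marked False and marks every later occurrence True.
df["is_repeat"] = df.duplicated(subset=["client_uuid", "supplier_uuid"], keep="first")

print(df)
```

This does not rely on `order_uuid` being unique, because it compares the key columns directly instead of comparing each order against the pair's first order.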