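I am trying to add binary variables that indicate whether a product was in the user's last order, the order before that, and so on, for the last 5 orders. I came up with the following pandas DataFrame expression. It does exactly what it is supposed to do, but it is painfully slow. What am I doing wrong?
This is my dataframe: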
order_id user_id order_number product_id us_last_order_number
2539329 1 1 196 10
2539329 1 1 14084 10
2539329 1 1 12427 10
2539329 1 1 26088 10
2539329 1 1 26405 10
2398795 1 2 196 10
2398795 1 2 10258 10
2398795 1 2 12427 10
2398795 1 2 13176 10
2398795 1 2 26088 10
2398795 1 2 13032 10
473747 1 3 196 10
473747 1 3 12427 10
473747 1 3 10258 10
473747 1 3 25133 10
473747 1 3 30450 10
2254736 1 4 196 10
2254736 1 4 12427 10
2254736 1 4 10258 10
2254736 1 4 25133 10
2254736 1 4 26405 10
431534 1 5 196 10
431534 1 5 12427 10
431534 1 5 10258 10
431534 1 5 25133 10
431534 1 5 10326 10
431534 1 5 17122 10
431534 1 5 41787 10
431534 1 5 13176 10
3367565 1 6 196 10
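import pandas as pd

# For each (user_id, product_id), walk back over the user's last 5 order numbers:
# 1 if the product was in that order, -1 if the order number is negative, 0 otherwise.
tmp2 = priors_orders_detail.groupby(['user_id', 'product_id']).apply(
    lambda x: [1 if item in x.order_number.tolist() else -1 if item < 0 else 0
               for item in range(x.us_last_order_number.iloc[0],
                                 x.us_last_order_number.iloc[0] - 5, -1)])
tmp2 = pd.DataFrame(tmp2).reset_index()
# Name the column that holds the per-group lists
tmp2.columns = [*tmp2.columns[:-1], 'present_in_orders']
# Expand each 5-item list into five separate columns
tmp2[['in_orders_1', 'in_orders_2', 'in_orders_3', 'in_orders_4', 'in_orders_5']] = pd.DataFrame([x for x in tmp2.present_in_orders])
tmp2.drop('present_in_orders', axis=1, inplace=True)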
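
For comparison, here is a rough vectorized sketch of the same computation. It is not a confirmed fix, only one possible way to avoid running a Python lambda once per (user_id, product_id) group, which is a likely source of the slowness. The function name `last_n_order_flags` and the small `demo` frame are made up for illustration. The sketch assumes that order_number increases by 1 per order and that us_last_order_number is constant within a user, as in the data above.

```python
import pandas as pd

def last_n_order_flags(df, n=5):
    """Hypothetical helper: per (user_id, product_id), flag presence in each of the last n orders."""
    keys = ['user_id', 'product_id']
    df = df.reset_index(drop=True)  # unique index so the slot column aligns cleanly
    # slot 1 = the user's last order, slot 2 = the order before it, and so on
    slot = df['us_last_order_number'] - df['order_number'] + 1
    recent = df.loc[slot.between(1, n), keys].assign(slot=slot)
    # count rows per (user, product, slot), spread slots into columns, cap at 1
    wide = (recent.groupby(keys + ['slot']).size()
                  .unstack('slot', fill_value=0)
                  .clip(upper=1))
    # keep every (user, product) pair and all n slot columns, filling gaps with 0
    last = df.groupby(keys)['us_last_order_number'].first()
    wide = wide.reindex(index=last.index, columns=range(1, n + 1), fill_value=0)
    # mirror the original rule: a negative order number for a slot gives -1
    for k in range(1, n + 1):
        wide.loc[(last - (k - 1) < 0).to_numpy(), k] = -1
    wide.columns = [f'in_orders_{k}' for k in range(1, n + 1)]
    return wide.reset_index()

# Made-up sample: one user with 3 orders in total
demo = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 1],
    'order_number': [1, 1, 2, 3, 3],
    'product_id': [196, 10, 196, 196, 20],
    'us_last_order_number': [3, 3, 3, 3, 3],
})
result = last_n_order_flags(demo)
print(result)
```

The idea is to turn "which of the last 5 orders is this row in" into a single integer column and let groupby/unstack build the five flag columns at once, instead of building a list for every group.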