I am analyzing customer return behavior and am working with the following data frame df:
Customer_ID | Order | Store_ID | Date | Item_ID | Count_of_units | Event_Return_Flag
ABC | 123 | 1 | 23052016 | A | -1 | Y
ABC | 345 | 1 | 23052016 | B | 1 | 0
ABC | 567 | 1 | 24052016 | C | -1 | 0
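For anyone who wants to reproduce this, the sample rows can be rebuilt roughly as follows. This is only a sketch: the dtypes are assumptions (Date and Event_Return_Flag kept as strings, Order, Store_ID and Count_of_units as integers), and the real data may differ.

```python
import pandas as pd

# Sample rows copied from the table; the dtypes are assumed, not confirmed
df = pd.DataFrame({
    "Customer_ID": ["ABC", "ABC", "ABC"],
    "Order": [123, 345, 567],
    "Store_ID": [1, 1, 1],
    "Date": ["23052016", "23052016", "24052016"],  # ddmmyyyy kept as text
    "Item_ID": ["A", "B", "C"],
    "Count_of_units": [-1, 1, -1],                 # negative units = return
    "Event_Return_Flag": ["Y", "0", "0"],
})
print(df)
```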
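I need to add another column that identifies customers who made a return during the event (Event_Return_Flag = Y) and also purchased an item on the same day at the same store.
In other words, I want to add a flag df['target'] that marks those purchases.
I do not know how to do this in Python pandas.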
My idea was to build a key by concatenating Customer_ID, Store_ID and Date, then split the data frame by Event_Return_Flag and use isin, like this:
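import numpy as np
# Cast every part to str so the concatenation also works when Store_ID or Date is numeric
df['key'] = df['Customer_ID'].astype(str) + '_' + df['Store_ID'].astype(str) + '_' + df['Date'].astype(str)
df_1 = df.loc[df['Event_Return_Flag'] == 'Y']    # returns made during the event
df_2 = df.loc[df['Event_Return_Flag'] == '0']    # all other rows
df_3 = df_2.loc[df_2['Count_of_units'] > 0]      # purchases only (positive units)
# Both outcomes are strings so np.where does not have to mix str and int
df_3['target'] = np.where(df_3['key'].isin(df_1['key']), 'Y', '0')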
This approach feels somewhat wrong, but I cannot come up with anything better. On the last line, the one with np.where, I get this warning:
C:\Users\xxx\AppData\Local\Continuum\Anaconda2\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
if __name__ == '__main__':
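As far as I understand it, the warning appears because the column is being added to a frame that was itself sliced out of df, so pandas cannot tell whether the assignment is meant to reach the original. A minimal sketch of one common way around it, taking an explicit .copy() of the subset before assigning, is below. The names returns and purchases are made up for illustration, and the three rows are a cut-down version of the sample data.

```python
import numpy as np
import pandas as pd

# Cut-down sample: only the columns needed for the illustration
df = pd.DataFrame({
    "key": ["ABC_1_23052016", "ABC_1_23052016", "ABC_1_24052016"],
    "Count_of_units": [-1, 1, -1],
    "Event_Return_Flag": ["Y", "0", "0"],
})

# Rows that are returns made during the event
returns = df.loc[df["Event_Return_Flag"] == "Y"]

# .copy() makes the subset an independent frame, so adding a column to it
# is unambiguous and does not touch df
purchases = df.loc[
    (df["Event_Return_Flag"] == "0") & (df["Count_of_units"] > 0)
].copy()

purchases["target"] = np.where(purchases["key"].isin(returns["key"]), "Y", "0")
print(purchases)
```

This only quiets the warning. The flag still ends up on the subset rather than on df, which may or may not be what is wanted.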
I also tried the following line, but could not figure out how to match rows based on the Event_Return_Flag column:
df['target'] = np.where((df.Count_of_units > 0) & (df.groupby(['key', 'Item_ID']).Event_Return_Flag.transform('nunique') > 1), 'Y', '')
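One possible way to do the matching without splitting the frame is to compute, per customer/store/date group, whether the group contains at least one event return, and then flag only the purchase rows in those groups. The sketch below assumes that a purchase means Count_of_units > 0 and that the flag column holds the strings 'Y' and '0'. The helper names group_cols, is_event_return and _is_ret are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Customer_ID": ["ABC", "ABC", "ABC"],
    "Order": [123, 345, 567],
    "Store_ID": [1, 1, 1],
    "Date": ["23052016", "23052016", "24052016"],
    "Item_ID": ["A", "B", "C"],
    "Count_of_units": [-1, 1, -1],
    "Event_Return_Flag": ["Y", "0", "0"],
})

# Same customer, same store, same day; Item_ID is deliberately left out,
# because the returned item and the purchased item can differ
group_cols = ["Customer_ID", "Store_ID", "Date"]

# True on rows that are returns made during the event
is_event_return = df["Event_Return_Flag"].eq("Y")

# Broadcast "this group has at least one event return" to every row of the group
group_has_return = (
    df.assign(_is_ret=is_event_return)
      .groupby(group_cols)["_is_ret"]
      .transform("any")
)

# Flag purchase rows (positive units) that share a group with an event return
df["target"] = np.where(df["Count_of_units"].gt(0) & group_has_return, "Y", "")
print(df[["Order", "Count_of_units", "Event_Return_Flag", "target"]])
```

On the three sample rows this flags only order 345, the purchase made on the same day and at the same store as the event return. Because the result is assigned to df itself rather than to a slice, the SettingWithCopyWarning should not come up here.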