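I need to produce the expected df3 shown below by merging df and df1, subject to the following rule:
When merging, if "Desc1" has no value for a row, the value should be taken from "Desc2" instead.
I have two DataFrames, as follows: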
import pandas as pd

df = pd.DataFrame({"Customer_no": ['1', '1', '1', '2', '2', '6', '7','8','9','10'],
"Card_no": ['111', '222', '333', '444', '555', '666', '777','888','999','000'],
"Card_name":['AAA','AAA','BBB','CCC','AAA','DDD','EEE','BBB','CCC','CCC'],
"Group_code":['123','123','456','678','123','434','678','365','678','987'],
"Amount":['100','240','450','212','432','123','543','567','232','453']})
The second DataFrame:
df1 = pd.DataFrame({"Group_code": ['123', '123','456', '678','678', '434', '987','421'],
"Desc1": ['Electrical', 'Electrical','Hardware', 'House', 'House', 'Car','','Toy'],
"Desc2":['Electricals111','Electricals123','Hardware112','House232','House112',
'Car','Bike','Toy']})
The expected DataFrame:
df3 = pd.DataFrame({"Customer_no": ['1', '1', '1', '2', '2', '6', '7','8','9','10'],
"Card_no": ['111', '222', '333', '444', '555', '666', '777','888','999','000'],
"Card_name":['AAA','AAA','BBB','CCC','AAA','DDD','EEE','BBB','CCC','CCC'],
"Group_code":['123','123','456','678','123','434','678','365','678','987'],
"Amount":['100','240','450','212','432','123','543','567','232','453'],
"Category" :['Electrical','Electrical','Hardware','House','Electrical','Car','House','','House','Bike']})
Answer 0 (score: 0)
You can do a left join first and then use where to fall back to Desc2:
import numpy as np

df3 = df.merge(df1, how='left')  # left join on the shared column Group_code
df3 = df3.rename(columns={"Desc1": "Category"})
df3 = df3.replace("", np.nan)  # treat empty strings as missing
# if Category is NaN, replace it with the value from Desc2
df3["Category"] = df3["Category"].where(~df3["Category"].isna(), df3["Desc2"])
# drop Desc2, then remove the rows duplicated by repeated Group_code values in df1
df3 = df3.drop("Desc2", axis=1).drop_duplicates()
   Customer_no Card_no Card_name Group_code Amount    Category
0            1     111       AAA        123    100  Electrical
2            1     222       AAA        123    240  Electrical
4            1     333       BBB        456    450    Hardware
5            2     444       CCC        678    212       House
7            2     555       AAA        123    432  Electrical
9            6     666       DDD        434    123         Car
10           7     777       EEE        678    543       House
12           8     888       BBB        365    567         NaN
13           9     999       CCC        678    232       House
15          10     000       CCC        987    453        Bike
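The gaps in the index above (0, 2, 4, ...) appear because Group_code 123 and 678 each occur twice in df1, so the left join duplicates every matching row of df before drop_duplicates removes the copies. The unmatched Group_code 365 also ends up as NaN, while the expected df3 has an empty string there. A minimal sketch of the same left-join-then-where approach that also fills the NaN and resets the index, using a shortened version of the question's data (the three-row frames and the name `result` are only for illustration):

```python
import pandas as pd

# Shortened illustration data based on the question
df = pd.DataFrame({"Card_no": ["111", "888", "000"],
                   "Group_code": ["123", "365", "987"]})
df1 = pd.DataFrame({"Group_code": ["123", "123", "987"],
                    "Desc1": ["Electrical", "Electrical", ""],
                    "Desc2": ["Electricals111", "Electricals123", "Bike"]})

result = df.merge(df1, how="left", on="Group_code")
result = result.rename(columns={"Desc1": "Category"})

# Keep Category where it is present and non-empty, otherwise take Desc2
cat = result["Category"]
result["Category"] = cat.where(cat.notna() & (cat != ""), result["Desc2"])

result = (result.drop(columns="Desc2")
                .drop_duplicates()
                .fillna({"Category": ""})   # unmatched codes become '' as in the expected df3
                .reset_index(drop=True))    # remove the index gaps left by drop_duplicates
print(result)
```

Whether NaN or an empty string is preferable for unmatched codes depends on what happens to the column afterwards; the fillna step can simply be left out.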
Answer 1 (score: 0)
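# Build a lookup with one row per Group_code: use Desc1, falling back to Desc2 where Desc1 is empty
lookup = df1.assign(Category=df1['Desc1'].where(df1['Desc1'] != '', df1['Desc2']))
lookup = lookup[['Group_code', 'Category']].drop_duplicates()
df4 = pd.merge(df, lookup, how='left', on=['Group_code'])
df4 = df4[['Amount','Card_name','Card_no','Category','Customer_no','Group_code']]  # reorder the columns
df4 = df4.fillna({'Category': ''})  # unmatched Group_code values get an empty string
df4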
  Amount Card_name Card_no    Category Customer_no Group_code
0    100       AAA     111  Electrical           1        123
1    240       AAA     222  Electrical           1        123
2    450       BBB     333    Hardware           1        456
3    212       CCC     444       House           2        678
4    432       AAA     555  Electrical           2        123
5    123       DDD     666         Car           6        434
6    543       EEE     777       House           7        678
7    567       BBB     888                       8        365
8    232       CCC     999       House           9        678
9    453       CCC     000        Bike          10        987
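Deduplicating the lookup before the join keeps the row count of df unchanged, and this can be checked with the validate argument of merge, which raises a MergeError when the right-hand keys are not unique. A small sketch on illustrative data (the names `lookup` and `out` and the two-row frames are assumptions for the example):

```python
import pandas as pd

df = pd.DataFrame({"Card_no": ["444", "888"], "Group_code": ["678", "365"]})
df1 = pd.DataFrame({"Group_code": ["678", "678"],
                    "Desc1": ["House", "House"],
                    "Desc2": ["House232", "House112"]})

# One row per Group_code, so the join cannot multiply the rows of df
lookup = df1[["Group_code", "Desc1"]].drop_duplicates()

# validate="m:1" asks pandas to check that Group_code is unique in lookup
out = df.merge(lookup, how="left", on="Group_code", validate="m:1")
out = out.rename(columns={"Desc1": "Category"}).fillna({"Category": ""})
print(out)

# Without the deduplication, the same check fails
try:
    df.merge(df1, how="left", on="Group_code", validate="m:1")
    failed = False
except pd.errors.MergeError as exc:
    failed = True
    print("not many-to-one:", exc)
```

This is only a safeguard; it does not change the result when the lookup is already unique.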