I'm trying to convert an existing SQL statement to pandas. These are the dataframes I'm working with:
df_products:
ID PRODUCT_ID NAME STOCK SELL_COUNT DELIVERED_BY
1 P1 PRODUCT_P1 12 15 UPS
2 P2 PRODUCT_P2 4 3 DHL
3 P3 PRODUCT_P3 120 22 DHL
4 P1 PRODUCT_P1 423 18 UPS
5 P2 PRODUCT_P2 0 5 GLS
6 P3 PRODUCT_P3 53 10 DHL
7 P4 PRODUCT_P4 22 0 UPS
8 P1 PRODUCT_P1 94 56 GLS
9 P1 PRODUCT_P1 9 24 GLS
and
df_accessories:
ID ACCESSORY_ID NAME DEL_BY SUITABLE_FOR MANUFACTURER
100 A1 ACCESSORY_1 DHL P1 KUNG
101 A2 ACCESSORY_2 UPS P1 PAO
102 A3 ACCESSORY_3 GLS P1 PAO
103 A4 ACCESSORY_4 UPS P3 PAK
104 A5 ACCESSORY_5 DHL P2 PAK
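To reproduce the question, both tables can be loaded straight from the pasted text. This is a minimal sketch that assumes the whitespace-separated layout shown above (no spaces inside any value); PRODUCTS and ACCESSORIES are just names chosen here for the pasted strings.

```python
import io

import pandas as pd

# The two tables from the question, pasted as whitespace-separated text.
PRODUCTS = """\
ID PRODUCT_ID NAME STOCK SELL_COUNT DELIVERED_BY
1 P1 PRODUCT_P1 12 15 UPS
2 P2 PRODUCT_P2 4 3 DHL
3 P3 PRODUCT_P3 120 22 DHL
4 P1 PRODUCT_P1 423 18 UPS
5 P2 PRODUCT_P2 0 5 GLS
6 P3 PRODUCT_P3 53 10 DHL
7 P4 PRODUCT_P4 22 0 UPS
8 P1 PRODUCT_P1 94 56 GLS
9 P1 PRODUCT_P1 9 24 GLS
"""

ACCESSORIES = """\
ID ACCESSORY_ID NAME DEL_BY SUITABLE_FOR MANUFACTURER
100 A1 ACCESSORY_1 DHL P1 KUNG
101 A2 ACCESSORY_2 UPS P1 PAO
102 A3 ACCESSORY_3 GLS P1 PAO
103 A4 ACCESSORY_4 UPS P3 PAK
104 A5 ACCESSORY_5 DHL P2 PAK
"""

# sep=r"\s+" splits each line on any run of whitespace.
df_products = pd.read_csv(io.StringIO(PRODUCTS), sep=r"\s+")
df_accessories = pd.read_csv(io.StringIO(ACCESSORIES), sep=r"\s+")

print(df_products.shape, df_accessories.shape)  # (9, 6) (5, 6)
```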
I'm trying to write a pandas version of this SQL query:
SELECT *
FROM products a
LEFT JOIN accessories b
ON b.DEL_BY = 'UPS'
AND a.PRODUCT_ID = b.SUITABLE_FOR
AND b.MANUFACTURER != 'PAK'
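To confirm what the SQL itself returns for this data, the query can be run against an in-memory SQLite database. This is a sketch under the assumption that SQLite (through Python's standard sqlite3 module) behaves like the database the question targets for this LEFT JOIN; the table names products and accessories come from the query, and only three columns are selected to avoid the duplicate ID/NAME column names that SELECT * would produce.

```python
import io
import sqlite3

import pandas as pd

PRODUCTS = """\
ID PRODUCT_ID NAME STOCK SELL_COUNT DELIVERED_BY
1 P1 PRODUCT_P1 12 15 UPS
2 P2 PRODUCT_P2 4 3 DHL
3 P3 PRODUCT_P3 120 22 DHL
4 P1 PRODUCT_P1 423 18 UPS
5 P2 PRODUCT_P2 0 5 GLS
6 P3 PRODUCT_P3 53 10 DHL
7 P4 PRODUCT_P4 22 0 UPS
8 P1 PRODUCT_P1 94 56 GLS
9 P1 PRODUCT_P1 9 24 GLS
"""
ACCESSORIES = """\
ID ACCESSORY_ID NAME DEL_BY SUITABLE_FOR MANUFACTURER
100 A1 ACCESSORY_1 DHL P1 KUNG
101 A2 ACCESSORY_2 UPS P1 PAO
102 A3 ACCESSORY_3 GLS P1 PAO
103 A4 ACCESSORY_4 UPS P3 PAK
104 A5 ACCESSORY_5 DHL P2 PAK
"""
df_products = pd.read_csv(io.StringIO(PRODUCTS), sep=r"\s+")
df_accessories = pd.read_csv(io.StringIO(ACCESSORIES), sep=r"\s+")

# Load both frames into a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
df_products.to_sql("products", conn, index=False)
df_accessories.to_sql("accessories", conn, index=False)

# Same ON clause as the question: every condition only limits which
# accessories match; every product row is still kept.
SQL = """
SELECT a.ID, a.PRODUCT_ID, b.ACCESSORY_ID
FROM products a
LEFT JOIN accessories b
  ON b.DEL_BY = 'UPS'
 AND a.PRODUCT_ID = b.SUITABLE_FOR
 AND b.MANUFACTURER != 'PAK'
ORDER BY a.ID
"""
on_result = pd.read_sql_query(SQL, conn)
print(on_result)
```

All nine products come back, and only the P1 rows get A2, which matches the expected result in the question.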
I tried to solve it like this:
joined = df_products.merge(df_accessories, left_on='PRODUCT_ID', right_on='SUITABLE_FOR', how='left')
filtered = joined.loc[(joined['DEL_BY'] == 'UPS') & (joined['MANUFACTURER'] != 'PAK')]
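But I don't think this approach works. I'm already struggling with the first condition, ON b.DEL_BY = 'UPS'; I don't know where to put it in the pandas merge function.
I expect this result: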
ID PRODUCT_ID NAME STOCK SELL_COUNT DELIVERED_BY ID ACCESSORY_ID NAME.1 DEL_BY SUITABLE_FOR MANUFACTURER
0 1 P1 PRODUCT_P1 12 15 UPS 101.0 A2 ACCESSORY_2 UPS P1 PAO
1 2 P2 PRODUCT_P2 4 3 DHL NaN NaN NaN NaN NaN NaN
2 3 P3 PRODUCT_P3 120 22 DHL NaN NaN NaN NaN NaN NaN
3 4 P1 PRODUCT_P1 423 18 UPS 101.0 A2 ACCESSORY_2 UPS P1 PAO
4 5 P2 PRODUCT_P2 0 5 GLS NaN NaN NaN NaN NaN NaN
5 6 P3 PRODUCT_P3 53 10 DHL NaN NaN NaN NaN NaN NaN
6 7 P4 PRODUCT_P4 22 0 UPS NaN NaN NaN NaN NaN NaN
7 8 P1 PRODUCT_P1 94 56 GLS 101.0 A2 ACCESSORY_2 UPS P1 PAO
8 9 P1 PRODUCT_P1 9 24 GLS 101.0 A2 ACCESSORY_2 UPS P1 PAO
But instead I get this:
ID_x PRODUCT_ID NAME_x STOCK SELL_COUNT DELIVERED_BY ID_y ACCESSORY_ID NAME_y DEL_BY SUITABLE_FOR MANUFACTURER
1 1 P1 PRODUCT_P1 12 15 UPS 101.0 A2 ACCESSORY_2 UPS P1 PAO
6 4 P1 PRODUCT_P1 423 18 UPS 101.0 A2 ACCESSORY_2 UPS P1 PAO
12 8 P1 PRODUCT_P1 94 56 GLS 101.0 A2 ACCESSORY_2 UPS P1 PAO
15 9 P1 PRODUCT_P1 9 24 GLS 101.0 A2 ACCESSORY_2 UPS P1 PAO
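The attempt above filters after the merge, which behaves like a SQL WHERE rather than an ON condition: rows whose accessory columns are NaN (or hold a non-matching accessory) are discarded, so products without a qualifying accessory disappear entirely. A small sketch reproducing the attempt on the question's data makes the row counts visible:

```python
import io

import pandas as pd

PRODUCTS = """\
ID PRODUCT_ID NAME STOCK SELL_COUNT DELIVERED_BY
1 P1 PRODUCT_P1 12 15 UPS
2 P2 PRODUCT_P2 4 3 DHL
3 P3 PRODUCT_P3 120 22 DHL
4 P1 PRODUCT_P1 423 18 UPS
5 P2 PRODUCT_P2 0 5 GLS
6 P3 PRODUCT_P3 53 10 DHL
7 P4 PRODUCT_P4 22 0 UPS
8 P1 PRODUCT_P1 94 56 GLS
9 P1 PRODUCT_P1 9 24 GLS
"""
ACCESSORIES = """\
ID ACCESSORY_ID NAME DEL_BY SUITABLE_FOR MANUFACTURER
100 A1 ACCESSORY_1 DHL P1 KUNG
101 A2 ACCESSORY_2 UPS P1 PAO
102 A3 ACCESSORY_3 GLS P1 PAO
103 A4 ACCESSORY_4 UPS P3 PAK
104 A5 ACCESSORY_5 DHL P2 PAK
"""
df_products = pd.read_csv(io.StringIO(PRODUCTS), sep=r"\s+")
df_accessories = pd.read_csv(io.StringIO(ACCESSORIES), sep=r"\s+")

# Step 1: plain left join on the key only. Each P1 product matches three
# accessories, so the result grows from 9 to 17 rows.
joined = df_products.merge(
    df_accessories, left_on="PRODUCT_ID", right_on="SUITABLE_FOR", how="left"
)
print(len(joined))  # 17

# Step 2: filtering afterwards. NaN == "UPS" is False, so the P4 row
# (no accessory at all) is dropped along with every non-matching pair.
filtered = joined.loc[(joined["DEL_BY"] == "UPS") & (joined["MANUFACTURER"] != "PAK")]
print(filtered["ID_x"].tolist())  # [1, 4, 8, 9]
```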
Thanks
Answer 0 (score: 2)
You need to filter the right dataframe before merging:
df_products.merge(df_accessories.query('DEL_BY == "UPS" and MANUFACTURER != "PAK"'),
left_on='PRODUCT_ID', right_on='SUITABLE_FOR', how='left',
suffixes=('', '.1'))
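A self-contained run of this answer on the question's data, as a sketch: it assumes the tables were read without the byte-order mark that appears in the question's expected header, so here the accessory ID column comes out as ID.1 (from suffixes=('', '.1')) rather than a second plain ID.

```python
import io

import pandas as pd

PRODUCTS = """\
ID PRODUCT_ID NAME STOCK SELL_COUNT DELIVERED_BY
1 P1 PRODUCT_P1 12 15 UPS
2 P2 PRODUCT_P2 4 3 DHL
3 P3 PRODUCT_P3 120 22 DHL
4 P1 PRODUCT_P1 423 18 UPS
5 P2 PRODUCT_P2 0 5 GLS
6 P3 PRODUCT_P3 53 10 DHL
7 P4 PRODUCT_P4 22 0 UPS
8 P1 PRODUCT_P1 94 56 GLS
9 P1 PRODUCT_P1 9 24 GLS
"""
ACCESSORIES = """\
ID ACCESSORY_ID NAME DEL_BY SUITABLE_FOR MANUFACTURER
100 A1 ACCESSORY_1 DHL P1 KUNG
101 A2 ACCESSORY_2 UPS P1 PAO
102 A3 ACCESSORY_3 GLS P1 PAO
103 A4 ACCESSORY_4 UPS P3 PAK
104 A5 ACCESSORY_5 DHL P2 PAK
"""
df_products = pd.read_csv(io.StringIO(PRODUCTS), sep=r"\s+")
df_accessories = pd.read_csv(io.StringIO(ACCESSORIES), sep=r"\s+")

# The ON conditions that only involve the accessories table are applied
# first; the key equality is left to merge, so no product row is lost.
result = df_products.merge(
    df_accessories.query('DEL_BY == "UPS" and MANUFACTURER != "PAK"'),
    left_on="PRODUCT_ID",
    right_on="SUITABLE_FOR",
    how="left",
    suffixes=("", ".1"),  # keep product names as-is, suffix the accessory ones
)
print(result)
```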
The .query(...) snippet is equivalent to slicing the dataframe with a boolean mask:
cond = (df_accessories['DEL_BY'] == 'UPS') & (df_accessories['MANUFACTURER'] != 'PAK')
df_products.merge(df_accessories[cond], ...)
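A quick check that the two forms select the same rows, as a sketch on the accessories table only; via_query and via_mask are names introduced here for illustration.

```python
import io

import pandas as pd

ACCESSORIES = """\
ID ACCESSORY_ID NAME DEL_BY SUITABLE_FOR MANUFACTURER
100 A1 ACCESSORY_1 DHL P1 KUNG
101 A2 ACCESSORY_2 UPS P1 PAO
102 A3 ACCESSORY_3 GLS P1 PAO
103 A4 ACCESSORY_4 UPS P3 PAK
104 A5 ACCESSORY_5 DHL P2 PAK
"""
df_accessories = pd.read_csv(io.StringIO(ACCESSORIES), sep=r"\s+")

# Boolean mask: element-wise comparisons combined with & (not "and").
cond = (df_accessories["DEL_BY"] == "UPS") & (df_accessories["MANUFACTURER"] != "PAK")
via_mask = df_accessories[cond]

# The same filter written as a query string.
via_query = df_accessories.query('DEL_BY == "UPS" and MANUFACTURER != "PAK"')

print(via_query.equals(via_mask))  # True
print(via_mask["ACCESSORY_ID"].tolist())  # ['A2']
```

Only A2 survives the filter (A4 ships with UPS but is made by PAK), which is why only the P1 products get an accessory after the left merge.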
Answer 1 (score: 2)
I would first filter df_accessories on the join conditions that don't involve df_products, and then join it to df_products with merge:
df_accessories.query('MANUFACTURER != "PAK" and DEL_BY == "UPS"').merge(df_products,
                                 right_on='PRODUCT_ID',
                                 left_on='SUITABLE_FOR', how='right')\
                                 .sort_values('ID_y')
Output:
ID_x ACCESSORY_ID NAME_x DEL_BY SUITABLE_FOR MANUFACTURER ID_y PRODUCT_ID NAME_y STOCK SELL_COUNT DELIVERED_BY
0 101.0 A2 ACCESSORY_2 UPS P1 PAO 1 P1 PRODUCT_P1 12 15 UPS
4 NaN NaN NaN NaN NaN NaN 2 P2 PRODUCT_P2 4 3 DHL
6 NaN NaN NaN NaN NaN NaN 3 P3 PRODUCT_P3 120 22 DHL
1 101.0 A2 ACCESSORY_2 UPS P1 PAO 4 P1 PRODUCT_P1 423 18 UPS
5 NaN NaN NaN NaN NaN NaN 5 P2 PRODUCT_P2 0 5 GLS
7 NaN NaN NaN NaN NaN NaN 6 P3 PRODUCT_P3 53 10 DHL
8 NaN NaN NaN NaN NaN NaN 7 P4 PRODUCT_P4 22 0 UPS
2 101.0 A2 ACCESSORY_2 UPS P1 PAO 8 P1 PRODUCT_P1 94 56 GLS
3 101.0 A2 ACCESSORY_2 UPS P1 PAO 9 P1 PRODUCT_P1 9 24 GLS
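A self-contained sketch of this second answer, using the dataframe names from the question. The reset_index(drop=True) at the end is an addition here: the row labels shown in the output above come from the right merge and can differ between pandas versions, so the sketch discards them after sorting.

```python
import io

import pandas as pd

PRODUCTS = """\
ID PRODUCT_ID NAME STOCK SELL_COUNT DELIVERED_BY
1 P1 PRODUCT_P1 12 15 UPS
2 P2 PRODUCT_P2 4 3 DHL
3 P3 PRODUCT_P3 120 22 DHL
4 P1 PRODUCT_P1 423 18 UPS
5 P2 PRODUCT_P2 0 5 GLS
6 P3 PRODUCT_P3 53 10 DHL
7 P4 PRODUCT_P4 22 0 UPS
8 P1 PRODUCT_P1 94 56 GLS
9 P1 PRODUCT_P1 9 24 GLS
"""
ACCESSORIES = """\
ID ACCESSORY_ID NAME DEL_BY SUITABLE_FOR MANUFACTURER
100 A1 ACCESSORY_1 DHL P1 KUNG
101 A2 ACCESSORY_2 UPS P1 PAO
102 A3 ACCESSORY_3 GLS P1 PAO
103 A4 ACCESSORY_4 UPS P3 PAK
104 A5 ACCESSORY_5 DHL P2 PAK
"""
df_products = pd.read_csv(io.StringIO(PRODUCTS), sep=r"\s+")
df_accessories = pd.read_csv(io.StringIO(ACCESSORIES), sep=r"\s+")

result = (
    # Filter the accessories first, then keep every product via how="right".
    df_accessories.query('MANUFACTURER != "PAK" and DEL_BY == "UPS"')
    .merge(df_products, left_on="SUITABLE_FOR", right_on="PRODUCT_ID", how="right")
    # Products are on the right, so their ID column gets the _y suffix.
    .sort_values("ID_y")
    .reset_index(drop=True)
)
print(result)
```

The rows match the first answer; the difference is only layout. Because the frames are swapped, the accessory columns come first and the overlapping ID/NAME columns get _x/_y suffixes instead of '' and '.1'.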