How to left join on multiple conditions in pandas?

Time: 2019-08-13 22:51:01

Tags: sql pandas join left-join

I am trying to convert an existing SQL statement to pandas. These are the dataframes I am working with:

df_products:

ID  PRODUCT_ID        NAME  STOCK  SELL_COUNT DELIVERED_BY        
1         P1  PRODUCT_P1     12          15          UPS  
2         P2  PRODUCT_P2      4           3          DHL  
3         P3  PRODUCT_P3    120          22          DHL  
4         P1  PRODUCT_P1    423          18          UPS  
5         P2  PRODUCT_P2      0           5          GLS  
6         P3  PRODUCT_P3     53          10          DHL  
7         P4  PRODUCT_P4     22           0          UPS  
8         P1  PRODUCT_P1     94          56          GLS  
9         P1  PRODUCT_P1      9          24          GLS

df_accessories:

ID ACCESSORY_ID         NAME DEL_BY SUITABLE_FOR MANUFACTURER
100           A1  ACCESSORY_1    DHL           P1         KUNG
101           A2  ACCESSORY_2    UPS           P1          PAO
102           A3  ACCESSORY_3    GLS           P1          PAO
103           A4  ACCESSORY_4    UPS           P3          PAK
104           A5  ACCESSORY_5    DHL           P2          PAK

I am trying to write the pandas version of this SQL query:

SELECT *
FROM products a
LEFT JOIN accessories b
    ON b.DEL_BY = 'UPS'
    AND a.PRODUCT_ID = b.SUITABLE_FOR
    AND b.MANUFACTURER != 'PAK'

I tried to solve it like this:

joined = df_products.merge(df_accessories, left_on='PRODUCT_ID', right_on='SUITABLE_FOR', how='left')
filtered = joined.loc[(joined['DEL_BY'] == 'UPS') & (joined['MANUFACTURER'] != 'PAK')]

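But I don't think this approach works. I am already struggling with the first condition, ON b.DEL_BY = 'UPS', and I don't know where to put it in the pandas merge function.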

I expect this result:

   ID PRODUCT_ID        NAME  STOCK  SELL_COUNT DELIVERED_BY     ID ACCESSORY_ID       NAME.1 DEL_BY SUITABLE_FOR MANUFACTURER
0   1         P1  PRODUCT_P1     12          15          UPS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
1   2         P2  PRODUCT_P2      4           3          DHL    NaN          NaN          NaN    NaN          NaN          NaN
2   3         P3  PRODUCT_P3    120          22          DHL    NaN          NaN          NaN    NaN          NaN          NaN
3   4         P1  PRODUCT_P1    423          18          UPS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
4   5         P2  PRODUCT_P2      0           5          GLS    NaN          NaN          NaN    NaN          NaN          NaN
5   6         P3  PRODUCT_P3     53          10          DHL    NaN          NaN          NaN    NaN          NaN          NaN
6   7         P4  PRODUCT_P4     22           0          UPS    NaN          NaN          NaN    NaN          NaN          NaN
7   8         P1  PRODUCT_P1     94          56          GLS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
8   9         P1  PRODUCT_P1      9          24          GLS  101.0           A2  ACCESSORY_2    UPS           P1          PAO

But I get this instead:

    ID_x PRODUCT_ID      NAME_x  STOCK  SELL_COUNT DELIVERED_BY   ID_y ACCESSORY_ID       NAME_y DEL_BY SUITABLE_FOR MANUFACTURER
1      1         P1  PRODUCT_P1     12          15          UPS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
6      4         P1  PRODUCT_P1    423          18          UPS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
12     8         P1  PRODUCT_P1     94          56          GLS  101.0           A2  ACCESSORY_2    UPS           P1          PAO
15     9         P1  PRODUCT_P1      9          24          GLS  101.0           A2  ACCESSORY_2    UPS           P1          PAO

Thanks

2 Answers:

Answer 0 (score: 2)

You want to filter the right-hand dataframe (df_accessories) before merging:

df_products.merge(df_accessories.query('DEL_BY == "UPS" and MANUFACTURER != "PAK"'),
                  left_on='PRODUCT_ID', right_on='SUITABLE_FOR', how='left',
                  suffixes=('', '.1'))
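
To make the approach above easy to try, here is a minimal self-contained sketch. It rebuilds the two dataframes from the tables in the question and applies the same filter-then-merge call. The variable names right and result are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df_products = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "PRODUCT_ID": ["P1", "P2", "P3", "P1", "P2", "P3", "P4", "P1", "P1"],
    "NAME": ["PRODUCT_P1", "PRODUCT_P2", "PRODUCT_P3", "PRODUCT_P1", "PRODUCT_P2",
             "PRODUCT_P3", "PRODUCT_P4", "PRODUCT_P1", "PRODUCT_P1"],
    "STOCK": [12, 4, 120, 423, 0, 53, 22, 94, 9],
    "SELL_COUNT": [15, 3, 22, 18, 5, 10, 0, 56, 24],
    "DELIVERED_BY": ["UPS", "DHL", "DHL", "UPS", "GLS", "DHL", "UPS", "GLS", "GLS"],
})
df_accessories = pd.DataFrame({
    "ID": [100, 101, 102, 103, 104],
    "ACCESSORY_ID": ["A1", "A2", "A3", "A4", "A5"],
    "NAME": ["ACCESSORY_1", "ACCESSORY_2", "ACCESSORY_3", "ACCESSORY_4", "ACCESSORY_5"],
    "DEL_BY": ["DHL", "UPS", "GLS", "UPS", "DHL"],
    "SUITABLE_FOR": ["P1", "P1", "P1", "P3", "P2"],
    "MANUFACTURER": ["KUNG", "PAO", "PAO", "PAK", "PAK"],
})

# Apply the conditions that only involve the right table first ...
right = df_accessories.query('DEL_BY == "UPS" and MANUFACTURER != "PAK"')

# ... then left-join, so every product row is kept even without a match.
result = df_products.merge(right,
                           left_on="PRODUCT_ID", right_on="SUITABLE_FOR",
                           how="left", suffixes=("", ".1"))
print(result)
```

With this data only accessory A2 survives the filter, so the four P1 rows get accessory columns and the other five product rows are filled with NaN.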

The .query(...) snippet is equivalent to slicing the dataframe with a boolean mask:

cond = (df_accessories['DEL_BY'] == 'UPS') & (df_accessories['MANUFACTURER'] != 'PAK')
df_products.merge(df_accessories[cond], ...)
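
As a quick sanity check of that equivalence, the sketch below compares both spellings of the filter on the accessory data from the question. Only the columns needed for the filter are included, and the names by_query and by_mask are illustrative.

```python
import pandas as pd

# Accessory data from the question, reduced to the relevant columns.
df_accessories = pd.DataFrame({
    "ACCESSORY_ID": ["A1", "A2", "A3", "A4", "A5"],
    "DEL_BY": ["DHL", "UPS", "GLS", "UPS", "DHL"],
    "SUITABLE_FOR": ["P1", "P1", "P1", "P3", "P2"],
    "MANUFACTURER": ["KUNG", "PAO", "PAO", "PAK", "PAK"],
})

# Filter written as a query string.
by_query = df_accessories.query('DEL_BY == "UPS" and MANUFACTURER != "PAK"')

# The same filter written as a boolean mask.
cond = (df_accessories["DEL_BY"] == "UPS") & (df_accessories["MANUFACTURER"] != "PAK")
by_mask = df_accessories[cond]

print(by_query.equals(by_mask))  # True
```

Which form to use is mostly a matter of taste: the query string reads closer to the SQL, while the mask avoids quoting the condition inside a string.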

Answer 1 (score: 2)

I would do it this way: first filter df_accessories on the join conditions that do not involve df_products, then use merge to join the result to df_products like this:

df_accessories.query('MANUFACTURER != "PAK" and DEL_BY == "UPS"').merge(df_products,
                                                                        right_on='PRODUCT_ID',
                                                                        left_on='SUITABLE_FOR', how='right')\
              .sort_values('ID_y')

Output:

    ID_x ACCESSORY_ID       NAME_x DEL_BY SUITABLE_FOR MANUFACTURER  ID_y PRODUCT_ID      NAME_y  STOCK  SELL_COUNT DELIVERED_BY
0  101.0           A2  ACCESSORY_2    UPS           P1          PAO     1         P1  PRODUCT_P1     12          15          UPS
4    NaN          NaN          NaN    NaN          NaN          NaN     2         P2  PRODUCT_P2      4           3          DHL
6    NaN          NaN          NaN    NaN          NaN          NaN     3         P3  PRODUCT_P3    120          22          DHL
1  101.0           A2  ACCESSORY_2    UPS           P1          PAO     4         P1  PRODUCT_P1    423          18          UPS
5    NaN          NaN          NaN    NaN          NaN          NaN     5         P2  PRODUCT_P2      0           5          GLS
7    NaN          NaN          NaN    NaN          NaN          NaN     6         P3  PRODUCT_P3     53          10          DHL
8    NaN          NaN          NaN    NaN          NaN          NaN     7         P4  PRODUCT_P4     22           0          UPS
2  101.0           A2  ACCESSORY_2    UPS           P1          PAO     8         P1  PRODUCT_P1     94          56          GLS
3  101.0           A2  ACCESSORY_2    UPS           P1          PAO     9         P1  PRODUCT_P1      9          24          GLS
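
Both answers filter before merging because conditions placed in a SQL ON clause of a LEFT JOIN only restrict which right-hand rows may match, whereas filtering the merged frame behaves like a WHERE clause and also drops the unmatched left rows. The sketch below illustrates the difference on a reduced sample (one row per product, fewer columns). The names products, accessories, filtered_after and filtered_before are illustrative and not from the original post.

```python
import pandas as pd

# Reduced sample: one row per product ID from the question.
products = pd.DataFrame({"ID": [1, 2, 3, 7],
                         "PRODUCT_ID": ["P1", "P2", "P3", "P4"]})
accessories = pd.DataFrame({
    "ACCESSORY_ID": ["A1", "A2", "A3", "A4", "A5"],
    "DEL_BY": ["DHL", "UPS", "GLS", "UPS", "DHL"],
    "SUITABLE_FOR": ["P1", "P1", "P1", "P3", "P2"],
    "MANUFACTURER": ["KUNG", "PAO", "PAO", "PAK", "PAK"],
})

# Filtering AFTER the merge acts like a WHERE clause:
# rows whose accessory columns are NaN fail the comparison and vanish.
joined = products.merge(accessories, left_on="PRODUCT_ID",
                        right_on="SUITABLE_FOR", how="left")
filtered_after = joined[(joined["DEL_BY"] == "UPS")
                        & (joined["MANUFACTURER"] != "PAK")]

# Filtering the right table BEFORE the merge acts like extra ON conditions:
# every product row is kept, unmatched ones get NaN.
right = accessories[(accessories["DEL_BY"] == "UPS")
                    & (accessories["MANUFACTURER"] != "PAK")]
filtered_before = products.merge(right, left_on="PRODUCT_ID",
                                 right_on="SUITABLE_FOR", how="left")

print(len(filtered_after))   # 1 (only the P1/A2 row survives)
print(len(filtered_before))  # 4 (all products kept)
```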