Question

My code is as follows:

import pandas as pd 

df = pd.DataFrame ({
    'IP':['1.1.1.1','2.2.2.2','3.3.3.3','4.4.4.4','5.5.5.5'],
    'ID':['101','202','303','404','505'],
    'Name':['aqua','noctua','ytube','tech','logi'],
    'Price':[100,200,300,400,500]
    })

df1 = pd.DataFrame ({
    'IP':['1.1.1.1','2.2.2.2','3.3.3.3','4.4.4.4','6.6.6.6'],
    'ID':['101','202','303','404','606'],
    'Name':['atlas','noctua','ytube','tech','smash'],
    'Price':[600,700,800,900,990]

    })
print(df)
        IP   ID    Name  Price
0  1.1.1.1  101    aqua    100
1  2.2.2.2  202  noctua    200
2  3.3.3.3  303   ytube    300
3  4.4.4.4  404    tech    400
4  5.5.5.5  505    logi    500

print(df1)
        IP   ID    Name  Price
0  1.1.1.1  101   atlas    600
1  2.2.2.2  202  noctua    700
2  3.3.3.3  303   ytube    800
3  4.4.4.4  404    tech    900
4  6.6.6.6  606   smash    990

new=df1.merge(df,indicator=True,how='left').loc[lambda x : x['_merge']=='left_only']
print(new)
        IP   ID    Name  Price     _merge
0  1.1.1.1  101   atlas    600  left_only
1  2.2.2.2  202  noctua    700  left_only
2  3.3.3.3  303   ytube    800  left_only
3  4.4.4.4  404    tech    900  left_only
4  6.6.6.6  606   smash    990  left_only

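The new DataFrame, called new, should contain only the rows from df1 whose combination of IP and ID is unique between the two DataFrames, that is, rows whose IP/ID pair does not also appear in df (I don't care about the other columns). So the correct output is: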

        IP   ID    Name  Price     _merge
0  6.6.6.6  606   smash    990  left_only

What do I need to change in my code to get this output? Thanks.

Answer 1

You can use the on parameter of pandas merge, which takes the columns you want to merge on:

new=df1.merge(df,indicator=True,how='left', on=['IP', 'ID']).loc[lambda x : x['_merge']=='left_only']

print(new)
        IP   ID Name_x  Price_x Name_y  Price_y     _merge
4  6.6.6.6  606  smash      990    NaN      NaN  left_only

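If you don't pass it, pandas tries to infer the join keys from the DataFrames (by default it uses every column the two frames have in common), which can lead to problems like this one. Here Name and Price were also treated as keys, so no row matched. I always prefer to pass on explicitly to prevent such errors.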
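To make the difference concrete, here is a minimal sketch using the question's data. It first counts the unmatched rows when on is omitted, then merges against only the key columns of df so that no _x/_y suffixed columns appear. The variable names keys, no_on, unmatched_without_on and merged are illustrative and not part of the original answer, and selecting df[keys] before merging is one possible variation rather than the answer's exact code.

```python
import pandas as pd

df = pd.DataFrame({
    'IP': ['1.1.1.1', '2.2.2.2', '3.3.3.3', '4.4.4.4', '5.5.5.5'],
    'ID': ['101', '202', '303', '404', '505'],
    'Name': ['aqua', 'noctua', 'ytube', 'tech', 'logi'],
    'Price': [100, 200, 300, 400, 500],
})
df1 = pd.DataFrame({
    'IP': ['1.1.1.1', '2.2.2.2', '3.3.3.3', '4.4.4.4', '6.6.6.6'],
    'ID': ['101', '202', '303', '404', '606'],
    'Name': ['atlas', 'noctua', 'ytube', 'tech', 'smash'],
    'Price': [600, 700, 800, 900, 990],
})

keys = ['IP', 'ID']  # the columns that define a "match"

# Without `on`, every shared column (IP, ID, Name, Price) is a join key,
# so rows that differ only in Price are reported as unmatched.
no_on = df1.merge(df, how='left', indicator=True)
unmatched_without_on = int((no_on['_merge'] == 'left_only').sum())

# With explicit keys, and only the key columns taken from df,
# df1's own Name and Price columns keep their original names.
merged = df1.merge(df[keys].drop_duplicates(), on=keys, how='left', indicator=True)
new = merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')

print(unmatched_without_on)
print(new)
```

The drop_duplicates() call is a precaution: if df contained the same IP/ID pair more than once, the left merge would otherwise repeat the matching df1 rows.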

Answer 2

You can concatenate IP and ID into a single string and compare it against the same concatenation built from df.

new = df1.loc[~df1.IP.str.cat(df1.ID).isin(df.IP.str.cat(df.ID))]
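One caveat with plain concatenation is that different pairs can collapse to the same string (for example '1.1.1.1' + '01' and '1.1.1.' + '101'). The sketch below shows two possible ways around this, assuming the same df and df1 as in the question: passing a separator through the sep argument of str.cat, and comparing the key pairs as tuples through a MultiIndex. The names left, right, new_cat, left_keys, right_keys and new_mi are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'IP': ['1.1.1.1', '2.2.2.2', '3.3.3.3', '4.4.4.4', '5.5.5.5'],
    'ID': ['101', '202', '303', '404', '505'],
    'Name': ['aqua', 'noctua', 'ytube', 'tech', 'logi'],
    'Price': [100, 200, 300, 400, 500],
})
df1 = pd.DataFrame({
    'IP': ['1.1.1.1', '2.2.2.2', '3.3.3.3', '4.4.4.4', '6.6.6.6'],
    'ID': ['101', '202', '303', '404', '606'],
    'Name': ['atlas', 'noctua', 'ytube', 'tech', 'smash'],
    'Price': [600, 700, 800, 900, 990],
})

# Variant 1: string concatenation with a separator that cannot occur
# in either column, so distinct pairs never produce the same string.
left = df1['IP'].str.cat(df1['ID'], sep='|')
right = df['IP'].str.cat(df['ID'], sep='|')
new_cat = df1.loc[~left.isin(right)]

# Variant 2: compare (IP, ID) pairs directly as tuples via a MultiIndex,
# which avoids building strings at all.
left_keys = pd.MultiIndex.from_frame(df1[['IP', 'ID']])
right_keys = pd.MultiIndex.from_frame(df[['IP', 'ID']])
new_mi = df1.loc[~left_keys.isin(right_keys)]

print(new_cat)
print(new_mi)
```

Both variants keep df1's original columns and index, so no _merge column has to be dropped afterwards.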

Merge two DataFrames when a specific condition is met

2 answers: