Question

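I am writing Python code that checks two pandas DataFrames: it uses a unique ID to look up records in both DataFrames and compares them for differences. It needs to highlight duplicated rows and export the full records: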

For example, unique ID = Order_id:

DF1:

id  Name    Order_id  Product   Qty
0   John    33321     Apple     1
1   Peter   22256     Orange    2
2   Mary    11112     Apple     12

DF2:

id  Name    Order_id   Product  Qty
0   John    33321      apple    12
1   Peter   22256      Orange   2
2   Mary    11112      Apple    12
3   Joe     22223      Pear     6
4   Mary    11112      Apple    12

After the comparison, the output should be a DataFrame / text file like the following:

File 1 Header:  id  Name    Order_id    Product Qty
File 2 Header:  id  Name    Order_id    Product Qty

File 1:         0   John    33321      Apple    1
File 2:         0   John    33321      apple    12
diff:                                  ^         ^

File 1:         2   Mary    11112      Apple    12
File 2:         2   Mary    11112      Apple    12
File 2:         4   Mary    11112      Apple    12
diff:           ^               

File 1:         nan  nan    nan        nan      nan
File 2:         3    Joe    22223      Pear     6
diff:          Missing Missing  Missing Missing Missing

I am able to get the differing records, but I am not sure how to put them into a DataFrame, and I end up with an error:

diff_val = np.where((df1 != df2).any(1) == True)
df_result = pd.DataFrame(df1.columns)
for item in diff_val:
    df_result.append(df1.iloc[item], df2.iloc[item])
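
A note on why the snippet above errors: `df1 != df2` only works on identically-labeled frames, and DF2 has two more rows than DF1, so pandas raises a ValueError before `np.where` runs. In addition, `DataFrame.append` returned a new frame rather than modifying `df_result` in place, and it was removed in pandas 2.0. One possible way around both problems, sketched here under the assumption that rows should be matched on `Order_id` rather than on row position, is to stack the two frames with an origin label and keep every `Order_id` group that is not an exact one-to-one match. The `source` column and the helper name `collect_differences` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [0, 1, 2],
    "Name": ["John", "Peter", "Mary"],
    "Order_id": [33321, 22256, 11112],
    "Product": ["Apple", "Orange", "Apple"],
    "Qty": [1, 2, 12],
})
df2 = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "Name": ["John", "Peter", "Mary", "Joe", "Mary"],
    "Order_id": [33321, 22256, 11112, 22223, 11112],
    "Product": ["apple", "Orange", "Apple", "Pear", "Apple"],
    "Qty": [12, 2, 12, 6, 12],
})


def collect_differences(left, right, key="Order_id"):
    """Return every row whose key group differs between the two frames."""
    data_cols = list(left.columns)
    # Label each row with its origin (hypothetical "source" column).
    stacked = pd.concat(
        [left.assign(source="File 1"), right.assign(source="File 2")],
        ignore_index=True,
    )

    def group_differs(group):
        # A group is clean only if it holds exactly one row per file
        # and those two rows agree in every data column.
        if sorted(group["source"]) != ["File 1", "File 2"]:
            return True
        return len(group[data_cols].drop_duplicates()) > 1

    differs = {k: group_differs(g) for k, g in stacked.groupby(key)}
    mask = stacked[key].map(differs).astype(bool)
    return stacked[mask].sort_values([key, "source"]).reset_index(drop=True)


df_result = collect_differences(df1, df2)
print(df_result)
```

With the sample data, Peter's order is dropped because it matches exactly, while John (changed values), Mary (duplicated in DF2) and Joe (only in DF2) are kept as full records.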

I also tried writing the result to a text file, but I could not get it to display in the format shown above. Any suggestions? Thanks.
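
For the text layout, a minimal sketch could build the report line by line with fixed-width columns. It assumes a column is marked with `^` whenever the rows sharing an `Order_id` hold more than one distinct value in it, and that an `Order_id` present in only one file is reported as `Missing`. The names `fmt`, `build_report` and `WIDTH` are hypothetical, not part of pandas.

```python
import pandas as pd

COLUMNS = ["id", "Name", "Order_id", "Product", "Qty"]
df1 = pd.DataFrame(
    [[0, "John", 33321, "Apple", 1],
     [1, "Peter", 22256, "Orange", 2],
     [2, "Mary", 11112, "Apple", 12]],
    columns=COLUMNS,
)
df2 = pd.DataFrame(
    [[0, "John", 33321, "apple", 12],
     [1, "Peter", 22256, "Orange", 2],
     [2, "Mary", 11112, "Apple", 12],
     [3, "Joe", 22223, "Pear", 6],
     [4, "Mary", 11112, "Apple", 12]],
    columns=COLUMNS,
)

WIDTH = 10  # assumed fixed column width for the text layout


def fmt(label, values):
    """Render one report line: a 16-character label, then padded cells."""
    return f"{label:<16}" + "".join(f"{str(v):<{WIDTH}}" for v in values)


def build_report(left, right, key="Order_id"):
    cols = list(left.columns)
    lines = [fmt("File 1 Header:", cols), fmt("File 2 Header:", cols), ""]
    # Visit keys in order of first appearance across both frames.
    for k in pd.concat([left[key], right[key]]).drop_duplicates():
        rows1 = left[left[key] == k]
        rows2 = right[right[key] == k]
        if (len(rows1) == 1 and len(rows2) == 1
                and (rows1.values == rows2.values).all()):
            continue  # exact one-to-one match, nothing to report
        if rows1.empty:
            lines.append(fmt("File 1:", ["nan"] * len(cols)))
        for row in rows1.itertuples(index=False):
            lines.append(fmt("File 1:", row))
        if rows2.empty:
            lines.append(fmt("File 2:", ["nan"] * len(cols)))
        for row in rows2.itertuples(index=False):
            lines.append(fmt("File 2:", row))
        if rows1.empty or rows2.empty:
            marks = ["Missing"] * len(cols)
        else:
            both = pd.concat([rows1, rows2])
            # Mark a column when the group holds more than one distinct value.
            marks = ["^" if both[c].nunique() > 1 else "" for c in cols]
        lines.append(fmt("diff:", marks))
        lines.append("")
    return "\n".join(lines)


report = build_report(df1, df2)
print(report)
```

The returned string can then be written out with an ordinary `open(path, "w")` call. Because every cell is padded to the same width, the `^` markers line up under the columns they refer to as long as no value is longer than `WIDTH` characters.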

Compare and highlight the differences between 2 pandas DataFrames

0 answers: