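I am writing Python code to compare two pandas DataFrames: using a unique ID, it looks up the matching records in both DataFrames and checks them for differences. It also needs to highlight duplicated rows and export the complete records:
For example, with unique ID = Order_id: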
DF1:
id Name Order_id Product Qty
0 John 33321 Apple 1
1 Peter 22256 Orange 2
2 Mary 11112 Apple 12
DF2:
id Name Order_id Product Qty
0 John 33321 apple 12
1 Peter 22256 Orange 2
2 Mary 11112 Apple 12
3 Joe 22223 Pear 6
4 Mary 11112 Apple 12
After the comparison, the output should be a DataFrame / text file like the following:
File 1 Header: id Name Order_id Product Qty
File 2 Header: id Name Order_id Product Qty
File 1: 0 John 33321 Apple 1
File 2: 0 John 33321 apple 12
diff: ^ ^
File 1: 2 Mary 11112 Apple 12
File 2: 2 Mary 11112 Apple 12
File 2: 4 Mary 11112 Apple 12
diff: ^
File 1: nan nan nan nan nan
File 2: 3 Joe 22223 Pear 6
diff: Missing Missing Missing Missing Missing
I am able to get the records that differ, but I am not sure how to put them into a DataFrame, and I end up with errors:
diff_val = np.where((df1 != df2).any(1) == True)
df_result = pd.DataFrame(df1.columns)
for item in diff_val:
df_result.append(df1.iloc[item], df2.iloc[item])
I also tried writing the result to a text file, but I could not get it to display in the format shown above. Any suggestions? Thanks.
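For illustration, here is a minimal sketch of one way the comparison could be done. It is not a definitive solution. It groups rows by Order_id instead of comparing by position, since an element-wise `df1 != df2` raises an error when the two frames have different shapes, and `DataFrame.append` was removed in pandas 2.0. The helper names `fmt_rows` and `compare_by_key` are hypothetical. The sketch assumes both frames have the same columns, and it writes the diff line as a plain note (for example `changed: Product, Qty`) rather than as carets aligned under the columns.

```python
import pandas as pd

# Example data copied from the question.
df1 = pd.DataFrame({
    "id": [0, 1, 2],
    "Name": ["John", "Peter", "Mary"],
    "Order_id": [33321, 22256, 11112],
    "Product": ["Apple", "Orange", "Apple"],
    "Qty": [1, 2, 12],
})
df2 = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "Name": ["John", "Peter", "Mary", "Joe", "Mary"],
    "Order_id": [33321, 22256, 11112, 22223, 11112],
    "Product": ["apple", "Orange", "Apple", "Pear", "Apple"],
    "Qty": [12, 2, 12, 6, 12],
})


def fmt_rows(label, rows):
    """Hypothetical helper: one 'label: v1 v2 ...' line per row."""
    return [f"{label}: " + " ".join(str(v) for v in row)
            for row in rows.itertuples(index=False, name=None)]


def compare_by_key(left, right, key="Order_id"):
    """Hypothetical helper: report lines for every key that has a problem.

    Assumes both frames share the same columns. Only the first row per key
    is compared value by value; extra rows are reported as duplicates.
    """
    report = []
    # All keys from both frames, in order of first appearance.
    keys = pd.concat([left[key], right[key]], ignore_index=True).unique()
    for k in keys:
        rows1 = left[left[key] == k]
        rows2 = right[right[key] == k]
        notes = []
        if len(rows1) > 1 or len(rows2) > 1:
            notes.append(f"duplicate {key}")
        if rows1.empty:
            notes.append("missing in File 1")
        elif rows2.empty:
            notes.append("missing in File 2")
        else:
            a, b = rows1.iloc[0], rows2.iloc[0]
            # Note: NaN != NaN, so missing values would count as changed.
            changed = [c for c in left.columns if a[c] != b[c]]
            if changed:
                notes.append("changed: " + ", ".join(changed))
        if notes:
            report += fmt_rows("File 1", rows1) + fmt_rows("File 2", rows2)
            report.append("diff: " + "; ".join(notes))
    return report


report = compare_by_key(df1, df2)
print("\n".join(report))
```

The returned list of lines can be joined with newlines and written to a text file. Rows whose Order_id matches with no differences (Peter in the example) are left out of the report.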