我想比较两个数据框并有选择地打印出我的差异。这是我要在图片中完成的操作:
数据框1
数据框2
所需的输出-数据帧3
到目前为止我尝试过什么?
import pandas as pd
import numpy as np
df1 = pd.read_excel("01.xlsx")
df2 = pd.read_excel("02.xlsx")
def diff_pd(df1, df2):
"""Identify differences between two pandas DataFrames"""
assert (df1.columns == df2.columns).all(), \
"DataFrame column names are different"
if any(df1.dtypes != df2.dtypes):
"Data Types are different, trying to convert"
df2 = df2.astype(df1.dtypes)
if df1.equals(df2):
return None
else: # need to account for np.nan != np.nan returning True
diff_mask = (df1 != df2) & ~(df1.isnull() & df2.isnull())
ne_stacked = diff_mask.stack()
changed = ne_stacked[ne_stacked]
changed.index.names = ['id', 'Naziv usluge']
difference_locations = np.where(diff_mask)
changed_from = df1.values[difference_locations]
changed_to = df2.values[difference_locations]
return pd.DataFrame({'Service Previous': changed_from, 'Service Current': changed_to},
index=changed.index)
df3 = diff_pd(df1, df2)
df3 = df3.fillna(0)
df3 = df3.reset_index()
print(df3)
为了公平起见,我在另一个线程上发现了该代码,但是确实可以完成工作,但是仍然存在一些问题。
谢谢!
答案 0 :(得分:0)
从...开始更容易些吗?
尝试
import pandas as pd
data1={'Name':['Tom','Bob','Mary'],'Age':[20,30,40],'Pay':[10,10,20]}
data2={'Name':['Tom','Bob','Mary'],'Age':[40,30,20]}
df1=pd.DataFrame.from_records(data1)
df2=pd.DataFrame.from_records(data2)
# Checking Columns
for col in df1.columns:
if col not in df2.columns:
print(f"DF2 Missing Col {col}")
# Check Col Values
for col in df1.columns:
if col in df2.columns:
# Ok we have the same column
if list(df1[col]) == list(df2[col]):
print(f"Columns {col} are the same")
else:
print(f"Columns {col} have differences")
它应该输出
DF2 Missing Col Pay
Columns Age have differences
Columns Name are the same
Python3.7需要或更改f字符串格式。