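I am trying to find the row-by-row differences between two Excel files. I first sort both workbooks on two columns, then write a third file containing the differences. I can't get the diff file to export correctly.
Any help would be greatly appreciated! Thanks in advance!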
import pandas as pd
df1 = pd.DataFrame({
    'ID': ['3', '3', '55', '55', '66', '66'],
    'date': [20180102, 20180103, 20180104, 20180105, 20180106, 20180107],
    'age': [0, 1, 9, 4, 2, 3],
})
df2 = pd.DataFrame({
    'ID': ['3', '55', '3', '66', '55', '66'],
    'date': [20180103, 20180104, 20180102, 20180106, 20180105, 20180107],
    'age': [0, 1, 9, 9, 8, 7],
})
df3 = df1.sort_values(by=['ID', 'date'], ascending=False)
df4 = df2.sort_values(by=['ID', 'date'], ascending=False)
dfDiff = df3.copy()
for row in range(dfDiff.shape[0]):
    for col in range(dfDiff.shape[1]):
        value_old = df3.iloc[row, col]
        value_new = df4.iloc[row, col]
        if value_old == value_new:
            dfDiff.iloc[row, col] = df4.iloc[row, col]
        else:
            dfDiff.iloc[row, col] = ('{}->{}').format(value_old, value_new)
writer = pd.ExcelWriter('diff', engine='xlsxwriter')
dfDiff.to_excel(writer, sheet_name='DIFF', index= False)
workbook = writer.book
worksheet = writer.sheets['DIFF']
worksheet.hide_gridlines(2)
writer.save()
Answer 0 (score: 0)
I think you are just missing .xlsx at the end of the file path.
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['3', '3', '55', '55', '66', '66'],
    'date': [20180102, 20180103, 20180104, 20180105, 20180106, 20180107],
    'age': [0, 1, 9, 4, 2, 3],
})
df2 = pd.DataFrame({
    'ID': ['3', '55', '3', '66', '55', '66'],
    'date': [20180103, 20180104, 20180102, 20180106, 20180105, 20180107],
    'age': [0, 1, 9, 9, 8, 7],
})
# sort both frames the same way so rows line up by position
df3 = df1.sort_values(by=['ID', 'date'], ascending=False)
df4 = df2.sort_values(by=['ID', 'date'], ascending=False)
# object dtype lets numeric columns hold 'old->new' strings
dfDiff = df3.copy().astype(object)
for row in range(dfDiff.shape[0]):
    for col in range(dfDiff.shape[1]):
        value_old = df3.iloc[row, col]
        value_new = df4.iloc[row, col]
        if value_old == value_new:
            dfDiff.iloc[row, col] = df4.iloc[row, col]
        else:
            dfDiff.iloc[row, col] = ('{}->{}').format(value_old, value_new)
# added '.xlsx' to the path here
writer = pd.ExcelWriter('diff.xlsx', engine='xlsxwriter')
dfDiff.to_excel(writer, sheet_name='DIFF', index=False)
workbook = writer.book
worksheet = writer.sheets['DIFF']
worksheet.hide_gridlines(2)
# writer.save() was removed in pandas 2.0; close() writes the file to disk
writer.close()
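As a side note, the nested loop can probably be replaced with a vectorized comparison. The following is only a rough sketch, not part of the original answer: the helper name diff_frames and its keys parameter are made up for illustration, and it assumes both frames have the same columns and the same number of rows so that they line up once sorted.

```python
import pandas as pd


def diff_frames(old, new, keys):
    """Return a frame whose changed cells read 'old->new' (hypothetical helper)."""
    # Sort both frames identically and reset the index so rows align by position.
    old_sorted = old.sort_values(by=keys, ascending=False).reset_index(drop=True)
    new_sorted = new.sort_values(by=keys, ascending=False).reset_index(drop=True)

    # Boolean mask: True wherever a cell differs between the two frames.
    changed = old_sorted != new_sorted

    # Build the 'old->new' text for every cell; only changed cells will use it.
    arrows = old_sorted.astype(str) + '->' + new_sorted.astype(str)

    # Keep the value where nothing changed, use the arrow text elsewhere.
    # object dtype lets numeric columns hold the arrow strings.
    return new_sorted.astype(object).mask(changed, arrows)
```

With the df1 and df2 from the question, diff_frames(df1, df2, ['ID', 'date']) should give the same cells as the loop, and the result can be written with to_excel('diff.xlsx', index=False) as long as an Excel engine is installed.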