我有两个xlsx
文件,如下所示:
value1 value2 value3
0.456 3.456 0.4325436
6.24654 0.235435 6.376546
4.26545 4.264543 7.2564523
和
value1 value2 value3
0.456 3.456 0.4325436
6.24654 0.23546 6.376546
4.26545 4.264543 7.2564523
我需要比较所有细胞,以及file1 !=
来自file2
print
的细胞的细胞。
import xlrd
rb = xlrd.open_workbook('file1.xlsx')
rb1 = xlrd.open_workbook('file2.xlsx')
sheet = rb.sheet_by_index(0)
for rownum in range(sheet.nrows):
row = sheet.row_values(rownum)
for c_el in row:
print c_el
如何添加file1
和file2
?
答案 0 :(得分:17)
使用pandas
,你可以这样简单地完成:
import pandas as pd
df1 = pd.read_excel('excel1.xlsx')
df2 = pd.read_excel('excel2.xlsx')
difference = df1[df1!=df2]
print difference
结果将如下所示:
答案 1 :(得分:6)
以下方法可以帮助您入门:
from itertools import izip_longest
import xlrd
rb1 = xlrd.open_workbook('file1.xlsx')
rb2 = xlrd.open_workbook('file2.xlsx')
sheet1 = rb1.sheet_by_index(0)
sheet2 = rb2.sheet_by_index(0)
for rownum in range(max(sheet1.nrows, sheet2.nrows)):
if rownum < sheet1.nrows:
row_rb1 = sheet1.row_values(rownum)
row_rb2 = sheet2.row_values(rownum)
for colnum, (c1, c2) in enumerate(izip_longest(row_rb1, row_rb2)):
if c1 != c2:
print "Row {} Col {} - {} != {}".format(rownum+1, colnum+1, c1, c2)
else:
print "Row {} missing".format(rownum+1)
这将显示两个文件之间不同的任何单元格。对于您给定的两个文件,将显示:
Row 3 Col 2 - 0.235435 != 0.23546
答案 2 :(得分:2)
我使用代码执行类似的操作。有点一般化的作品。 Input Excel Sheets and expected Output Dataframe image
import pandas as pd
import numpy as np
from xlsxwriter.utility import xl_rowcol_to_cell
template = pd.read_excel("template.xlsx",na_values=np.nan,header=None)
testSheet = pd.read_excel("test.xlsx",na_values=np.nan,header=None)
rt,ct = template.shape
rtest,ctest = testSheet.shape
df = pd.DataFrame(columns=['Cell_Location','BaseTemplate_Value','CurrentFile_Value'])
for rowNo in range(max(rt,rtest)):
for colNo in range(max(ct,ctest)):
# Fetching the template value at a cell
try:
template_val = template.iloc[rowNo,colNo]
except:
template_val = np.nan
# Fetching the testsheet value at a cell
try:
testSheet_val = testSheet.iloc[rowNo,colNo]
except:
testSheet_val = np.nan
# Comparing the values
if (str(template_val)!=str(testSheet_val)):
cell = xl_rowcol_to_cell(rowNo, colNo)
dfTemp = pd.DataFrame([[cell,template_val,testSheet_val]],
columns=['Cell_Location','BaseTemplate_Value','CurrentFile_Value'])
df = df.append(dfTemp)
df是必需的数据帧
答案 3 :(得分:0)
df_file1 = pd.read_csv("Source_data.csv")
df_file1 = df_file1.replace(np.nan, '', regex=True) #Replacing Nan with space
df_file2 = pd.read_csv("Target_data.csv")
df_file2 = df_file2.replace(np.nan, '', regex=True)
df_i = pd.concat([df_file1, df_file2], axis='columns', keys=['file1', 'file2'])
df_f = df_i.swaplevel(axis='columns')[df_s.columns[0:]]
def highlight_diff(data, color='yellow'):
attr = 'background-color: {}'.format(color)
other = data.xs('file1', axis='columns', level=-1)
return pd.DataFrame(np.where(data.ne(other, level=0), attr, ''), index=data.index, columns=data.columns)
df_final = df_f.style.apply(highlight_diff, axis=None)
writer = pd.ExcelWriter('comparexcels.xlsx')
df_final.to_excel(writer)
writer.save()
Below Picture shows the output of file highlighting the differences