Why does this diff function treat empty cells as different?

Time: 2017-06-20 14:49:11

Tags: excel python-3.x pandas dataframe diff

def find_diffs(dataframe1, dataframe2):  # Finds diff cells and stores to list
    x_ofs = dataframe1.columns.nlevels + 1
    y_ofs = dataframe1.index.nlevels + 1
    return([column_letter(x + x_ofs) + str(y + y_ofs) for
            y, x in zip(*np.where(dataframe1 != dataframe2))])

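I am writing a Python script that diffs two Excel files and highlights the cells that differ. I am using pandas DataFrames. The problem with this function is that it highlights empty cells as if they were different. I have tried a few things, such as: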

 (dataframe1 != dataframe2) and dataframe2 != ''
 (dataframe1 != dataframe2) and dataframe2 != 'nan'
 (dataframe1 != dataframe2) & dataframe2 != nan

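I tried more than this, but these are a few examples. I also tried using a similar function to detect the empty cells and then remove them from the list of cells it considers different, but I could not get that to work.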

Side question: is there a way to make it ignore case? It also highlights cells when only the letter case differs.

More code:

df1 = pd.read_excel(mxln, header=None)  # Loads master xlsx for comparison
df2 = pd.read_excel(sfcn, header=None)  # Loads student xlsx for comparison
df3 = df2.to_excel('TEACHER COPY ' '[' + sname + '].xlsx')
# difference = df2[df2 != df1]  # Scans for differences
# print(difference)

def find_diffs(dataframe1, dataframe2):  # Finds diff cells and stores to list
    x_ofs = dataframe1.columns.nlevels + 1
    y_ofs = dataframe1.index.nlevels + 1
    return([column_letter(x + x_ofs) + str(y + y_ofs) for
            y, x in zip(*np.where(dataframe1 != dataframe2) & (dataframe2.notnull()))])



# print(find_diffs(df1, df2))

# find_diffs(df1, df2)
# print(find_diffs(df1, df2))

test0 = 'TEACHER COPY ' '[' + sname + '].xlsx'
test = load_workbook(test0)
test1 = test.active
test2 = test.save(test0)
test3 = test1
# test4 = test.active

def color_red():
    redFill = PatternFill(start_color='FFEE1111', end_color='FFEE1111', fill_type='solid')
    for cell in find_diffs(df1, df2):  # find_diffs(df1, df2)
        # print(cell)
        test3[cell].fill = redFill
        test.save(test0)
        #return(color_red)  # Leave commented otherwise only colors 1st cell in list

color_red()

def count_red():
    errors = str(len(find_diffs(df1, df2)))
    # print(errors)
    return(errors)

def write_errors():
    wb = load_workbook(filename=test0)
    ws = wb.worksheets[0]
    ws['A27'] = 'Errors:  ' + count_red()
    wb.save(test0)

write_errors()

1 answer:

Answer 0: (score: 0)

Q1:

numpy.nan does not compare equal to itself, so an empty cell (read in as NaN) never equals another empty cell:

a = np.nan
a == a
Out[61]: False

For exactly this case, pandas provides the methods isnull and notnull.

With that in mind, your last attempt was very close:

np.where((dataframe1 != dataframe2) & (dataframe2.notnull()))

should do what you want.
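To make the effect of the mask concrete, here is a minimal sketch on two small hand-made frames. The names master and student and their contents are made up for illustration and stand in for df1 and df2; it only shows which (row, column) positions each comparison reports.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the master and student sheets (not the real files)
master = pd.DataFrame([["a", np.nan], ["b", "c"]], dtype=object)
student = pd.DataFrame([["a", np.nan], ["x", np.nan]], dtype=object)

# Plain comparison: NaN != NaN is True, so the blank cell at (0, 1) is reported
naive_cells = list(zip(*np.where(master != student)))

# Mask from the answer: keep a difference only where the student cell is not blank
masked = (master != student) & student.notnull()
masked_cells = list(zip(*np.where(masked)))

# Variant: drop a difference only when the cell is blank in BOTH frames
both_blank = master.isnull() & student.isnull()
strict_cells = list(zip(*np.where((master != student) & ~both_blank)))

print(naive_cells)   # positions (0, 1), (1, 0), (1, 1)
print(masked_cells)  # position (1, 0)
print(strict_cells)  # positions (1, 0), (1, 1)
```

Note that the notnull mask also skips cells the student left blank where the master has a value, such as (1, 1) here. If those should count as errors, the both_blank variant may be the better fit.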

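Q2:

String methods such as .upper and .lower can be accessed through the .str accessor on pd.Series objects (not on pd.DataFrame objects). For example, df[df.columns[0]].str.upper() returns a Series of the values in the first column of df, but all in upper case.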
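One possible way to apply this to a whole frame before diffing is sketched below. The helper fold_case and the sample frames are hypothetical and not part of the original script. The helper lowercases string cells one by one instead of using .str, because .str raises an error on purely numeric columns and returns NaN for non-string cells in mixed columns.

```python
import numpy as np
import pandas as pd

def fold_case(frame):
    """Return a copy of frame with every string cell lowercased (hypothetical helper)."""
    def lower_if_str(value):
        # Leave numbers and NaN untouched; only strings have .lower()
        return value.lower() if isinstance(value, str) else value
    return frame.apply(lambda column: column.map(lower_if_str))

# Made-up sample data standing in for df1 and df2
master = pd.DataFrame([["Apple", 1], ["pear", np.nan]], dtype=object)
student = pd.DataFrame([["APPLE", 1], ["Plum", np.nan]], dtype=object)

# The .str accessor from the answer, applied to a single string column
first_column_upper = master[master.columns[0]].str.upper()

# Case-insensitive diff that also ignores cells left blank in the student sheet
differs = (fold_case(master) != fold_case(student)) & student.notnull()
diff_cells = list(zip(*np.where(differs)))

print(diff_cells)  # only position (1, 0): "pear" vs "Plum"
```

With this sketch, "Apple" and "APPLE" no longer count as a difference, while "pear" and "Plum" still do.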