How to compare 2 different Excel files in 2 different folders?

Date: 2018-04-05 07:22:03

Tags: python pandas

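I have a set of Excel sheets in one folder and another set in a second folder. Wherever a file with the same name exists in both folders, I need the cell-by-cell differences between the two files.

I have the script below, which computes the differences for one pair of files. How do I put it in a for loop?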

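import pandas as pd

df1 = pd.read_excel('firstfolder/0.xls')
df2 = pd.read_excel('secondfolder/0.xls')
# Keep only the cells that differ; equal cells become NaN.
# df1 and df2 must have identical row and column labels.
difference = df1[df1 != df2]
print(difference)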
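One way to build the loop is to pair files by name rather than by position. The sketch below assumes the folder names `firstfolder` and `secondfolder` from the snippet above; the helper `matching_pairs` and its `pattern` parameter are hypothetical. It uses only the standard library (`pathlib`) to find the file names present in both folders, and each resulting pair can then be passed to the `pd.read_excel` / `df1[df1 != df2]` lines above.

```python
from pathlib import Path


def matching_pairs(folder1, folder2, pattern="*.xls"):
    """Return (path_in_folder1, path_in_folder2) for every file name found in both folders.

    Files are paired by name, not by position, so a file that exists in only
    one folder is skipped instead of shifting all later pairs.
    """
    names1 = {p.name for p in Path(folder1).glob(pattern)}
    names2 = {p.name for p in Path(folder2).glob(pattern)}
    # Sort so the loop order is deterministic (glob order is not guaranteed)
    return [(Path(folder1) / n, Path(folder2) / n) for n in sorted(names1 & names2)]


# Usage with the question's snippet (requires pandas and the two folders):
# for path1, path2 in matching_pairs('firstfolder', 'secondfolder'):
#     df1 = pd.read_excel(path1)
#     df2 = pd.read_excel(path2)
#     print(path1.name)
#     print(df1[df1 != df2])
```

Matching on names means an extra or missing file in either folder is ignored rather than silently compared against the wrong partner.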

2 Answers:

Answer 0 (score: 2)

Assuming your folders contain the same xls files and the files all have the same structure, you can use glob and iterate:

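import glob
import pandas as pd

diffs = []
# sorted() so both lists are in the same order; glob's order is arbitrary.
# This still pairs files by position, so both folders must hold the same file names.
for f1, f2 in zip(sorted(glob.glob('firstfolder/*.xls')), sorted(glob.glob('secondfolder/*.xls'))):
    df1, df2 = pd.read_excel(f1), pd.read_excel(f2)
    diffs.append(df1[df1 != df2])

diff = pd.concat(diffs, axis=0)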
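The loop above pairs files by their position in the two file lists, so an extra or missing file in either folder shifts every later pair, and the concatenated result does not record which file each row came from. A sketch that pairs by name and uses each file name as the outer index level follows. The function name `diff_folders` and its `suffix` and `reader` parameters are hypothetical; `reader` defaults to `pd.read_excel` (which needs an engine such as xlrd for `.xls`) and can be swapped for `pd.read_csv`. As in the answer, `a[a != b]` assumes both files have identical row and column labels; pandas raises a `ValueError` otherwise.

```python
import os

import pandas as pd


def diff_folders(folder1, folder2, suffix=".xls", reader=pd.read_excel):
    """Concatenate df1[df1 != df2] for every file name present in both folders.

    Equal cells become NaN; differing cells keep the value from folder1.
    """
    common = sorted(
        name for name in set(os.listdir(folder1)) & set(os.listdir(folder2))
        if name.endswith(suffix)
    )
    diffs = {}
    for name in common:
        a = reader(os.path.join(folder1, name))
        b = reader(os.path.join(folder2, name))
        # Raises ValueError if a and b are not identically labeled
        diffs[name] = a[a != b]
    if not diffs:
        # pd.concat refuses an empty input, so return an empty frame instead
        return pd.DataFrame()
    # Dict keys become the outer index level, e.g. ('0.xls', 3)
    return pd.concat(diffs, names=["file", "row"])
```

With the file name in the index, `result.loc['0.xls']` returns only the differences for that workbook.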

Answer 1 (score: 0)

Credit: adapted from "Compare 2 excel files using Python"

from itertools import zip_longest
import os

import xlrd  # note: xlrd 2.0+ only reads legacy .xls files, not .xlsx

first_files = os.listdir('folder1')
second_files = os.listdir('folder2')
# File names present in both folders
matches = sorted(set(first_files) & set(second_files))

for name in matches:
    print(name)
    rb1 = xlrd.open_workbook(os.path.join('folder1', name))
    rb2 = xlrd.open_workbook(os.path.join('folder2', name))
    # Compare only the first sheet of each workbook
    sheet1 = rb1.sheet_by_index(0)
    sheet2 = rb2.sheet_by_index(0)

    for rownum in range(max(sheet1.nrows, sheet2.nrows)):
        if rownum < sheet1.nrows and rownum < sheet2.nrows:
            row_rb1 = sheet1.row_values(rownum)
            row_rb2 = sheet2.row_values(rownum)

            # zip_longest pads the shorter row with None, so extra cells are reported too
            for colnum, (c1, c2) in enumerate(zip_longest(row_rb1, row_rb2)):
                if c1 != c2:
                    print("Row {} Col {} - {} != {}".format(rownum + 1, colnum + 1, c1, c2))
        else:
            # The row exists in only one of the two sheets
            missing_in = 'folder1' if rownum >= sheet1.nrows else 'folder2'
            print("Row {} missing in {}".format(rownum + 1, missing_in))
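The cell-by-cell comparison in this answer does not depend on xlrd itself; it only needs two lists of rows. The hypothetical helper `diff_rows` below is a library-free sketch of that logic, so it can be tested on plain lists or fed rows from `sheet.row_values(i)` or a pandas `df.values.tolist()`. It reports rows missing from either input, and row and column numbers are 1-based to match the printed output above.

```python
from itertools import zip_longest


def diff_rows(rows1, rows2):
    """Yield (row, col, value1, value2) for every cell that differs.

    A row present in only one input is yielded as (row, None, row1, row2),
    where the absent side is None. Numbers are 1-based, like Excel's.
    """
    for rownum, (r1, r2) in enumerate(zip_longest(rows1, rows2), start=1):
        if r1 is None or r2 is None:
            # Row exists in only one of the two inputs
            yield (rownum, None, r1, r2)
            continue
        # zip_longest pads the shorter row with None, so extra cells show up too
        for colnum, (c1, c2) in enumerate(zip_longest(r1, r2), start=1):
            if c1 != c2:
                yield (rownum, colnum, c1, c2)


if __name__ == "__main__":
    old = [["id", "qty"], [1, 10], [2, 20]]
    new = [["id", "qty"], [1, 11]]
    for row, col, a, b in diff_rows(old, new):
        if col is None:
            print("Row {} missing in {}".format(row, "first" if a is None else "second"))
        else:
            print("Row {} Col {} - {} != {}".format(row, col, a, b))
```

Yielding tuples instead of printing keeps the comparison separate from the reporting, so the same results can be written to a file or collected into a DataFrame.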