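I have a set of Excel sheets in one folder and another set in a second folder. Where the same file name appears in both folders, I need to take the cell-by-cell difference between the two files.
Below is a script that computes the difference (for column I) for a single pair of files. How do I run it in a for loop over all the files?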
import pandas as pd
df1 = pd.read_excel('firstfolder/0.xls')
df2 = pd.read_excel('secondfolder/0.xls')
difference = df1[df1!=df2]
print(difference)
Answer 0 (score: 2)
Assuming both folders contain the same xls files and all files have the same structure, you can use glob and iterate:
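import glob
import pandas as pd

diffs = []
# glob returns files in arbitrary order, so sort both listings before pairing them
patterns = ['firstfolder/*.xls', 'secondfolder/*.xls']
for i, j in zip(*(sorted(glob.glob(p)) for p in patterns)):
    i, j = map(pd.read_excel, [i, j])
    # keep only the cells of the first file that differ from the second (NaN elsewhere)
    diffs.append(i[i != j])
diff = pd.concat(diffs, axis=0)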
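The loop above pairs files by position, which only works when both folders hold exactly the same file names. A minimal sketch of pairing by name instead is shown below; `matching_pairs` is a hypothetical helper (not part of pandas or glob), and the folder names in the trailing comment are taken from the question.

```python
from pathlib import Path


def matching_pairs(first_dir, second_dir, pattern="*.xls"):
    """Return (path_in_first, path_in_second) for file names found in both folders."""
    first = {p.name: p for p in Path(first_dir).glob(pattern)}
    second = {p.name: p for p in Path(second_dir).glob(pattern)}
    # Only names present in both folders are paired; sorted for a stable order.
    common = sorted(first.keys() & second.keys())
    return [(first[name], second[name]) for name in common]


# Possible use with pandas (not executed here):
# for a, b in matching_pairs("firstfolder", "secondfolder"):
#     df1, df2 = pd.read_excel(a), pd.read_excel(b)
#     print(a.name, df1[df1 != df2])
```

Files that exist in only one folder are skipped, so the comparison never mixes up unrelated workbooks.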
Answer 1 (score: 0)
from itertools import zip_longest
import xlrd
import os
first_files = os.listdir('folder1')
second_files = os.listdir('folder2')
matches = [x for x in second_files if x in first_files]
#print(matches)
for name in matches:
    print(name)
    rb1 = xlrd.open_workbook(os.path.join('folder1', name))
    rb2 = xlrd.open_workbook(os.path.join('folder2', name))
    # compare the first sheet of each workbook
    sheet1 = rb1.sheet_by_index(0)
    sheet2 = rb2.sheet_by_index(0)
    for rownum in range(max(sheet1.nrows, sheet2.nrows)):
        # only compare rows that exist in both sheets
        if rownum < sheet1.nrows and rownum < sheet2.nrows:
            row_rb1 = sheet1.row_values(rownum)
            row_rb2 = sheet2.row_values(rownum)
            for colnum, (c1, c2) in enumerate(zip_longest(row_rb1, row_rb2)):
                if c1 != c2:
                    print("Row {} Col {} - {} != {}".format(rownum+1, colnum+1, c1, c2))
        else:
            print("Row {} missing".format(rownum+1))
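The cell comparison can also be separated from the workbook reading, which makes it easier to check on its own. The following is a rough sketch under that assumption; `diff_rows` is a hypothetical helper that takes two lists of rows (plain Python lists) and is not part of xlrd.

```python
from itertools import zip_longest


def diff_rows(rows1, rows2):
    """Return a list of (row, col, value1, value2), 1-based, for every differing cell."""
    diffs = []
    # A row missing from one side is treated as an empty row.
    for r, (row1, row2) in enumerate(zip_longest(rows1, rows2, fillvalue=[]), start=1):
        # A cell missing from one side is reported as None.
        for c, (v1, v2) in enumerate(zip_longest(row1, row2), start=1):
            if v1 != v2:
                diffs.append((r, c, v1, v2))
    return diffs


if __name__ == "__main__":
    for r, c, v1, v2 in diff_rows([[1, 2], [3, 4]], [[1, 5], [3, 4], [6]]):
        print("Row {} Col {} - {} != {}".format(r, c, v1, v2))
```

With xlrd, the row lists could be built as `[sheet.row_values(i) for i in range(sheet.nrows)]` for each sheet before calling the helper.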