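I am trying to compare two CSV files in Python and output the differences along with the header of each column. With what I have so far, it outputs all of the columns instead of only the ones that differ.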
import csv

with open('firstfile.csv', 'r') as f1:
    file1 = f1.readlines()
with open('secondfile.csv', 'r') as f2:
    file2 = f2.readlines()
with open('results.csv', 'w') as outFile:
    outFile.write(file1[0])
    for line in file2:
        if line not in file1:
            outFile.write(line)
Answer 0 (score: 1)
I think this code solves your problem:
import sys

with open('file1.csv', 'r') as f1:
    file1 = f1.readlines()
with open('file2.csv', 'r') as f2:
    file2 = f2.readlines()

delimiter = '\t'  # Column delimiter in your files
headers_of_first_file = file1[0].strip().split(delimiter)
headers_of_second_file = file2[0].strip().split(delimiter)

# Remove this check to compare files with different columns;
# the blocks below would then need extra handling.
different_headers = set(headers_of_first_file).symmetric_difference(headers_of_second_file)
if different_headers:
    print('Files have different headers: ', different_headers)
    sys.exit(-1)

# Build a map {header: [all values in that column of the first file]}
first_file_map = {header: [] for header in headers_of_first_file}
for row in file1[1:]:
    for index, cell in enumerate(row.strip().split(delimiter)):
        first_file_map[headers_of_first_file[index]].append(cell)

# Check the second file against the map. Columns are looked up by
# header name, so they may appear in a different order.
result = set()
for row in file2[1:]:
    for index, cell in enumerate(row.strip().split(delimiter)):
        if cell not in first_file_map[headers_of_second_file[index]]:
            result.add(headers_of_second_file[index])

# Write the names of the columns that contain a value not seen in the first file
with open('results.csv', 'w') as out_file:
    out_file.write(delimiter.join(sorted(result)))
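The code above writes only the names of the affected columns. If the differing values themselves are needed together with their headers, a row-by-row comparison is one option. The following is a minimal sketch rather than part of the original answer: the function name `diff_cells` is hypothetical, and it assumes that both files have the same headers (in any order) and that rows are aligned by position.

```python
import csv
import io


def diff_cells(first, second, delimiter='\t'):
    """Yield (row_number, header, old_value, new_value) for each differing cell.

    `first` and `second` are open text files (or any iterables of lines).
    Assumption: same headers in both inputs, rows aligned by position.
    """
    rows1 = csv.DictReader(first, delimiter=delimiter)
    rows2 = csv.DictReader(second, delimiter=delimiter)
    # zip stops at the shorter input, so trailing extra rows are ignored.
    for row_number, (row1, row2) in enumerate(zip(rows1, rows2), start=1):
        for header in rows1.fieldnames:
            # Cells are matched by header name, not by column position.
            if row1[header] != row2[header]:
                yield row_number, header, row1[header], row2[header]


# Small in-memory demonstration with hypothetical data.
first = io.StringIO("a\tb\n1\t2\n3\t4\n")
second = io.StringIO("b\ta\n2\t1\n5\t3\n")
for difference in diff_cells(first, second):
    print(difference)  # (2, 'b', '4', '5')
```

With real files, the two `io.StringIO` objects would be replaced by file objects opened with `open(..., newline='')`, as the `csv` module documentation recommends.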
UPD: example files:
First file:
Column1 Column2 Column3 Column5 Column4
1 2 3 5 4
10 20 30 50 40
Second file:
Column1 Column2 Column3 Column4 Column5
11 2 3 4 5
10 10 30 40 50
'\t' is the delimiter.
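To see what the column check reports for these sample files, the same logic can be run on in-memory copies. This is a sketch under stated assumptions: the function name `columns_with_new_values` is hypothetical, it uses `csv.DictReader` and sets instead of manual splitting and lists, and it assumes every row has exactly as many cells as there are headers.

```python
import csv
import io


def columns_with_new_values(first, second, delimiter='\t'):
    """Return headers whose column in `second` holds a value that never
    appears in the same-named column of `first`.

    Assumption: every row has exactly one cell per header.
    """
    seen = {}
    for row in csv.DictReader(first, delimiter=delimiter):
        for header, cell in row.items():
            seen.setdefault(header, set()).add(cell)
    changed = set()
    for row in csv.DictReader(second, delimiter=delimiter):
        for header, cell in row.items():
            if cell not in seen.get(header, set()):
                changed.add(header)
    return changed


# The two sample files, written with explicit tab delimiters.
file1 = io.StringIO(
    "Column1\tColumn2\tColumn3\tColumn5\tColumn4\n"
    "1\t2\t3\t5\t4\n"
    "10\t20\t30\t50\t40\n"
)
file2 = io.StringIO(
    "Column1\tColumn2\tColumn3\tColumn4\tColumn5\n"
    "11\t2\t3\t4\t5\n"
    "10\t10\t30\t40\t50\n"
)
print(sorted(columns_with_new_values(file1, file2)))  # ['Column1', 'Column2']
```

Only Column1 (value 11) and Column2 (value 10) are reported. Column4 and Column5 are swapped between the files, but because values are looked up by header name the reordering does not count as a difference.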