Question

我正在尝试比较python中的两个csv文件，并输出差异以及每一列的标题。到目前为止，根据我的操作，它会输出所有列，而不仅仅是输出差异

import csv

with open('firstfile.csv', 'r') as f1:
    file1 = f1.readlines()

with open('secondfile.csv', 'r') as f2:
    file2 = f2.readlines()

with open('results.csv', 'w') as outFile:
    outFile.write(file1[0])
    for line in file2:
        if line not in file1:
            outFile.write(line)

Answer 1

我认为这段代码可以解决您的问题

import sys

with open('file1.csv', 'r') as f1:
    file1 = f1.readlines()

with open('file2.csv', 'r') as f2:
    file2 = f2.readlines()

delimiter = '\t'  # Column delimiter in you file
headers_of_first_file = file1[0].strip().split(delimiter)
headers_of_second_file = file2[0].strip().split(delimiter)

# You can remove this assert if you want to work files with different columns then you have to add some more code in next blocks
different_headers = set(headers_of_first_file).symmetric_difference(headers_of_second_file)
if different_headers:
    print('Files have difference in headers: ', different_headers)
    sys.exit(-1)

# Build map {header: [all_values]}
first_file_map = {header: [] for header in headers_of_first_file}
for row in file1[1:]:
    for index, cell in enumerate(row.strip().split(delimiter)):
        first_file_map[headers_of_first_file[index]].append(cell)

# Check by built map. Dont forget that columns may change order
result = set()
for row in file2[1:]:
    for index, cell in enumerate(row.strip().split(delimiter)):
        if cell not in first_file_map[headers_of_second_file[index]]:
            result.add(headers_of_second_file[index])

with open('results.csv', 'w') as out_file:
    out_file.write('\t'.join(result))

UPD 文件示例：

Column1 Column2 Column3 Column5 Column4
1   2   3   5   4
10  20  30  50  40

Column1 Column2 Column3 Column4 Column5
11  2   3   4   5
10  10  30  40  50

'\ t'是分隔符

比较python中的两个csv文件并保留更改的标头

1 个答案: