Comparing two CSV files

Time: 2018-11-01 02:19:16

Tags: python python-3.x

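I'm having trouble comparing two CSV files and printing a separate report. I want my script to first match the IDs in both files, then compare the rest of each row, and print a separate report showing the differences. The script I have compares the two files and prints the differences, but it does not work if the new file contains additional rows.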

Samples of the two files:

Old file

ID  fname   lname   status
1   joe pol active
2   peters  dol active
3   john    nol active
4   mike    sol active

New file

ID  fname   lname   status
1   joe pol active
2   peter   dol active
67  ryan    olson   stop
3   johnny  nolly   stop 
4   mike    sol active

Code:

import csv

orig = open('OLD.csv','r')
new = open('NEW.csv','r')

Change = set(new) - set(orig)

print(Change)

with open('OLD.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('different.csv', 'w') as file_out:
        for line in Change:
            file_out.write(line)

orig.close()
new.close()
file_out.close()

1 answer:

Answer 0 (score: 0)

Since CSV files are expected to be comma-separated, I'm assuming your files can be put in the following format:

old.csv:

ID,fname,lname,status
1,joe,pol,active
2,peters,dol,active
3,john,nol,active
4,mike,sol,active

new.csv:

ID,fname,lname,status
1,joe,pol,active
2,peter,dol,active
67,ryan,olson,stop
3,johnny,nolly,stop
4,mike,sol,active
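
If the files are actually whitespace-separated, as the samples in the question appear to be, they would need converting first. The following is only a minimal sketch of one way to do that. The helper name `whitespace_to_csv` and the file paths are hypothetical, and it assumes that no field contains spaces or tabs.

```python
import csv


def whitespace_to_csv(src_path, dst_path):
    """Rewrite a whitespace-delimited table as a comma-separated file.

    Hypothetical helper: assumes no field itself contains whitespace.
    """
    with open(src_path) as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            fields = line.split()  # split on any run of spaces or tabs
            if fields:  # skip blank lines
                writer.writerow(fields)
```

Calling it once per input file, for example `whitespace_to_csv("OLD.txt", "old.csv")`, should give files in the comma-separated layout shown above.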

You can then use the following code to turn them into a report:

from csv import reader


# Creates a row dictionary from file
def get_row_map(filename):
    row_map = {}

    # newline="" lets the csv module handle line endings itself
    with open(filename, newline="") as file:
        csv_reader = reader(file)
        _, *headers = next(csv_reader)

        # map ids to rows
        for row in csv_reader:
            idx, *rest = row
            row_map[int(idx)] = dict(zip(headers, rest))

    return row_map


old_row_map = get_row_map("old.csv")
new_row_map = get_row_map("new.csv")

with open("different.txt", "w") as out:

    # Only loop over matched ids, in ascending order so the report is stable
    for row_id in sorted(old_row_map.keys() & new_row_map.keys()):

        # only proceed if rows are not exactly the same
        if old_row_map[row_id] != new_row_map[row_id]:

            old_row, new_row = old_row_map[row_id], new_row_map[row_id]

            # write out report, one line per changed column, in header order
            # (iterating the dict instead of a set keeps the order deterministic)
            out.write("ID: %d\n" % row_id)
            for key in old_row:
                if old_row[key] != new_row.get(key):
                    out.write(
                        "%s -> old: %s, new: %s\n"
                        % (key, old_row[key], new_row.get(key))
                    )

This writes the following different.txt:

ID: 2
fname -> old: peters, new: peter
ID: 3
fname -> old: john, new: johnny
lname -> old: nol, new: nolly
status -> old: active, new: stop
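
The report above only covers IDs that appear in both files, so the extra row in the new file (ID 67) is silently skipped. As a rough sketch of how such rows could be listed separately, the set difference of the two key views gives the unmatched IDs. The function name `split_unmatched_ids` is hypothetical, and the sample maps below are typed in by hand in the shape that `get_row_map` produces, so that the snippet runs on its own.

```python
def split_unmatched_ids(old_row_map, new_row_map):
    """Return (added, removed): sorted IDs present in only one of the maps."""
    added = sorted(new_row_map.keys() - old_row_map.keys())
    removed = sorted(old_row_map.keys() - new_row_map.keys())
    return added, removed


# Hand-written sample maps mirroring old.csv and new.csv from this answer
old_row_map = {
    1: {"fname": "joe", "lname": "pol", "status": "active"},
    2: {"fname": "peters", "lname": "dol", "status": "active"},
    3: {"fname": "john", "lname": "nol", "status": "active"},
    4: {"fname": "mike", "lname": "sol", "status": "active"},
}
new_row_map = {
    1: {"fname": "joe", "lname": "pol", "status": "active"},
    2: {"fname": "peter", "lname": "dol", "status": "active"},
    67: {"fname": "ryan", "lname": "olson", "status": "stop"},
    3: {"fname": "johnny", "lname": "nolly", "status": "stop"},
    4: {"fname": "mike", "lname": "sol", "status": "active"},
}

added, removed = split_unmatched_ids(old_row_map, new_row_map)

for row_id in added:
    print("ID: %d only in new file: %s" % (row_id, new_row_map[row_id]))
for row_id in removed:
    print("ID: %d only in old file: %s" % (row_id, old_row_map[row_id]))
```

With the sample data, only ID 67 is reported as added and nothing is reported as removed. In a real script the two maps would come from `get_row_map`, and the lines could be written to the same report file instead of printed.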