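I'm having trouble comparing two CSV files and printing a separate report. I want my script to first match the IDs in the two files, then compare the rest of each row, and print a separate report showing the differences. The script I have compares the two files and prints the differences, but it does not work if the new file contains additional rows.
Examples of the two files:
Old file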
ID fname lname status
1 joe pol active
2 peters dol active
3 john nol active
4 mike sol active
New file
ID fname lname status
1 joe pol active
2 peter dol active
67 ryan olson stop
3 johnny nolly stop
4 mike sol active
Code:
import csv
orig = open('OLD.csv','r')
new = open('NEW.csv','r')
Change = set(new) - set(orig)
print(Change)
with open('OLD.csv', mode='r') as infile:
    reader = csv.reader(infile)
with open('different.csv', 'w') as file_out:
    for line in Change:
        file_out.write(line)
orig.close()
new.close()
file_out.close()
Answer 0 (score: 0)
Since CSV files are expected to be comma-separated, I'll assume your files can be put in the following format:
old.csv:
ID,fname,lname,status
1,joe,pol,active
2,peters,dol,active
3,john,nol,active
4,mike,sol,active
new.csv:
ID,fname,lname,status
1,joe,pol,active
2,peter,dol,active
67,ryan,olson,stop
3,johnny,nolly,stop
4,mike,sol,active
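If your real files are whitespace-separated like the samples in the question, a small conversion step along these lines could produce the comma-separated form first. This is only a sketch: `whitespace_to_csv` is a hypothetical helper name, and it assumes no field ever contains a space.

```python
import csv
import io


def whitespace_to_csv(src, dst):
    """Copy whitespace-separated lines from src to dst as comma-separated rows.

    Assumption: no individual field contains whitespace.
    """
    writer = csv.writer(dst, lineterminator="\n")
    for line in src:
        fields = line.split()
        if fields:  # skip blank lines
            writer.writerow(fields)


# In-memory demo using the first rows of the old file from the question
sample = io.StringIO("ID fname lname status\n1 joe pol active\n2 peters dol active\n")
converted = io.StringIO()
whitespace_to_csv(sample, converted)
print(converted.getvalue())
```

With real files you would pass open file objects instead of the `io.StringIO` buffers, opening the output with `newline=""` as the `csv` module documentation recommends.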
You can then use the following code to turn them into a report:
from csv import reader

# Creates a row dictionary from file: {id: {header: value, ...}}
def get_row_map(filename):
    row_map = {}
    with open(filename, newline="") as file:
        csv_reader = reader(file)
        # drop the ID column name, keep the remaining headers
        _, *headers = next(csv_reader)
        # map ids to rows
        for row in csv_reader:
            idx, *rest = row
            row_map[int(idx)] = dict(zip(headers, rest))
    return row_map

old_row_map = get_row_map("old.csv")
new_row_map = get_row_map("new.csv")

with open("different.txt", "w") as out:
    # Only loop over matched ids, in ascending order
    for row_id in sorted(old_row_map.keys() & new_row_map.keys()):
        # only proceed if rows are not exactly the same
        if old_row_map[row_id] != new_row_map[row_id]:
            # convert to sets of (header, value) pairs
            old_set, new_set = (
                set(old_row_map[row_id].items()),
                set(new_row_map[row_id].items()),
            )
            # get differences between old and new sets
            old_diff = dict(old_set - new_set)
            new_diff = dict(new_set - old_set)
            # write out report
            out.write("ID: %d\n" % row_id)
            # walk the columns in header order so the report order is stable
            for key in old_row_map[row_id]:
                if key in old_diff:
                    out.write(
                        "%s -> old: %s, new: %s\n" % (key, old_diff[key], new_diff[key])
                    )
This produces the following different.txt:
ID: 2
fname -> old: peters, new: peter
ID: 3
fname -> old: john, new: johnny
lname -> old: nol, new: nolly
status -> old: active, new: stop
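Note that the report above only covers IDs found in both files, so a row such as ID 67, which exists only in new.csv, is left out. If you also want those rows listed, something like the following sketch might work. The helper names `read_rows` and `added_and_removed` are hypothetical, and the sample data is held in memory only to keep the example self-contained.

```python
import csv
import io


def read_rows(csv_file):
    """Map each integer ID to a dict of that row's remaining columns."""
    rows = {}
    for row in csv.DictReader(csv_file):
        row_id = int(row.pop("ID"))  # assumes the ID column is named "ID"
        rows[row_id] = row
    return rows


def added_and_removed(old_rows, new_rows):
    """Return (added, removed) lists of IDs, each sorted ascending."""
    added = sorted(new_rows.keys() - old_rows.keys())
    removed = sorted(old_rows.keys() - new_rows.keys())
    return added, removed


OLD_CSV = """ID,fname,lname,status
1,joe,pol,active
2,peters,dol,active
3,john,nol,active
4,mike,sol,active
"""

NEW_CSV = """ID,fname,lname,status
1,joe,pol,active
2,peter,dol,active
67,ryan,olson,stop
3,johnny,nolly,stop
4,mike,sol,active
"""

old_rows = read_rows(io.StringIO(OLD_CSV))
new_rows = read_rows(io.StringIO(NEW_CSV))
added, removed = added_and_removed(old_rows, new_rows)

for row_id in added:
    print("Added ID %d: %s" % (row_id, ", ".join(new_rows[row_id].values())))
for row_id in removed:
    print("Removed ID %d: %s" % (row_id, ", ".join(old_rows[row_id].values())))
```

The same key-view set operations used for the matched IDs (`&`) also give the unmatched ones (`-`), so this can sit alongside the existing report loop rather than replace it.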