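I have two CSV files that I want to compare. I have read both with csv.DictReader, so I now have a dictionary for each row of each file. I want to compare them as follows: when two elements (the ones under the headers h1 and h2) are the same in both files, compare the dictionaries and print the differences relative to the second dictionary. Sample CSV files are below.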
csv1:
h1,h2,h3
aaa,g0,74
bjg,73,kg9
CSV_new:
h1,h2,h3,h4
aaa,g0,7,
bjg,73,kg9,ahf
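I would like output along these lines, though not necessarily exactly as shown. It should print, for each dictionary, the modifications, additions, and deletions relative to CSV_new: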
{h1:'aaa', h2:'g0' {h3:'74', h4:''}}
{h1:'bjg', h2:'73' {h4:''}
My code, which is not very far along yet:
import csv
f1 = "csv1.csv"
reader1 = csv.DictReader(open(f1), delimiter=",")
for row1 in reader1:
    row1['h1']
    #['%s:%s' % (f, row[f]) for f in reader.fieldnames]
f2 = "CSV_new.csv"
reader2 = csv.DictReader(open(f2), delimiter=",")
for row2 in reader2:
    row2['h1']
    if row1['h1'] == row2['h1']:
        print(row1, row2)
Answer 0 (score: 1)
If you only want to find the differences, you can use difflib.
For example:
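import difflib
# File names taken from the question.
with open("csv1.csv") as fo1, open("CSV_new.csv") as fo2:
    # ndiff returns a generator of lines prefixed with '  ', '- ', '+ ' or '? '.
    diff = difflib.ndiff(fo1.readlines(), fo2.readlines())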
You can then write out the differences however you need.
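As a rough illustration of "writing out the differences", the sketch below keeps only the ndiff lines that mark a removal or an addition. The sample rows are copied from the question and held in memory so the snippet runs on its own; the helper name changed_lines is made up for this example and is not part of difflib.

```python
import difflib

# Sample data copied from the question, held in memory so the sketch runs as-is.
old_lines = ["h1,h2,h3\n", "aaa,g0,74\n", "bjg,73,kg9\n"]
new_lines = ["h1,h2,h3,h4\n", "aaa,g0,7,\n", "bjg,73,kg9,ahf\n"]


def changed_lines(old, new):
    """Return only the ndiff lines marking a removal ('- ') or an addition ('+ ').

    changed_lines is a hypothetical helper name, not a difflib function.
    """
    return [
        line.rstrip("\n")
        for line in difflib.ndiff(old, new)
        if line.startswith(("- ", "+ "))
    ]


for line in changed_lines(old_lines, new_lines):
    print(line)
```

Note that this is a line-level text diff: it reports that a line changed, but it does not know about columns or about which rows belong together.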
Answer 1 (score: 0)
This may be what you are looking for but, as noted above, there is some ambiguity in your description.
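import csv
import sys

A, B = "csv1.csv", "CSV_new.csv"  # file names taken from the question

with open(A) as fd1, open(B) as fd2:
    a, b = csv.reader(fd1), csv.reader(fd2)
    ha, hb = next(a), next(b)
    # Every column of the first file must also exist in the second one.
    if not set(ha).issubset(set(hb)):
        sys.exit(1)
    # Map each shared label to its column index in both files.
    lookup = {label: (key, hb.index(label)) for key, label in enumerate(ha)}
    # Rows are paired by position, so both files must list rows in the same order.
    for rowa, rowb in zip(a, b):
        for key in lookup:
            index_a, index_b = lookup[key]
            if rowa[index_a] != rowb[index_b]:
                print(rowb)
                break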
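The loop above pairs rows by position and prints the whole row on any mismatch. If the rows should instead be matched on h1 and h2, as the question describes, one possible reading is sketched below. It is only an interpretation of the ambiguous description: it assumes (h1, h2) uniquely identifies a row, reports the values from the new file for fields that differ or are missing in the old file, and skips rows that exist only in the new file. The names KEY_FIELDS, index_rows, and diff_rows are invented for this example, and io.StringIO stands in for the real files.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for the real files.
OLD_CSV = "h1,h2,h3\naaa,g0,74\nbjg,73,kg9\n"
NEW_CSV = "h1,h2,h3,h4\naaa,g0,7,\nbjg,73,kg9,ahf\n"

KEY_FIELDS = ("h1", "h2")  # assumption: these two columns identify a row


def index_rows(text):
    """Map (h1, h2) -> full row dict for every row in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {tuple(row[k] for k in KEY_FIELDS): row for row in reader}


def diff_rows(old_text, new_text):
    """For each key present in both files, collect the new-file values
    of the fields that differ from, or are absent in, the old file."""
    old_rows, new_rows = index_rows(old_text), index_rows(new_text)
    result = {}
    for key, new_row in new_rows.items():
        old_row = old_rows.get(key)
        if old_row is None:
            continue  # row exists only in the new file; skipped in this sketch
        changes = {
            field: value
            for field, value in new_row.items()
            if field not in KEY_FIELDS and old_row.get(field) != value
        }
        if changes:
            result[key] = changes
    return result


for key, changes in diff_rows(OLD_CSV, NEW_CSV).items():
    print(key, changes)
```

Columns that exist only in the old file are not reported here; iterating over the old row's fields as well would be one way to cover deletions.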