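This script compares two CSV files that each have two columns. Please help me modify it so that it also works when sample1.csv and sample2.csv have more than two columns, or only one column.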
f1_in = open("sample1.csv", "r")
next(f1_in, None)
f1_dict = {}
for line in f1_in:
    l = line.split(',')
    f1_dict[l[0].strip()] = l[1].strip()
    l.sort()
f1_in.close()
f2_in = open("sample2.csv", "r")
next(f2_in, None)
f2_dict = {}
for line in f2_in:
    l = line.split(',')
    f2_dict[l[0].strip()] = l[1].strip()
    l.sort()
f2_in.close()
f_same = open("same.txt", "w")
f_different = open("different.txt", "w")
for k1 in f1_dict.keys():
    if k1 in f2_dict.keys() \
            and f2_dict[k1] == f1_dict[k1]:
        f_same.write("{0}, {1}\n".format(str(k1) + " " + str(f1_dict[k1]),
                                         str(k1) + " " + str(f2_dict[k1])))
    elif not k1 in f2_dict.keys():
        f_different.write("{0}, {1}\n".format(str(k1) + " " + str(f1_dict[k1]),
                                              "------"))
    elif not f2_dict[k1] == f1_dict[k1]:
        f_different.write("{0}, {1}\n".format(str(k1) + " " + str(f1_dict[k1]),
                                              str(k1) + " " + str(f2_dict[k1])))
f_same.close()
f_different.close()
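For example: if my source file has name and salary as headers, with the values A 20000, B 15000, C 10000, D 10000, and the target file also has name and salary as headers, with the values A 40000, D 10000, B 15000, C 10000, E 8000, then my output should list as different lines: A 20000 A 40000, D 10000 ----- (not present in the target file), ----- (not present in the source file) E 8000, and as common lines: B 15000 B 15000, C 10000 C 10000.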
Answer 0 (score: 0)
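If you treat the columns as key/value pairs in a dictionary, it is no wonder that you cannot extend your code to more than two columns.
You have to treat the lines as "elements of a set" instead. I understand that this is why you are not using the csv module or the difflib module: you do not care whether the lines appear in (almost) the same order in both files, only whether they appear at all.
Here is an example: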
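import itertools

def compare(first_filename, second_filename):
    # Collect the lines of each file into a set (Python 3 syntax).
    lines1 = set()
    lines2 = set()
    with open(first_filename, 'r') as file1, \
            open(second_filename, 'r') as file2:
        # zip_longest pads the shorter file with None, which is skipped.
        for line1, line2 in itertools.zip_longest(file1, file2):
            if line1:
                lines1.add(line1)
            if line2:
                lines2.add(line2)
    print("Different lines")
    # Symmetric difference: lines present in exactly one of the files.
    for line in lines1 ^ lines2:
        print(line, end="")
    print("---")
    print("Common lines")
    # Intersection: lines present in both files.
    for line in lines1 & lines2:
        print(line, end="")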
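Note that this code finds the differences in both directions, not only what exists in f1 but not in f2, as your example does. However, it cannot tell which file a difference comes from (since that did not seem to be a requirement of the question).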
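If the origin of each differing line does matter, plain set differences can be used in place of the symmetric difference. The following is a minimal sketch for Python 3; split_differences is a hypothetical helper name that is not part of the code above, and it strips line endings on the assumption that a final line without a trailing newline should still count as a match.

```python
def split_differences(first_lines, second_lines):
    """Return (only_in_first, only_in_second, common) as sets of lines.

    Both arguments are iterables of strings, e.g. open file objects.
    """
    # Assumption: line endings are not significant, so strip them before
    # comparing; otherwise a last line without "\n" would never match.
    lines1 = {line.rstrip("\r\n") for line in first_lines}
    lines2 = {line.rstrip("\r\n") for line in second_lines}
    # a - b keeps what is only in a; a & b keeps what both share.
    return lines1 - lines2, lines2 - lines1, lines1 & lines2


if __name__ == "__main__":
    # In-memory stand-ins for two files; open file objects work the same way.
    first = ["bacon, eggs, mortar\n", "spam, spam, spam\n"]
    second = ["guido, van, rossum\n", "spam, spam, spam"]
    only_first, only_second, common = split_differences(first, second)
    print("Only in first: ", sorted(only_first))
    print("Only in second:", sorted(only_second))
    print("Common:        ", sorted(common))
```

With real files, the two arguments could be the file objects from a with open(...) block, just as in compare.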
In [40]: !cat sample1.csv
bacon, eggs, mortar
whatever, however, whenever
spam, spam, spam
In [41]: !cat sample2.csv
guido, van, rossum
spam, spam, spam
In [42]: compare("sample1.csv", "sample2.csv")
Different lines
whatever, however, whenever
guido, van, rossum
bacon, eggs, mortar
---
Common lines
spam, spam, spam
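To get closer to the report format described in the question (the source row next to the target row, with a placeholder when one side is missing) for any number of columns, the rows can be keyed on their first column and compared as whole tuples. This is only a sketch for Python 3 under assumptions the original does not state: the first column is a unique key, the first line of each file is a header, and the helper names read_rows and compare_by_key are hypothetical.

```python
import csv


def read_rows(lines):
    """Map the first column of each CSV row to the full row as a tuple."""
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the header row, as the question's script does
    return {row[0].strip(): tuple(cell.strip() for cell in row)
            for row in reader if row}  # "if row" skips blank lines


def compare_by_key(source_lines, target_lines, missing="------"):
    """Return (same, different) lists of "source, target" report lines."""
    source = read_rows(source_lines)
    target = read_rows(target_lines)
    same, different = [], []
    # Walk the source keys first, then keys that exist only in the target.
    keys = list(source) + [key for key in target if key not in source]
    for key in keys:
        left = " ".join(source[key]) if key in source else missing
        right = " ".join(target[key]) if key in target else missing
        if key in source and key in target and source[key] == target[key]:
            same.append(f"{left}, {right}")
        else:
            different.append(f"{left}, {right}")
    return same, different


if __name__ == "__main__":
    # In-memory stand-ins for the files; any iterable of lines works,
    # including file objects opened with newline="".
    source = ["name,salary\n", "A,20000\n", "B,15000\n"]
    target = ["name,salary\n", "A,40000\n", "B,15000\n", "E,8000\n"]
    same, different = compare_by_key(source, target)
    print("Common lines:", same)
    print("Different lines:", different)
```

Because whole rows are compared as tuples, the same function handles files with one column, two columns, or more. The two returned lists can then be written to same.txt and different.txt, one entry per line.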