I'm writing a Python script to compare CSV files. However, it only works with comma-separated files, even when the delimiter is set to '\t':
d = '\t'
for x in range(0, columns):
    with open(mfile, 'rb') as master:
        with open(cfile, 'rb') as check:
            master_indices = dict((r[x], i) for i, r in enumerate(csv.reader(master, delimiter=d)))
            check_reader = csv.reader(check, delimiter=d)
            for row in check_reader:
                index = master_indices.get(row[x])
                if index is not None:
                    T += 1
                    matches += 1
                else:
                    T += 1
Edit:
Test case 1:
mfile:
a,1
a,2
cfile:
x,2
x,z
With d = ','
Both columns are read, returning 1 match with T = 4.
Test case 2:
mfile:
a 1
a 2
cfile:
x 2
x z
With d = '\t'
Only column 1 is read, returning 0 matches with T = 2.
Edit: working code, using the accepted answer:
for x in range(0, columns):
    with open(mfile, 'rb') as master:
        dialect = csv.Sniffer().sniff(master.read(1024))
        master.seek(0)
        master_reader = csv.reader(master, dialect)
        with open(cfile, 'rb') as check:
            dialect = csv.Sniffer().sniff(check.read(1024))
            check.seek(0)
            check_reader = csv.reader(check, dialect)
            master_indices = dict((r[x], i) for i, r in enumerate(master_reader))
            for row in check_reader:
                index = master_indices.get(row[x])
                if index is not None:
                    T += 1
                    matches += 1
                else:
                    T += 1
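The loop above opens the files in 'rb' mode, which suits Python 2's csv module. On Python 3 the csv module expects text-mode files opened with newline=''. A minimal sketch of the same comparison for Python 3 follows. The helper names `sniffed_rows` and `count_matches` are hypothetical and not part of the original post, and a set replaces the value-to-index dict because only membership is tested.

```python
import csv


def sniffed_rows(path, sample_size=1024):
    """Read every row of a CSV file using the dialect guessed by csv.Sniffer."""
    # Text mode with newline='' is what the Python 3 csv module expects.
    with open(path, newline="") as handle:
        dialect = csv.Sniffer().sniff(handle.read(sample_size))
        handle.seek(0)  # rewind so the reader starts at the first row
        return list(csv.reader(handle, dialect))


def count_matches(mfile, cfile, column):
    """Return (matches, total) for one column of cfile checked against mfile."""
    master_values = {row[column] for row in sniffed_rows(mfile)}
    matches = total = 0
    for row in sniffed_rows(cfile):
        total += 1
        if row[column] in master_values:
            matches += 1
    return matches, total
```

Calling `count_matches(mfile, cfile, x)` for each `x` in `range(columns)` and summing the results should give the same `matches` and `T` as the loop above. The sketch loads each file fully into memory, which may not suit very large files.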
Answer 0 (score: 1)
You can use csv.Sniffer to detect the dialect of the CSV file:
with open(mfile, 'rb') as master:
    dialect = csv.Sniffer().sniff(master.read(1024))
    master.seek(0)
    master_reader = csv.reader(master, dialect)
    with open(cfile, 'rb') as check:
        dialect = csv.Sniffer().sniff(check.read(1024))
        check.seek(0)
        check_reader = csv.reader(check, dialect)
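To illustrate why sniffing can help, here is a small Python 3 sketch. It assumes, purely for illustration, that the file's fields are separated by a single space rather than a real tab; the question does not confirm this. The sample data is made up for the demonstration.

```python
import csv
import io

# Hypothetical sample: fields separated by one space, not by a tab.
sample = "a 1\na 2\nb 3\n"

# With an explicit tab delimiter, no tab is found, so each line is one field.
tab_rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

# Sniffing the sample lets the csv module guess the delimiter instead.
dialect = csv.Sniffer().sniff(sample)
sniffed = list(csv.reader(io.StringIO(sample), dialect))

print(repr(dialect.delimiter))
print(tab_rows[0], sniffed[0])
```

Under that assumption, a reader fixed to '\t' sees only one column per row, which would be consistent with the single column reported in test case 2. Sniffer works heuristically on the sample it is given, so it is worth checking `dialect.delimiter` on real files.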