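I have two files like this:

file 1:
    col1 col2
    john kerry
    adam lord
    joe hitch

file 2:
    col1 col2
    john kerry
    bob abram

I want to compare the two files by first name and last name, and end up with a file that contains only the people from file 1 who do not appear in file 2, that is:

desired output file:
    col1 col2
    adam lord
    joe hitch

I tried this, but I am not getting the correct output: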
    import csv
    reader1=csv.reader(open('file1.csv', 'r'), delimiter='\t')
    reader2=csv.reader(open('file2.csv', 'r'), delimiter='\t')
    writer=csv.writer(open('desired_file.csv', 'w'), delimiter=',')
    row1 = reader1.next()
    row2 = reader2.next()
    if (row1[0] == row2[0]) and (row1[1] == row2[1]):
        print 'equal'
    else:
        writer.writerow(row1)
        writer.writerow(row2)

Answer 0 (score: 2)

I would use a set difference:

    with open('file1') as f1, open('file2') as f2:
        data1 = set(f1)
        lines_not_in_f2 = data1.difference(f2)
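
If the two files might be formatted slightly differently (spacing, letter case), you may want to wrap the file objects in a generator that yields tuples:

    def people(my_file):
        # Yield each line as a tuple of lowercased, whitespace-separated fields.
        for line in my_file:
            yield tuple(x.lower() for x in line.split())

    with open('file1') as f1, open('file2') as f2:
        data1 = set(people(f1))
        people_not_in_f2 = data1.difference(people(f2))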
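
The advantage of this approach is that you do not need to read the whole of f2 into memory, because difference() consumes it one line at a time. The disadvantage is that the resulting names are unordered, since they are stored in a set.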
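
If the original order matters, one possible variation is to build the set from file 2 instead and filter file 1 row by row. This is only a sketch, written for Python 3: `rows_not_in` is a hypothetical helper name, `io.StringIO` objects stand in for the real files, and the trade-off is reversed (file 2 is the one held in memory). Because the header line appears in both files, it is filtered out along with the matching names.

```python
import io

def people(lines):
    """Yield each non-empty line as a tuple of lowercased fields."""
    for line in lines:
        fields = tuple(x.lower() for x in line.split())
        if fields:
            yield fields

def rows_not_in(first, second):
    """Return the rows of `first` that are absent from `second`, in original order."""
    exclude = set(people(second))  # only the second file is held in memory
    return [row for row in people(first) if row not in exclude]

# In-memory stand-ins for the two files from the question.
file1 = io.StringIO("col1 col2\njohn kerry\nadam lord\njoe hitch\n")
file2 = io.StringIO("col1 col2\njohn kerry\nbob abram\n")

remaining = rows_not_in(file1, file2)
print(remaining)  # [('adam', 'lord'), ('joe', 'hitch')]
```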

Answer 1 (score: 0)

If both files have the same format, I don't think you need the csv module.

How about this solution:

    exclude_names = frozenset(open('file2'))  # make set for performance
    with open('output', 'w') as f:
        for name in open('file1'):
            if name not in exclude_names:
                f.write(name)
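
One thing to watch with a raw line comparison is that every line keeps its line ending, so the last line of a file that lacks a trailing newline will not match the same name followed by a newline in the other file. The sketch below (Python 3) strips line endings before comparing. `filter_lines` is a hypothetical helper, and the sample data is made up for illustration: file 1 ends without a newline and "joe hitch" has been added to file 2.

```python
import io

def filter_lines(source, exclude):
    """Yield lines of `source` not present in `exclude`, ignoring line endings."""
    exclude_names = frozenset(line.rstrip("\r\n") for line in exclude)
    for line in source:
        name = line.rstrip("\r\n")
        if name not in exclude_names:
            yield name

# Illustrative data: the last line of file1 has no trailing newline.
file1 = io.StringIO("john kerry\nadam lord\njoe hitch")
file2 = io.StringIO("john kerry\nbob abram\njoe hitch\n")

kept = list(filter_lines(file1, file2))
print(kept)  # ['adam lord']
```

Without the `rstrip` calls, "joe hitch" would be compared against "joe hitch\n" and would wrongly be kept.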

A solution using the csv reader/writer:

    import csv
    # csv.reader yields lists, which are not hashable, so store each row as a tuple.
    exclude_names = frozenset(tuple(row) for row in csv.reader(open('file2.csv', 'r'), delimiter='\t'))
    with open('desired_file.csv', 'w') as f:
        writer = csv.writer(f, delimiter=',')
        for row in csv.reader(open('file1.csv', 'r'), delimiter='\t'):
            if tuple(row) not in exclude_names:
                writer.writerow(row)
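
Since the header row appears in both files, a plain membership test drops it from the output, while the desired file in the question keeps it. A possible way to handle that, sketched for Python 3 with in-memory stand-ins for the files, is to read the headers separately. `write_unmatched` is a hypothetical name, and the tab delimiter is assumed from the question's code.

```python
import csv
import io

def write_unmatched(file1, file2, out):
    """Write the header of file1, then every file1 row that is absent from file2."""
    reader1 = csv.reader(file1, delimiter="\t")
    reader2 = csv.reader(file2, delimiter="\t")
    header = next(reader1)   # keep the header of file 1
    next(reader2, None)      # skip the header of file 2
    # Rows are lists (unhashable), so convert them to tuples for the set.
    exclude = frozenset(tuple(row) for row in reader2)
    writer = csv.writer(out, delimiter=",")
    writer.writerow(header)
    for row in reader1:
        if tuple(row) not in exclude:
            writer.writerow(row)

f1 = io.StringIO("col1\tcol2\njohn\tkerry\nadam\tlord\njoe\thitch\n")
f2 = io.StringIO("col1\tcol2\njohn\tkerry\nbob\tabram\n")
out = io.StringIO()
write_unmatched(f1, f2, out)
result = out.getvalue()
print(result)
```

With real files on Python 3, the csv documentation recommends opening them with `newline=''`.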
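
Answer 2 (score: 0)

    results = [i for i, j in zip(reader1, reader2) if i != j]

If order does not matter, use set(map(tuple, reader1)) - set(map(tuple, reader2)) instead. The rows have to be converted to tuples because csv.reader yields lists, which are not hashable.

    myfile = open(..., 'wb')  # Python 2; on Python 3 open with 'w' and newline=''
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerows(results)  # one output row per remaining record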
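
Note that the two expressions are not equivalent. The `zip` version compares the files position by position and stops at the end of the shorter file, whereas the set difference tests membership. A small sketch with the rows from the question (headers omitted, rows written as tuples, Python 3) shows the difference:

```python
# Rows from the question, without the header line.
rows1 = [("john", "kerry"), ("adam", "lord"), ("joe", "hitch")]
rows2 = [("john", "kerry"), ("bob", "abram")]

# Position-by-position comparison: zip stops after two pairs,
# so ("joe", "hitch") is never examined.
pairwise = [i for i, j in zip(rows1, rows2) if i != j]

# Membership-based comparison: every row of rows1 is checked against rows2.
by_membership = set(rows1) - set(rows2)

print(pairwise)       # [('adam', 'lord')]
print(by_membership)  # a set holding ('adam', 'lord') and ('joe', 'hitch')
```

For the output the question asks for, the membership-based form is the one that matches.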