I wrote this code to compare two CSV files (f1 and f2). Both have 3 columns and many rows. Whenever the item at cell 1 of f1 matches that of f2 and the item at cell 2 of f1 matches that of f2, it should write cell 1 and cell 2 of f1 and cell 3 of f2 to a file named network_python.csv.
Code:
import csv

t = {}
with open('file1.csv') as ff:
    for f1 in csv.DictReader(ff):
        with open('file2.csv') as ff:
            for f2 in csv.DictReader(ff):
                if int(f1['From'].strip()) == int(f2['From'].strip()) and int(f1['To'].strip()) == int(f2['To'].strip()):
                    print (f1['From'], f1['To'], f2['Mode'])
                    t.update({'From': f1['From'], 'To': f1['To'], 'Mode': f2['Mode']})
with open('network_python.csv', 'w') as csvfile:
    fieldnames = ['From', 'To', 'Mode']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for k,v in t.iteritems():
        writer.writerow(t)
Sample data in file1.csv:
From To Mode
1 2 cw
2 1 cw
3 4 cwt
7 2 cbt
8 9 ct
Sample data in file2.csv:
From To Mode
8 9 c
3 4 cw
1 2 cwt
7 2 ct
2 1 cb
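The code works (i.e. it finds the correct matches), but when writing to the file it writes only one row, overwriting the previous results. Also, is there a way to make the code more efficient? It is slow on large files. I searched some questions here, but they did not fully answer mine. Thank you for your time.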
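Answer 0 (score: 1)
Not 100% sure, but in your case a list should be slightly more efficient (for iteration and memory, and because no lookups are needed). Dictionaries, however, are very efficient when looking up values. Also, you don't need to convert to integers or call strip when comparing, as long as the fields contain no stray whitespace or differing number formats. This code should work fine.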
import csv

output = []
# delimiter='\t' assumes tab-separated input; drop it for comma-separated files.
with open('file1.csv') as file1, open('file2.csv') as file2:
    # Read file2 once up front: a file object is exhausted after one full pass,
    # so it cannot be re-read inside the outer loop.
    rows2 = list(csv.DictReader(file2, delimiter='\t'))
    for f1 in csv.DictReader(file1, delimiter='\t'):
        for f2 in rows2:
            if f1['From'] == f2['From'] and f1['To'] == f2['To']:
                new_item = [f1['From'], f1['To'], f2['Mode']]
                print(new_item)
                output.append(new_item)

with open('network_python.csv', 'w', newline='') as csvfile:
    fieldnames = ['From', 'To', 'Mode']
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerow(fieldnames)
    for row in output:
        writer.writerow(row)
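To see why the code in the question ended up with a single result, compare what `t.update(...)` and a list `append` do across several matches. This is only a minimal sketch; the `matches` list is made-up data standing in for rows found by the comparison loop.

```python
# Hypothetical matches, standing in for rows found by the comparison loop.
matches = [('1', '2', 'cwt'), ('3', '4', 'cw'), ('8', '9', 'c')]

t = {}
output = []
for frm, to, mode in matches:
    # update() overwrites the same three keys on every iteration.
    t.update({'From': frm, 'To': to, 'Mode': mode})
    # append() adds a new element, so earlier matches are kept.
    output.append([frm, to, mode])

print(t)            # only the last match survives
print(len(output))  # all three matches are kept
```

Because the dict has a fixed set of keys, it can never hold more than one match at a time, whereas the list grows by one entry per match.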
Answer 1 (score: 1)
You can build a dict from the two CSV files, keyed by (from, to), and merge them into the result:
import csv
from collections import OrderedDict

with open('file1.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    rows = OrderedDict((tuple(row[:2]), None) for row in reader)

with open('file2.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # Skip row if matching row wasn't present in file1.csv
    rows.update({tuple(row[:2]): row[2] for row in reader if tuple(row[:2]) in rows})

with open('network_python.csv', 'w', newline='') as csvfile:
    fieldnames = ['From', 'To', 'Mode']
    writer = csv.writer(csvfile)
    writer.writerow(fieldnames)
    # Skip row if it wasn't present in file2.csv
    writer.writerows((k[0], k[1], v) for k, v in rows.items() if v is not None)
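As a quick check of this keyed-merge idea, here is a small sketch that runs it on the sample data from the question. It assumes Python 3.7+ and comma-separated input; the function name `merge_modes` and the in-memory `io.StringIO` files are illustrative only and not part of the answer.

```python
import csv
import io

def merge_modes(file1, file2):
    """Return [From, To, Mode-from-file2] for keys present in both files,
    in file1 order."""
    reader1 = csv.reader(file1)
    next(reader1)  # skip the header row
    # Plain dicts keep insertion order in Python 3.7+.
    rows = {tuple(row[:2]): None for row in reader1}

    reader2 = csv.reader(file2)
    next(reader2)  # skip the header row
    for row in reader2:
        key = tuple(row[:2])
        if key in rows:  # ignore keys that file1 does not contain
            rows[key] = row[2]

    # Drop file1 keys that never got a Mode from file2.
    return [[k[0], k[1], v] for k, v in rows.items() if v is not None]

# The sample data from the question, written as comma-separated text.
FILE1 = "From,To,Mode\n1,2,cw\n2,1,cw\n3,4,cwt\n7,2,cbt\n8,9,ct\n"
FILE2 = "From,To,Mode\n8,9,c\n3,4,cw\n1,2,cwt\n7,2,ct\n2,1,cb\n"

result = merge_modes(io.StringIO(FILE1), io.StringIO(FILE2))
for row in result:
    print(row)
```

Each file is read exactly once, so the work grows with the sum of the two file sizes rather than their product. Note that duplicate (From, To) pairs within one file collapse into a single entry with this approach.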
Answer 2 (score: 0)
I was thinking about loading both files into memory and then comparing them.
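import csv

# file1Path, file2Path and finalPath are the paths of the two inputs and the output.
with open(file1Path, newline='') as f:
    r = csv.reader(f)
    next(r)  # skip the header row
    res1 = [line for line in r]

with open(file2Path, newline='') as f:
    r = csv.reader(f)
    next(r)  # skip the header row
    res2 = [line for line in r]

# Compare every row of file 1 with every row of file 2, since matching rows
# may appear in a different order in the two files.
final = [[file1col[0], file1col[1], file2col[2]]
         for file1col in res1 for file2col in res2
         if file1col[0] == file2col[0] and file1col[1] == file2col[1]]

with open(finalPath, 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow(['From', 'To', 'Mode'])
    w.writerows(final)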
Answer 3 (score: -1)
Try not to modify the "t" dict every time you get a new row; use a generator instead:
import csv

def get_row():
    with open('file1.csv') as ff1:
        for f1 in csv.DictReader(ff1):
            with open('file2.csv') as ff2:
                for f2 in csv.DictReader(ff2):
                    if int(f1['From'].strip()) == int(f2['From'].strip()) and int(f1['To'].strip()) == int(f2['To'].strip()):
                        print(f1['From'], f1['To'], f2['Mode'])
                        yield {'From': f1['From'], 'To': f1['To'], 'Mode': f2['Mode']}

with open('network_python.csv', 'w', newline='') as csvfile:
    fieldnames = ['From', 'To', 'Mode']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    # Write each matching row as the generator produces it.
    for row in get_row():
        writer.writerow(row)
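The generator idea can also be combined with a lookup table, so that the second file is read once instead of once per row of the first. The following is a sketch under assumptions: Python 3, comma-separated files with a `From,To,Mode` header, small made-up inputs, and hypothetical names (`iter_matches`, `mode_by_key`) that do not appear in the question.

```python
import csv
import io

def iter_matches(file1, file2):
    """Yield one output row per file1 row whose (From, To) also occurs in file2."""
    # Read file2 once and index its Mode by the (From, To) pair.
    mode_by_key = {
        (r['From'].strip(), r['To'].strip()): r['Mode']
        for r in csv.DictReader(file2)
    }
    # Stream file1; each row needs a single dictionary lookup.
    for r in csv.DictReader(file1):
        key = (r['From'].strip(), r['To'].strip())
        if key in mode_by_key:
            yield {'From': key[0], 'To': key[1], 'Mode': mode_by_key[key]}

# In-memory stand-ins for the two files (made-up rows, not the real data).
file1 = io.StringIO("From,To,Mode\n1,2,cw\n5,6,cw\n8,9,ct\n")
file2 = io.StringIO("From,To,Mode\n8,9,c\n1,2,cwt\n")

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['From', 'To', 'Mode'], lineterminator='\n')
writer.writeheader()
writer.writerows(iter_matches(file1, file2))  # consumes the generator row by row
print(out.getvalue())
```

With real files, pass open file objects instead of the `io.StringIO` stand-ins, opened with `newline=''` as the csv module documentation recommends. Only the index built from the second file is held in memory; the rows of the first file and the output are processed one at a time.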