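I have a script that checks two text files and prints the fields they have in common. However, I feel it is not fast enough, and I am looking for ways to optimize it.
FILE1 (10k lines, 3 columns) and FILE2 (200k lines, 2 columns) are CSV files that share one field.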
FILE1:
92073263d,86674404000555506123,通信
FILE2:
163738212,7a93632111w7-01e7-40e7-9387-1863e7683eca
63729jd83,07633221122c-6598-4489-B539-e42e2dcb3235
8djdy37w8,2b8retyre396-2472-4b2d-8d07-e170fa3d1f64
92073263d,07633221122c-6ew8-4eww-B539-e42dsadsadsa
with open('FILE1') as file1:
    file1_contents = { tuple(line.split(',')) for line in file1 }
print file1_contents
with open('FILE2') as file2:
    for line in file2:
        c1,c2 = line.split()
        if c1 in file1_contents:
            f = open("FILE3","w")
            f.write(c2)
            f.close()
The line `if c1 in file1_contents` is giving me a hard time, because I want to avoid any nested loops to keep the script fast. Any suggestions?
Answer 0: (score: 1)
Thanks again to COLDSPEED ... and here is my new code:
import pandas

# header=0 treats FILE1's first line as column names; FILE2 has no header row.
data_comreport = pandas.read_csv('FILE1', sep=',', header=0)
data_db = pandas.read_csv('FILE2', sep=',', header=None)
data_db.columns = ['SerialNumber', 'GUID']
# The inner join keeps only rows whose SerialNumber appears in both files.
data = pandas.merge(data_db, data_comreport, on='SerialNumber', how='inner')
print(data)
#result = data.loc[data['FailureReason'] != ['Failure to export']]
#if result != None:
# Write only the GUID column, without the index or a header line.
data.to_csv('list.txt', index=False, columns=['GUID'], header=False)
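For comparison, the same lookup can be done with the standard library alone. The sketch below is only an illustration, not the approach COLDSPEED suggested: the function name `matching_guids` and the in-memory sample files are made up for the example, and it assumes the shared field is the first column of both files. The key point is that the set holds only the key column (not whole-row tuples), so `in` is an average constant-time hash lookup and no nested loop is needed.

```python
import csv
import io


def matching_guids(file1, file2):
    """Return the second column of every FILE2 row whose first column
    also appears as the first column of some FILE1 row."""
    # Store only the shared key, not whole-row tuples, so the membership
    # test below compares a string against strings.
    keys = {row[0] for row in csv.reader(file1) if row}
    # Single pass over FILE2; each `in` check is a hash lookup on average.
    return [
        row[1]
        for row in csv.reader(file2)
        if len(row) >= 2 and row[0] in keys
    ]


# In-memory stand-ins for FILE1 and FILE2, built from the question's sample rows.
file1 = io.StringIO("92073263d,86674404000555506123,通信\n")
file2 = io.StringIO(
    "163738212,7a93632111w7-01e7-40e7-9387-1863e7683eca\n"
    "92073263d,07633221122c-6ew8-4eww-B539-e42dsadsadsa\n"
)
guids = matching_guids(file1, file2)
print(guids)
```

With real files, the two arguments would be something like `open('FILE1', newline='')` and `open('FILE2', newline='')`, and the resulting list would be written to the output file once after the loop. The question's code reopens FILE3 in "w" mode on every match, which leaves only the last match in the file.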