Making a two-column check script run faster

Date: 2017-08-14 02:58:21

Tags: python loops split tuples

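I have a script that compares two text files and prints out the fields they have in common. However, it isn't fast enough, and I'm looking for ways to optimize it.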

FILE1 (10k rows, 3 columns) and FILE2 (200k rows, 2 columns) are both CSV files, and they share one field.


FILE1:


92073263d,86674404000555506123,Communication


FILE2:


163738212,7a93632111w7-01e7-40e7-9387-1863e7683eca
63729jd83,07633221122c-6598-4489-B539-e42e2dcb3235
8djdy37w8,2b8retyre396-2472-4b2d-8d07-e170fa3d1f64
92073263d,07633221122c-6ew8-4eww-B539-e42dsadsadsa

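with open('FILE1') as file1:
    # Keep only the shared key (first column); a set of whole-row tuples
    # never contains a bare key string, so "c1 in ..." would always be False
    file1_contents = {line.split(',')[0] for line in file1}
    print(file1_contents)

# Open FILE3 once: reopening it with "w" for every match overwrites earlier matches
with open('FILE2') as file2, open('FILE3', 'w') as f:
    for line in file2:
        c1, c2 = line.rstrip('\n').split(',')  # FILE2 is comma-separated too
        if c1 in file1_contents:
            f.write(c2 + '\n')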

The line "if c1 in file1_contents" is giving me a hard time, because I want to avoid any nested loops to keep it fast. Any suggestions?
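A minimal sketch of the set-based lookup, assuming both inputs are comma-separated and the shared key is the first column of each file (as in the samples above). Membership testing on a Python set is an average O(1) hash lookup, so "if key in keys" does not rescan FILE1 for every FILE2 row; the whole job is one pass over each file. The function name common_guids is hypothetical, and in-memory lists of lines stand in for the open file objects.

```python
def common_guids(file1_lines, file2_lines):
    """Return the second column of each FILE2 row whose key appears in FILE1.

    Both arguments are iterables of CSV text lines (an open file object works too).
    common_guids is a hypothetical helper name used only for illustration.
    """
    # One pass over FILE1: collect the shared keys (first column) into a set.
    keys = {line.split(',', 1)[0].strip() for line in file1_lines if line.strip()}

    matches = []
    # One pass over FILE2: each "in" test is an average O(1) hash lookup, not a loop.
    for line in file2_lines:
        if not line.strip():
            continue  # skip blank lines
        key, guid = line.rstrip('\n').split(',', 1)
        if key.strip() in keys:
            matches.append(guid.strip())
    return matches


file1 = ["92073263d,86674404000555506123,Communication\n"]
file2 = [
    "163738212,7a93632111w7-01e7-40e7-9387-1863e7683eca\n",
    "63729jd83,07633221122c-6598-4489-B539-e42e2dcb3235\n",
    "8djdy37w8,2b8retyre396-2472-4b2d-8d07-e170fa3d1f64\n",
    "92073263d,07633221122c-6ew8-4eww-B539-e42dsadsadsa\n",
]
print(common_guids(file1, file2))
# ['07633221122c-6ew8-4eww-B539-e42dsadsadsa']
```

With real files, the open file objects can be passed in place of the lists, and the matches written to FILE3 through a single open output file.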

1 Answer:

Answer 0 (score: 1)

Thanks again, COLDSPEED... Here is my new code:

import pandas

# FILE1 is read with a header row (header=0); FILE2 has no header
data_comreport = pandas.read_csv('FILE1', sep=',', header=0)
data_db = pandas.read_csv('FILE2', sep=',', header=None)
data_db.columns = ['SerialNumber', 'GUID']

# Inner join: keep only serial numbers present in both files (no Python-level loop)
data = pandas.merge(data_db, data_comreport, on='SerialNumber', how='inner')
print(data)

# Optional filter (currently disabled):
# data = data.loc[data['FailureReason'] != 'Failure to export']

# Write only the GUID column, without index or header (to_csv returns None here)
data.to_csv('list.txt', index=False, columns=['GUID'], header=False)
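If pulling in pandas is not desirable, roughly the same inner join can be sketched with the standard csv module: index FILE1 by SerialNumber in a dict, then stream FILE2 and keep the GUID of every row whose serial number is in that index (a hash join). This assumes FILE1 has a header row containing a SerialNumber column, as the pandas code implies, and that serial numbers are unique in FILE1 (pandas would emit one output row per duplicate match). The function name join_guids and the FILE1 column names Value and FailureReason in the sample data are hypothetical.

```python
import csv
import io


def join_guids(file1, file2):
    """Inner-join FILE2 (SerialNumber,GUID rows, no header) against FILE1 (CSV with header).

    Returns the GUIDs of FILE2 rows whose SerialNumber also appears in FILE1, in FILE2 order.
    join_guids is a hypothetical helper; it assumes SerialNumber is unique in FILE1.
    """
    # Build a hash index of FILE1 keyed by SerialNumber in one pass.
    index = {row['SerialNumber']: row for row in csv.DictReader(file1)}

    guids = []
    # Stream FILE2; each dict membership test is an average O(1) lookup.
    for row in csv.reader(file2):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        serial, guid = row[0], row[1]
        if serial in index:
            guids.append(guid)
    return guids


# Sample data; header names other than SerialNumber are hypothetical.
f1 = io.StringIO(
    "SerialNumber,Value,FailureReason\n"
    "92073263d,86674404000555506123,Communication\n"
)
f2 = io.StringIO(
    "163738212,7a93632111w7-01e7-40e7-9387-1863e7683eca\n"
    "92073263d,07633221122c-6ew8-4eww-B539-e42dsadsadsa\n"
)
print(join_guids(f1, f2))
```

For real files, opening them with newline='' (as the csv module documentation recommends) and writing each returned GUID on its own line to list.txt gives output comparable to the to_csv call above.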