Here is the sample data:
1 ,ASIF JAVED IQBAL JAVED,JAVED IQBAL SO INAYATHULLAH,20170103
2 ,SYED MUSTZAR ALI MUHAMMAD ILYAS SHAH,MUHAMMAD SAFEER SO SAGHEER KHAN,20170127
3 ,AHSUN SABIR SABIR ALI,MISBAH NAVEED DO NAVEED ANJUM,20170116
4 ,RASHAD IQBAL PARVAIZ IQBAL,PERVAIZ IQBAL SO GUL HUSSAIN KHAN,20170104
5 ,RASHID ALI MUGHERI ABDUL RASOOL MUGHERI,MUMTAZ ALI BOHIO,20170105
6 ,FAKHAR IMAM AHMAD ALI,MOHAMMAD AKHLAQ ASHIQ HUSSAIN,20170105
7 ,AQEEL SARWAR MUHAMMAD SARWAR BUTT,BUSHRA WAHID,20170106
8 ,SHAFAQAT ALI REHMAT ALI,SAJIDA BIBI WO MUHAMMAD ASHRAF,20170106
9 ,MUHAMMAD ISMAIL SHAFQAT HUSSAIN,USAMA IQBAL,20170103
10 ,SULEMAN ALI KACHI GHULAM ALI,MUHAMMAD SHARIF ALLAH WARAYO,20170109
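The first field is the serial number, the second is the sender, the third is the receiver, and the fourth is the date.
The data continues like this for millions of rows.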
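For reference, the four-column layout described above could be read into named records roughly like this. It is only a sketch: the `Parcel` type and `parse_rows` helper are hypothetical names, and it assumes the date column is always in `YYYYMMDD` form, as in the sample rows.

```python
import csv
import io
from collections import namedtuple
from datetime import datetime

# Hypothetical record type; the field names are illustrative only.
Parcel = namedtuple("Parcel", ["serial", "sender", "receiver", "date"])


def parse_rows(lines):
    """Yield one Parcel per CSV line in the four-column layout."""
    for row in csv.reader(lines):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        # The serial field carries a trailing space ("1 ,"), so strip every field.
        serial, sender, receiver, date = (field.strip() for field in row)
        # Assumption: dates are written as YYYYMMDD.
        yield Parcel(int(serial), sender, receiver,
                     datetime.strptime(date, "%Y%m%d").date())


# Two rows copied from the sample data.
sample = io.StringIO(
    "1 ,ASIF JAVED IQBAL JAVED,JAVED IQBAL SO INAYATHULLAH,20170103\n"
    "2 ,SYED MUSTZAR ALI MUHAMMAD ILYAS SHAH,MUHAMMAD SAFEER SO SAGHEER KHAN,20170127\n"
)
parcels = list(parse_rows(sample))
```

For the real file, an open file object can be passed in place of the `io.StringIO` sample.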
Now I want to find which senders sent a parcel to the same receiver on the same day.
I wrote the following basic code for this, but it is very slow.
import csv
from fuzzywuzzy import fuzz

serial = []
rem_name = []
date = []

# Columns: 0 = serial number, 1 = sender, 2 = receiver, 3 = date
with open('janCSV.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        serial.append(row[0])
        rem_name.append(row[2])
        date.append(row[3])  # the date is the fourth column (index 3) in the sample above

with open('output.csv', 'w') as out:
    # Compare every row against every row: quadratic in the number of rows.
    # Iterating with zip keeps each name paired with its own serial and date,
    # which list.index() does not do when a name occurs more than once.
    for serial1, rem1, date1 in zip(serial, rem_name, date):
        for rem2, date2 in zip(rem_name, date):
            if date1 == date2:
                ratio = fuzz.ratio(rem1, rem2)
                # Keep near matches only; a score of 100 is an exact match.
                if ratio >= 90 and ratio < 100:
                    print(serial1, rem1, rem2, date1, date2, ratio)
                    out.write(str(serial1) + ',' + str(date1) + ',' + str(date2) + ',' + str(rem1) + ',' + str(rem2) + ','
                              + str(ratio) + '\n')
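
One possible way to cut down the work, sketched here under assumptions rather than offered as a definitive fix: since only rows with the same date are ever compared, the rows can be grouped by date first, so that the fuzzy comparison runs only inside each group. The names `similarity` and `near_duplicates` are hypothetical, and `difflib.SequenceMatcher` from the standard library stands in for `fuzz.ratio` so the sketch has no third-party dependency; its scores may differ slightly from fuzzywuzzy's.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a 0-100 similarity score (a stand-in for fuzz.ratio)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def near_duplicates(rows, low=90, high=100):
    """Yield (serial1, serial2, name1, name2, date, score) for same-day near matches.

    rows: iterable of (serial, name, date) tuples.
    """
    # Bucket the rows by date so that names are only compared within one day.
    by_date = defaultdict(list)
    for serial, name, date in rows:
        by_date[date].append((serial, name))

    for date, group in by_date.items():
        # combinations() visits each unordered pair once, so a pair is not
        # reported twice as (a, b) and (b, a), and no row is compared to itself.
        for (s1, n1), (s2, n2) in combinations(group, 2):
            score = similarity(n1, n2)
            if low <= score < high:
                yield s1, s2, n1, n2, date, score
```

Calling `list(near_duplicates(rows))` with `rows` built from the serial, name, and date columns returns the matching pairs. Each day's group is still compared pairwise, so a day with very many rows remains expensive. The saving comes from never pairing rows across different dates and from visiting each pair once instead of twice.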