I am trying to compare the following data:
|text_col|corr_acc|
+--------+--------+
|Car123 |xxx1 |
|Car234 |xxx2 |
|Car123 |xxx1 |
|Car456 |xxx3 |
|Car234 |xxx2 |
|Car123 |xxx5 |
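If a value appears more than once in text_col (for example Car123), then the corr_acc values of those rows must be compared. If corr_acc is the same, the row must be written to a new list named match. If corr_acc differs, both corr_acc values are added to a list named no_match, alongside the original values. The final result for the no_match list should look like this: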
|text_col |corr_acc|Result |
+---------+--------+---------+
|Car123 |xxx1 |xxx1,xxx5|
|Car234 |xxx2 | |
|Car123 |xxx1 |xxx1,xxx5|
|Car456 |xxx3 | |
|Car234 |xxx2 | |
|Car123 |xxx5 |xxx1,xxx5|
I have the following code, which works for me but is too slow (it needs to compare 200,000 rows):
import openpyxl

wb = openpyxl.load_workbook('D:\\peter\\Book3.xlsx')
sheet = wb['2018']
list_match = []
list_no_match = []
# Outer loop: every data row (row 1 is the header)
for i in range(2, len(sheet['A']) + 1):
    text_col_c_1 = str(sheet.cell(row=i, column=3).value)
    corr_acc_1 = str(sheet.cell(row=i, column=11).value)
    # Inner loop: compare row i against the rows below it
    for j in range(2 + i, len(sheet['A']) + 1):
        text_col_c_2 = str(sheet.cell(row=j, column=3).value)
        corr_acc_2 = str(sheet.cell(row=j, column=11).value)
        # Compare string values with ==; `is` tests object identity
        if text_col_c_1 == text_col_c_2:
            if corr_acc_1 == corr_acc_2:
                list_match.append(text_col_c_1 + "," + corr_acc_1 + "," + corr_acc_2 + "\n")
            else:
                list_no_match.append(text_col_c_1 + "," + corr_acc_1 + "," + corr_acc_2 + "\n")
        else:
            #list_no_match.append(text_col_c_1+","+corr_acc_1+"\n")
            pass
# Write both lists to text files
with open("d:\\peter\\match_list.txt", "w") as F:
    for each in list_match:
        F.write(each)
with open("d:\\peter\\no_match_list.txt", "w") as F:
    for each in list_no_match:
        F.write(each)
How can I speed up the code?
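One possible direction, offered as a minimal sketch rather than a definitive fix: the nested loop compares every pair of rows, so the work grows quadratically with the row count. Grouping the corr_acc values by text_col in a dictionary needs only two passes over the data. The function name `build_result_column` and the in-memory `rows` list are assumptions made for illustration; they stand in for the two worksheet columns and reproduce the Result column from the expected table.

```python
from collections import defaultdict


def build_result_column(rows):
    """Hypothetical helper: rows is a list of (text_col, corr_acc) tuples.

    Returns a list of (text_col, corr_acc, result) tuples, where result
    lists every distinct corr_acc seen for that text_col, or is empty
    when all rows with that text_col share the same corr_acc.
    """
    # Pass 1: collect the distinct corr_acc values per text_col.
    # A dict is used as an insertion-ordered set (first-seen order).
    seen = defaultdict(dict)
    for text_col, corr_acc in rows:
        seen[text_col][corr_acc] = None

    # Pass 2: attach the combined values only where they disagree.
    out = []
    for text_col, corr_acc in rows:
        values = seen[text_col]
        result = ",".join(values) if len(values) > 1 else ""
        out.append((text_col, corr_acc, result))
    return out


# Sample data taken from the table in the question.
rows = [
    ("Car123", "xxx1"),
    ("Car234", "xxx2"),
    ("Car123", "xxx1"),
    ("Car456", "xxx3"),
    ("Car234", "xxx2"),
    ("Car123", "xxx5"),
]
result = build_result_column(rows)
for line in result:
    print("|".join(line))
```

With a real workbook, the rows could presumably be read once up front, for example with openpyxl's `sheet.iter_rows(min_row=2, values_only=True)` and picking the third and eleventh value of each row, instead of calling `sheet.cell` inside the loops. Rows with an empty Result would correspond to the match list and the others to no_match.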