Question

I am trying to compare the following data:

|text_col|corr_acc|
+--------+--------+
|Car123  |xxx1    |
|Car234  |xxx2    |
|Car123  |xxx1    |
|Car456  |xxx3    |
|Car234  |xxx2    |
|Car123  |xxx5    |

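If the text_col value of the first row (Car123) also appears in any other row (for example, row 3), the corr_acc values of those rows must be compared.
If the corr_acc values are the same, the entry must be written to a new list called match.
Otherwise, both corr_acc values must be added to a list called no_match, next to the original value.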
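One way to read the rule above, consistent with the expected result table, is to group the rows by text_col and count the distinct corr_acc values in each group. A minimal sketch, assuming the rows are already in memory as (text_col, corr_acc) tuples; the function name classify is hypothetical, and a text_col that occurs only once is skipped because there is nothing to compare it with:

```python
from collections import defaultdict


def classify(rows):
    """Split text_col values into match (one distinct corr_acc) and no_match (several)."""
    values = defaultdict(set)   # text_col -> distinct corr_acc values seen
    counts = defaultdict(int)   # text_col -> number of rows with that text_col
    for text_col, corr_acc in rows:
        values[text_col].add(corr_acc)
        counts[text_col] += 1

    match, no_match = [], []
    for text_col, accs in values.items():
        if counts[text_col] < 2:
            continue            # occurs only once: nothing to compare
        if len(accs) == 1:
            match.append(text_col)
        else:
            no_match.append(text_col)
    return match, no_match


# Sample data from the table above
rows = [("Car123", "xxx1"), ("Car234", "xxx2"), ("Car123", "xxx1"),
        ("Car456", "xxx3"), ("Car234", "xxx2"), ("Car123", "xxx5")]
print(classify(rows))  # (['Car234'], ['Car123'])
```

Grouping means each row is visited once instead of being compared with every other row.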

The final result for the no_match list should look like this:

|text_col |corr_acc|Result   |
+---------+--------+---------+
|Car123   |xxx1    |xxx1,xxx5|
|Car234   |xxx2    |         |
|Car123   |xxx1    |xxx1,xxx5|
|Car456   |xxx3    |         |
|Car234   |xxx2    |         |
|Car123   |xxx5    |xxx1,xxx5|
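A sketch of how the Result column in the table above could be computed, again assuming the rows are in memory as (text_col, corr_acc) tuples: collect the distinct corr_acc values per text_col in first-seen order, then fill Result only for rows whose text_col has more than one distinct value. The name result_column is hypothetical:

```python
from collections import defaultdict


def result_column(rows):
    """Return the Result value for each row, in the original row order."""
    # text_col -> distinct corr_acc values; dict keys keep first-seen order
    seen = defaultdict(dict)
    for text_col, corr_acc in rows:
        seen[text_col][corr_acc] = None

    result = []
    for text_col, _ in rows:
        accs = list(seen[text_col])
        # Only text_col values with conflicting corr_acc values get an entry
        result.append(",".join(accs) if len(accs) > 1 else "")
    return result


rows = [("Car123", "xxx1"), ("Car234", "xxx2"), ("Car123", "xxx1"),
        ("Car456", "xxx3"), ("Car234", "xxx2"), ("Car123", "xxx5")]
print(result_column(rows))
# ['xxx1,xxx5', '', 'xxx1,xxx5', '', '', 'xxx1,xxx5']
```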

I have the following code, which works, but it is too slow (200,000 rows have to be compared):

import openpyxl

wb = openpyxl.load_workbook('D:\\peter\\Book3.xlsx')
sheet = wb['2018']
n_rows = len(sheet['A'])  # number of rows in column A; row 1 is the header
list_match = []
list_no_match = []

# Compare each data row i with every row j below it
for i in range(2, n_rows + 1):
    text_col_c_1 = str(sheet.cell(row=i, column=3).value)   # column C
    corr_acc_1 = str(sheet.cell(row=i, column=11).value)    # column K
    for j in range(i + 1, n_rows + 1):  # start at i + 1 so the next row is not skipped
        text_col_c_2 = str(sheet.cell(row=j, column=3).value)
        corr_acc_2 = str(sheet.cell(row=j, column=11).value)
        # == compares string values; `is` only compares object identity
        if text_col_c_1 == text_col_c_2:
            if corr_acc_1 == corr_acc_2:
                list_match.append(text_col_c_1 + "," + corr_acc_1 + "," + corr_acc_2 + "\n")
            else:
                list_no_match.append(text_col_c_1 + "," + corr_acc_1 + "," + corr_acc_2 + "\n")


with open("d:\\peter\\match_list.txt", "w") as f:
    f.writelines(list_match)

with open("d:\\peter\\no_match_list.txt", "w") as f:
    f.writelines(list_no_match)
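Most of the cost comes from the shape of the loop: with n data rows, every row is compared with every row below it, which for n = 200,000 is about

$$\binom{n}{2} = \frac{n(n-1)}{2} = \frac{200\,000 \cdot 199\,999}{2} = 19\,999\,900\,000 \approx 2 \times 10^{10}$$

inner-loop iterations, each of which makes two sheet.cell lookups.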

How can I make this code faster?
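A possible speed-up, sketched under assumptions: the helper names read_rows and compare_rows are hypothetical, the data is assumed to start in row 2 with columns C and K taken from column=3 and column=11 in the code above, and the installed openpyxl is assumed to support iter_rows(values_only=True). The idea is to read the two columns once instead of calling sheet.cell for every comparison, group the rows by text_col, and run the pairwise comparison only inside each group. The output lines keep the text_col,corr_acc_1,corr_acc_2 format of the loop above.

```python
from collections import defaultdict
from itertools import combinations


def compare_rows(rows):
    """Return (match, no_match) lines, comparing only rows that share a text_col."""
    groups = defaultdict(list)  # text_col -> corr_acc values in row order
    for text_col, corr_acc in rows:
        groups[str(text_col)].append(str(corr_acc))

    match, no_match = [], []
    for text_col, accs in groups.items():
        # Every pair (earlier row, later row) within the group
        for acc1, acc2 in combinations(accs, 2):
            line = f"{text_col},{acc1},{acc2}\n"
            if acc1 == acc2:
                match.append(line)
            else:
                no_match.append(line)
    return match, no_match


def read_rows(path, sheet_name):
    """Read columns C..K once in read-only mode and keep only C and K (needs openpyxl)."""
    import openpyxl  # imported here so compare_rows works without openpyxl installed

    wb = openpyxl.load_workbook(path, read_only=True)
    ws = wb[sheet_name]
    # min_col=3, max_col=11 -> each tuple holds C..K: index 0 is C, index 8 is K
    rows = [(row[0], row[8])
            for row in ws.iter_rows(min_row=2, min_col=3, max_col=11, values_only=True)]
    wb.close()
    return rows
```

Usage would be along the lines of read_rows('D:\\peter\\Book3.xlsx', '2018'), passing the result to compare_rows, and writing the two returned lists with writelines as before. The lines come out grouped by text_col rather than in the row order of the nested loop, and a text_col that occurs k times still produces k(k-1)/2 pairs, so the cost now depends on the group sizes rather than on the size of the whole sheet.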

Comparing two columns across 200,000 rows of an Excel file using Python
