Using Python to compare two columns across every row of a 200,000-row Excel document

Time: 2018-08-08 06:30:45

Tags: python excel

I am trying to compare the following data:

|text_col|corr_acc|
+--------+--------+
|Car123  |xxx1    |
|Car234  |xxx2    |
|Car123  |xxx1    |
|Car456  |xxx3    |
|Car234  |xxx2    |
|Car123  |xxx5    |
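  • If the text_col value of the first row (Car123) also appears in any other row (for example, row 3), the corr_acc values of those rows must be compared.
  • If corr_acc is the same in every one of those rows, it must be written to a new list named match.
  • Otherwise, both corr_acc values must be added to a list named no_match, alongside the original value.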

The final result for the no_match list should look like this:

|text_col |corr_acc|Result   |
+---------+--------+---------+
|Car123   |xxx1    |xxx1,xxx5|
|Car234   |xxx2    |         |
|Car123   |xxx1    |xxx1,xxx5|
|Car456   |xxx3    |         |
|Car234   |xxx2    |         |
|Car123   |xxx5    |xxx1,xxx5|
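
The Result column above can be sketched without comparing rows pairwise. The following is only a minimal illustration, assuming the two columns have already been read into a list of (text_col, corr_acc) tuples; the function name build_result_column and the in-memory sample rows are hypothetical and not part of the original code.

```python
from collections import defaultdict


def build_result_column(rows):
    """Return (text_col, corr_acc, result) for each input row.

    rows is a list of (text_col, corr_acc) tuples. result lists every
    distinct corr_acc seen for that text_col, or is empty when all rows
    with that text_col agree.
    """
    # First pass: collect the distinct corr_acc values per text_col.
    # A dict is used as an ordered set (insertion order, Python 3.7+).
    seen = defaultdict(dict)
    for text_col, corr_acc in rows:
        seen[text_col][corr_acc] = None

    # Second pass: attach the joined values only where they disagree.
    table = []
    for text_col, corr_acc in rows:
        values = seen[text_col]
        result = ",".join(values) if len(values) > 1 else ""
        table.append((text_col, corr_acc, result))
    return table


# Sample data copied from the table in the question.
rows = [
    ("Car123", "xxx1"),
    ("Car234", "xxx2"),
    ("Car123", "xxx1"),
    ("Car456", "xxx3"),
    ("Car234", "xxx2"),
    ("Car123", "xxx5"),
]
table = build_result_column(rows)
```

Each row is visited twice, so the work grows linearly with the number of rows rather than quadratically.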

I have the following code, which works for me but is far too slow (200,000 rows need to be compared):

import openpyxl

wb = openpyxl.load_workbook('D:\\peter\\Book3.xlsx')
sheet = wb['2018']
list_match = []
list_no_match = []

# Compute the last row once instead of on every loop iteration.
last_row = len(sheet['A'])

# Compare every data row with every row below it (row 1 is the header).
for i in range(2, last_row + 1):
    text_col_c_1 = str(sheet.cell(row=i, column=3).value)
    corr_acc_1 = str(sheet.cell(row=i, column=11).value)
    # Start at the row directly below row i so that no pair is skipped.
    for j in range(i + 1, last_row + 1):
        text_col_c_2 = str(sheet.cell(row=j, column=3).value)
        corr_acc_2 = str(sheet.cell(row=j, column=11).value)
        # Use == to compare string values; `is` tests object identity.
        if text_col_c_1 == text_col_c_2:
            if corr_acc_1 == corr_acc_2:
                list_match.append(text_col_c_1+","+corr_acc_1+","+corr_acc_2+"\n")
            else:
                list_no_match.append(text_col_c_1+","+corr_acc_1+","+corr_acc_2+"\n")

# Write each list to its own text file.
with open("d:\\peter\\match_list.txt", "w") as f:
    f.writelines(list_match)

with open("d:\\peter\\no_match_list.txt", "w") as f:
    f.writelines(list_no_match)

How can I speed up the code?
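
For reference, one possible direction is to group the rows by text_col first and only compare rows inside the same group, which avoids scanning all later rows for every row. The sketch below is an assumption-laden illustration, not a tested replacement: split_pairs and the in-memory sample rows are hypothetical names, and the rows are assumed to have been read from the sheet beforehand. It produces lines in the same "text_col,corr_acc_1,corr_acc_2" format as the code above, although they come out grouped by text_col rather than in row order.

```python
from collections import defaultdict
from itertools import combinations


def split_pairs(rows):
    """Return (matches, no_matches) as lists of 'text,acc1,acc2\\n' lines."""
    # Group the corr_acc values by text_col in a single pass.
    groups = defaultdict(list)
    for text_col, corr_acc in rows:
        groups[text_col].append(corr_acc)

    matches, no_matches = [], []
    for text_col, accs in groups.items():
        # Only rows that share a text_col are ever compared.
        for first, second in combinations(accs, 2):
            line = f"{text_col},{first},{second}\n"
            if first == second:
                matches.append(line)
            else:
                no_matches.append(line)
    return matches, no_matches


# Sample data copied from the table in the question.
sample_rows = [
    ("Car123", "xxx1"),
    ("Car234", "xxx2"),
    ("Car123", "xxx1"),
    ("Car456", "xxx3"),
    ("Car234", "xxx2"),
    ("Car123", "xxx5"),
]
matches, no_matches = split_pairs(sample_rows)
```

The pairwise step is still quadratic within a single group, so this helps most when each text_col value occurs only a handful of times.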

0 answers:

No answers