Question

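My project processes a number of different Excel files. I want to create a single file that gathers some of the data from those earlier files and serves as my database. The goal is to produce charts from this data, with everything done automatically.

I wrote this program in Python, but it takes 20 minutes to run. How can I optimise it? Also, some of the files contain the same variables, and I would like each variable to appear only once in the final file. How can I do that?

Here is my program: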

import os
import xlrd
import xlsxwriter
from xlrd import open_workbook

wc = xlrd.open_workbook("U:\\INSEE\\table-appartenance-geo-communes-16.xls")
sheet0=wc.sheet_by_index(0)

# création 

with xlsxwriter.Workbook('U:\\INSEE\\Department61.xlsx') as bdd:
    dept61 = bdd.add_worksheet('deprt61')

folder_path = "U:\\INSEE\\2013_telechargement2016"

col=8
constante3=0
lastCol=0
listeV = list()

for path, dirs, files in os.walk(folder_path):   
    for filename in files:            
        filename = os.path.join(path, filename)        
        wb = xlrd.open_workbook(filename, '.xls')            
        sheet1 = wb.sheet_by_index(0)        
        lastRow=sheet1.nrows          
        lastCol=sheet1.ncols           
        colDep=None
        firstRow=None
        for ligne in range(0,lastRow):                  
            for col2 in range(0,lastCol):                     
                if sheet1.cell_value(ligne, col2) == 'DEP':
                    colDep=col2
                    firstRow=ligne
                    break
            if colDep is not None:
                break
        col=col-colDep-2-constante3
        constante3=0
        for nCol in range(colDep+2,lastCol):
                    constante=1
                    for ligne in range(firstRow,lastRow):
                            if sheet1.cell(ligne, colDep).value=='61':
                                    Q=(sheet1.cell(firstRow, nCol).value in listeV)
                                    if Q==False:
                                            V=sheet1.cell(firstRow, nCol).value
                                            listeV.append(V)
                                            dept61.write(0,col+nCol,sheet1.cell(firstRow, nCol).value)
                                            for ligne in range(ligne,lastRow):
                                                    if sheet1.cell(ligne, colDep).value=='61':
                                                            dept61.write(constante,col+nCol,sheet1.cell(ligne, nCol).value)
                                                    constante=constante+1

                                    elif Q==True:
                                            constante3=constante3+1 # I have a problem here. I would like to count the number of variables that already exists but I find huge numbers.
                    break
        col=col+lastCol   

bdd.close()

Thanks in advance for your help. :)

Answer 1

This may be too broad, so here are some pointers on things you can optimise. Maybe add a sample screenshot of one of the sheets.

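Regarding if sheet1.cell_value(ligne, col2) == 'DEP': can 'DEP' occur more than once in a sheet? If it definitely occurs only once, break out of both loops as soon as you have the values for colDep and firstRow. To break out of both loops, add a break to end the inner loop, then check a flag value and break out of the outer loop before it iterates again. Like this: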

colDep = None    # initialise to None
firstRow = None  # initialise to None
for ligne in range(0,lastRow):
    for col2 in range(0,lastCol):
        if sheet1.cell_value(ligne, col2) == 'DEP':
            colDep=col2
            firstRow=ligne
            break  # break out of the `col2 in range(0,lastCol)` loop
    if colDep is not None:  # or just `if colDep:` if colDep will never be 0.
        break  # break out of the `ligne in range(0,lastRow)` loop
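An alternative to the flag check, offered only as a sketch: put the search in a small function and return as soon as the header is found, which leaves both loops at once. The function name find_header and the sample data are made up for illustration, and the rows are plain Python lists standing in for the sheet contents rather than an xlrd sheet.

```python
def find_header(rows, label="DEP"):
    """Return (row_index, col_index) of the first cell equal to label, or None."""
    for row_index, row in enumerate(rows):
        for col_index, value in enumerate(row):
            if value == label:
                # Returning here exits the inner and the outer loop together
                return row_index, col_index
    return None  # label not present anywhere


# Hypothetical sample data standing in for one sheet
sample_rows = [
    ["", "", ""],
    ["CODGEO", "DEP", "P13_POP"],
    ["61001", "61", 1234],
]

header_position = find_header(sample_rows)
print(header_position)  # (1, 1)
```

Returning None when the label is missing also makes the "no DEP column in this file" case explicit, so the caller can skip that file instead of failing later on colDep+2.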

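I think the range in for ligne in range(0,lastRow): in the block that writes to bdd should start from firstRow, because you already know that rows 0 to firstRow-1 of sheet1 are empty: you have just read them to find the header.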

for ligne in range(firstRow, lastRow):

This avoids wasting time reading the empty rows above the header.
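A minimal sketch of the same idea on plain lists, assuming the sheet has already been read into a list of rows. The function name rows_for_department and the sample data are hypothetical. Note that this sketch starts one row below the header (first_row + 1) rather than at firstRow itself, since the header cell contains 'DEP' and can never equal '61'.

```python
def rows_for_department(rows, first_row, dep_col, dep_code="61"):
    """Return the data rows below the header whose department cell equals dep_code."""
    selected = []
    # Skip everything up to and including the header row: it was already scanned
    for row in rows[first_row + 1:]:
        if row[dep_col] == dep_code:
            selected.append(row)
    return selected


# Hypothetical sample data: one empty row, a header row, then three data rows
sample_rows = [
    ["", ""],
    ["DEP", "P13_POP"],
    ["61", 10],
    ["14", 20],
    ["61", 30],
]

print(rows_for_department(sample_rows, first_row=1, dep_col=0))
# [['61', 10], ['61', 30]]
```

Selecting the matching rows once per file, instead of re-testing the department cell for every column, should also cut down the number of cell reads.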

Other notes for cleaner code:

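For clarity, use the with xlsxwriter.Workbook('U:\\INSEE\\Department61.xlsx') as bdd: syntax, and do all the writes inside the with block, because the workbook is closed when the block exits.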

Also, always use double backslashes \\ in strings, even when the backslash does not come before a control character: 'U:\\INSEE\\Department61.xlsx'
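A short illustration of why this matters. The raw-string form (the r prefix) is an alternative not mentioned above, and the second path is made up to show an escape sequence going wrong.

```python
# Escaped backslashes and a raw string produce the same path
escaped = 'U:\\INSEE\\Department61.xlsx'
raw = r'U:\INSEE\Department61.xlsx'
print(escaped == raw)  # True

# A single backslash before 'n' silently becomes a newline character
safe = 'C:\\new'
broken = 'C:\new'
print(len(safe), len(broken))  # 6 5
```

A path such as 'U:\INSEE' happens to work because \I is not a recognised escape, but relying on that is fragile, and recent Python versions warn about invalid escape sequences.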

You have used both sheet1.cell_value() and sheet1.cell().value to read cells. Pick one, unless you need the extended cell information in the value=='61' case.

Read PEP-8 to learn how to write more readable code.
