Question

Every month I store a file in a monthly folder on a Windows machine, like this:

C:\customer\201811\cust_data_201811.xls
C:\customer\201812\cust_data_201812.xls
C:\customer\201901\cust_data_201901.xls
...

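This will keep growing as the year goes on. I need to write a Python program that walks through these directories, reads each file, and keeps appending its data to a master table, which is then written out to a separate xls file. How should I do this?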

Answer 1

Without knowing more about your output format, I can at least help you read each workbook in those directories.

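import os
import pandas as pd

def parse_folder(folder_path):

    # Loop over the monthly subfolders (201811, 201812, ...) in sorted order
    for f in sorted(os.listdir(folder_path)):
        sub_dir = os.path.join(folder_path, f)

        # Skip anything that is not a directory
        if not os.path.isdir(sub_dir):
            continue

        # Construct full path, e.g. C:\customer\201811\cust_data_201811.xls
        f_path = os.path.join(sub_dir, "cust_data_" + f + ".xls")

        # Skip folders that do not contain the expected file
        if not os.path.isfile(f_path):
            continue

        # Read the workbook and store information into pandas dataframe
        # (reading legacy .xls files requires the xlrd package)
        # See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html for options
        wkbk = pd.read_excel(f_path, header=0)

        # // Do whatever needs to be done to the file here //


if __name__ == "__main__":
    folder_path = os.path.abspath("C:\\customer\\")
    parse_folder(folder_path)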

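This function loops over each monthly subfolder in the directory and reads its file with pandas read_excel(). The variable wkbk is a pandas DataFrame, which you can then process to extract whatever information you need.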
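If C:\customer might also contain stray files or folders that are not months, the discovery step can be made a bit more defensive. The sketch below uses only the standard library and assumes the YYYYMM folder layout shown in the question; `find_monthly_files` and `MONTH_RE` are hypothetical names, not part of any library. It returns the files oldest first, so rows appended to the master stay in chronological order.

```python
import os
import re

# Hypothetical pattern: monthly folder names such as 201811 (YYYYMM)
MONTH_RE = re.compile(r"^\d{6}$")


def find_monthly_files(root):
    """Return (month, path) pairs for root/YYYYMM/cust_data_YYYYMM.xls, oldest first."""
    found = []
    for name in os.listdir(root):
        sub_dir = os.path.join(root, name)
        # Only consider directories whose name looks like YYYYMM
        if not (os.path.isdir(sub_dir) and MONTH_RE.match(name)):
            continue
        path = os.path.join(sub_dir, "cust_data_" + name + ".xls")
        # Ignore month folders that do not (yet) contain the expected file
        if os.path.isfile(path):
            found.append((name, path))
    # YYYYMM strings sort chronologically when compared as plain strings
    return sorted(found)
```

Each returned path can then be passed to pd.read_excel in the same way as in the loop above.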

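For writing out the data, if you want to compile everything and output it to a master Excel sheet, I suggest looking at XlsxWriter. Its limitation is that it cannot append to an existing file; it can only write brand-new files (and only in the .xlsx format, not legacy .xls). The workaround is to read in the data from the current master list, add the new rows, and rewrite everything to a new file.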
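Here is a rough sketch of that read-then-rewrite workaround using pandas, which can drive XlsxWriter through pd.ExcelWriter(..., engine="xlsxwriter"). It assumes the xlsxwriter package is installed for writing and openpyxl for reading an existing .xlsx master; `combine`, `rewrite_master`, and the "master" sheet name are hypothetical choices for illustration.

```python
import os

import pandas as pd


def combine(master, new_df):
    """Append new_df's rows below the existing master rows, renumbering the index."""
    return pd.concat([master, new_df], ignore_index=True)


def rewrite_master(master_path, new_df, out_path):
    """Read the current master (if any), append new_df, and write a brand-new workbook."""
    if os.path.isfile(master_path):
        # Reading an .xlsx file with pandas needs openpyxl installed
        master = pd.read_excel(master_path)
        combined = combine(master, new_df)
    else:
        combined = new_df
    # XlsxWriter cannot modify an existing file, so always write a fresh one
    with pd.ExcelWriter(out_path, engine="xlsxwriter") as writer:
        combined.to_excel(writer, sheet_name="master", index=False)
    return combined
```

Since the monthly files stay on disk anyway, a simpler variant is to skip reading the old master and rebuild it on every run: read all monthly DataFrames, pass them to pd.concat in one call, and write the result once.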

How to iterate through directories and append to a master file?
