Read multiple Excel files and write them to multiple Excel files in Python

Time: 2019-03-01 05:26:52

Tags: python excel pandas

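I have written code that reads an Excel file, runs the processing I need, and then writes the result to an Excel file. This already works for a single Excel file. My question is how to do the same for multiple Excel files: how should I apply a loop here so that each input file is read and gets its own separate output Excel file?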

Below is my code:

from ParallelP import *
import time,json
import pandas as pd
from os.path import basename  # imported explicitly in case ParallelP does not export it

if __name__ == '__main__':
    __ip__ = "ip/"
    __op__ = "op/"
    __excel_file_name__ = __ip__ + '80chars.xlsx'
    __prediction_op__ = __op__ + basename(__excel_file_name__) + "_processed.xlsx"
    df = pd.read_excel(__excel_file_name__)
    start_time = time.time()
    df_preprocessed = run(df)
    print("Time Needed to execute all data is {0} seconds".format((time.time() - start_time)))
    print("Done...")
    df_preprocessed.to_excel(__prediction_op__)

2 Answers:

Answer 0: (score: 0)

I did write some code for this. Maybe you can adapt it to your needs.

import os

# This is where your input files should be
in_folder = 'input/xls/file/folder'

# This will be your output folder
out_folder = 'output/xls/file/folder'

if not os.path.exists(out_folder):
    os.makedirs(out_folder)


file_exist = False
dir_list = os.listdir(in_folder)
for xlfile in dir_list:
    if xlfile.endswith('.xlsx') or xlfile.endswith('.xls'):
        file_exist = True
        str_file = os.path.join(in_folder, xlfile)
        #work_book = load_workbook(filename=str_file)
        #work_sheet = work_book['qa']

        # Do your work here with the Excel file

        #out_Path = os.path.join(out_folder,)

        # and write the result to out_Path

if not file_exist:
    print('cannot find any valid excel file in the folder ' + in_folder)
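
To make the placeholders above more concrete, here is a minimal sketch of how the loop could be completed with pandas. The helper names `list_excel_files`, `output_path_for` and `process_folder`, the `_processed` suffix, and the `process` callback (standing in for the asker's `run`) are assumptions for illustration only. It also assumes the Excel engines pandas relies on (for example openpyxl for .xlsx) are installed.

```python
import os


def list_excel_files(in_folder):
    """Return sorted full paths of the .xlsx/.xls files directly inside in_folder."""
    return sorted(
        os.path.join(in_folder, name)
        for name in os.listdir(in_folder)
        if name.lower().endswith(('.xlsx', '.xls'))
    )


def output_path_for(in_path, out_folder, suffix='_processed'):
    """Build the output path from the input file's base name (hypothetical naming)."""
    stem, _ext = os.path.splitext(os.path.basename(in_path))
    return os.path.join(out_folder, stem + suffix + '.xlsx')


def process_folder(in_folder, out_folder, process):
    """Read every Excel file, apply process(df), and write one output file per input."""
    import pandas as pd  # imported here so the path helpers work without pandas

    os.makedirs(out_folder, exist_ok=True)
    written = []
    for in_path in list_excel_files(in_folder):
        df = pd.read_excel(in_path)          # one DataFrame per input file
        result = process(df)                 # e.g. the asker's run(df)
        out_path = output_path_for(in_path, out_folder)
        result.to_excel(out_path)            # separate output file for this input
        written.append(out_path)
    return written
```

It could then be called as `process_folder('ip', 'op', run)`, assuming `run` takes a DataFrame and returns one, as in the question.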

Answer 1: (score: 0)

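I tried to stick to your example and extended it where I saw fit. The example below is untested, and I am not claiming it is the best way to do this!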

from ParallelP import *
import time,json
import pandas as pd
import os
from pathlib import Path  # Handles directory paths -> less error prone than manually sticking together paths

if __name__ == '__main__':
    __ip__ = "ip/"
    __op__ = "op/"

    # Get a list of all excel files in a given directory
    excel_file_list = [f for f in os.listdir(__ip__) if f.endswith('.xlsx')]

    # Loop over the list and process each excel file separately
    for excel_file in excel_file_list:
        excel_file_path = Path(__ip__, excel_file)  # Create the file path
        # Read from the full path; the bare file name would only resolve inside the input directory
        df = pd.read_excel(str(excel_file_path))  # Read the excel file to data frame
        start_time = time.time()
        df_preprocessed = run(df)  # Run your routine
        print("Time Needed to execute all data is {0} seconds".format((time.time() - start_time)))
        print("Done...")

        # Create the output file name
        prediction_output_file_name = '{}__processed.xlsx'.format(str(excel_file_path.resolve().stem))

        # Create the output file path
        prediction_output_file_path = str(Path(__op__, prediction_output_file_name))

        # Write the output to the excel file
        df_preprocessed.to_excel(prediction_output_file_path)

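Side note: I have to mention that your variable names feel like a misuse of `__`. These 'dunder' names are special and signal that they have a meaning to Python (see for example here). Please just name your variables `input_dir` and `output_dir` instead of `__ip__` and `__op__`.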
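
As a rough illustration of that naming, here is a small sketch that uses `input_dir` and `output_dir` together with `Path.glob` to pair each input file with its output path. The helper `plan_outputs` and the `_processed` suffix are made up for this example, and it only builds the paths; reading and writing with pandas would happen inside the loop as in the code above.

```python
from pathlib import Path


def plan_outputs(input_dir, output_dir, suffix="_processed"):
    """Map each .xlsx file in input_dir to its output path in output_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    return {
        in_path: output_dir / f"{in_path.stem}{suffix}.xlsx"
        for in_path in sorted(input_dir.glob("*.xlsx"))
    }


if __name__ == "__main__":
    input_dir = Path("ip")    # plain names instead of __ip__
    output_dir = Path("op")   # plain names instead of __op__
    for in_path, out_path in plan_outputs(input_dir, output_dir).items():
        print(f"{in_path} -> {out_path}")
```

Using `in_path.stem` drops the original extension, so `80chars.xlsx` would map to `80chars_processed.xlsx` in the output directory.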