Finding and merging Excel files in a folder and its subfolders with Python

Date: 2018-07-17 05:50:12

Tags: python excel pandas

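I have a folder with several subfolders. I want to go through every Excel file ending in .xlsx and merge them all into a single .xlsx file, using the following code: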

import os
import glob

for root, dirs, files in os.walk("D:/Test"):
    for file in files:
        if file.endswith(".xlsx"):
             #print(os.path.join(root, file))
             s = os.path.join(root, file)
             print(s)
all_data = pd.DataFrame()
for f in s:
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)

# now save the data frame
writer = pd.ExcelWriter('result.xlsx')
all_data.to_excel(writer,'sheet1')
writer.save()

Running it raises this error:

Traceback (most recent call last):
  File "<ipython-input-169-41c6d76207e7>", line 12, in <module>
    df = pd.read_excel(f)
  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 118, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\io\excel.py", line 230, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\io\excel.py", line 294, in __init__
    self.book = xlrd.open_workbook(self._io)
  File "C:\Users\User\Anaconda3\lib\site-packages\xlrd\__init__.py", line 116, in open_workbook
    with open(filename, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'D'

Does anyone know how to fix this? Thanks.

2 answers:

Answer 0 (score: 1)

Your problem is in df = pd.read_excel(f). What does f contain? It looks like Python thinks it is 'D'.
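A minimal reproduction without pandas shows where the 'D' comes from. The file names a.xlsx and b.xlsx below are made up for illustration; only the loop shape mirrors the question.

```python
import os

# Mimic the question's loop with two made-up file names: s is reassigned
# on every match, so afterwards it holds only the last path, as a string.
s = None
for name in ["a.xlsx", "b.xlsx"]:
    s = os.path.join("D:/Test", name)

# Looping over a string yields single characters, so the first value
# that `for f in s:` would pass to pd.read_excel is "D".
chars = [f for f in s]
print(chars[0])  # D
```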

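This is because s is a single string, not a list: it is overwritten on every iteration of os.walk, so after the loop it only holds the last path built by s = os.path.join(root, file), and for f in s: then iterates over that string one character at a time. The first character is 'D' (from "D:/Test"), which is exactly the file name in the error. I think you want to collect the paths in a container instead, like this: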

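import os
import pandas as pd

paths = []
for root, dirs, files in os.walk("D:/Test"):
    for file in files:
        if file.endswith(".xlsx"):
            s = os.path.join(root, file)
            print(s)
            paths.append(s)  # keep every path instead of overwriting s

# DataFrame.append was removed in pandas 2.0, so collect the frames and
# concatenate them once (pd.concat needs at least one frame in the list)
frames = []
for f in paths:
    frames.append(pd.read_excel(f))
all_data = pd.concat(frames, ignore_index=True)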
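Building on the paths list above, a minimal sketch that keeps reading and combining separate, assuming you also want to know which workbook each merged row came from. The helper name combine_frames and the column name source_file are hypothetical choices for illustration, not part of pandas.

```python
import pandas as pd


def combine_frames(frames_by_path):
    """Stack DataFrames keyed by their source path into one DataFrame.

    `frames_by_path` maps a file path to the DataFrame read from it.
    A hypothetical `source_file` column records where each row came from.
    """
    if not frames_by_path:
        # pd.concat raises ValueError on an empty list, so return an empty frame.
        return pd.DataFrame()
    # assign() returns a copy, so the caller's frames are left untouched.
    tagged = [df.assign(source_file=path) for path, df in frames_by_path.items()]
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each file's index.
    return pd.concat(tagged, ignore_index=True)
```

With the paths list, combine_frames({p: pd.read_excel(p) for p in paths}) gives the merged frame, and calling .to_excel("result.xlsx", index=False) on it writes the result; writing .xlsx requires an engine such as openpyxl or XlsxWriter to be installed.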

You can also shorten the initial for loop into a list comprehension:

paths = [os.path.join(root, file) for root, _, files in os.walk('D:/Test') for file in files if file.endswith('.xlsx')]
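The question already imports glob but never uses it; with recursive=True, the ** pattern matches the folder itself and every subfolder, so it can replace os.walk here. A minimal sketch, assuming you also want to skip the "~$name.xlsx" owner files Excel creates next to a workbook while it is open (they match *.xlsx but are not readable workbooks). The function name find_xlsx is hypothetical.

```python
import glob
import os


def find_xlsx(root):
    """Return sorted paths of all .xlsx files under `root`, subfolders included."""
    # glob.escape protects folder names containing glob characters such as "[".
    pattern = os.path.join(glob.escape(root), "**", "*.xlsx")
    paths = glob.glob(pattern, recursive=True)
    # Skip Excel's temporary "~$name.xlsx" owner files.
    return sorted(p for p in paths if not os.path.basename(p).startswith("~$"))
```

For example, paths = find_xlsx("D:/Test") yields the same files as the os.walk loop, minus any lock files. Note that on Linux and macOS the *.xlsx pattern is case-sensitive, so a file named REPORT.XLSX would not match there.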

Answer 1 (score: 1)

import os
import pandas as pd

output_name = 'final.xlsx'
frames = []

for file in os.listdir():
    # Only .xlsx files are read, so the .py script is skipped automatically
    # (no need for __file__, which is undefined in Jupyter/IPython anyway).
    # Also skip the output file so a re-run does not merge the old final.xlsx.
    if file.endswith(".xlsx") and file != output_name:
        frames.append(pd.read_excel(file))

# pd.concat raises ValueError on an empty list, so only write when files were found
if frames:
    mainFrame = pd.concat(frames, ignore_index=True)
    mainFrame.to_excel(output_name, index=False)

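Alternatively, put this script in the folder that contains all the .xlsx files and run it from there: it picks up every .xlsx file, concatenates them, and writes the result to a single Excel file, final.xlsx. Note that os.listdir() with no argument lists the current working directory and does not descend into subfolders.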