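I have a folder that contains several subfolders. I want to walk through all the Excel files ending in .xlsx and merge them into a single xlsx file, using the following code: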
import os
import glob
import pandas as pd
for root, dirs, files in os.walk("D:/Test"):
    for file in files:
        if file.endswith(".xlsx"):
            #print(os.path.join(root, file))
            s = os.path.join(root, file)
            print(s)
all_data = pd.DataFrame()
for f in s:
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)
# now save the data frame
writer = pd.ExcelWriter('result.xlsx')
all_data.to_excel(writer,'sheet1')
writer.save()
Running it raises this error:
Traceback (most recent call last):
  File "<ipython-input-169-41c6d76207e7>", line 12, in <module>
    df = pd.read_excel(f)
  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 118, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\io\excel.py", line 230, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Users\User\Anaconda3\lib\site-packages\pandas\io\excel.py", line 294, in __init__
    self.book = xlrd.open_workbook(self._io)
  File "C:\Users\User\Anaconda3\lib\site-packages\xlrd\__init__.py", line 116, in open_workbook
    with open(filename, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'D'
Does anyone know how to deal with this? Thanks.
Answer 0 (score: 1)
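Your problem is on the line df = pd.read_excel(f). What does f contain? It looks like Python thinks it is 'D'. That is because s holds only one string (the last path assigned by s = os.path.join(root, file)), so for f in s: iterates over the characters of that string rather than over a collection of paths. I think you want to store the paths in a container, like this: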
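import os
import pandas as pd

paths = []
for root, dirs, files in os.walk("D:/Test"):
    for file in files:
        if file.endswith(".xlsx"):
            #print(os.path.join(root, file))
            s = os.path.join(root, file)
            print(s)
            paths.append(s)
# DataFrame.append was removed in pandas 2.0, so collect the frames
# in a list and combine them once with pd.concat instead.
frames = []
for f in paths:
    frames.append(pd.read_excel(f))
# Note: pd.concat raises ValueError if no .xlsx file was found.
all_data = pd.concat(frames, ignore_index=True)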
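To see why the traceback complains about 'D', it may help to look at what a for loop does with a single string. The small sketch below uses a made-up path (an assumption for illustration only) and does not open any file:

```python
# Hypothetical path, used only to illustrate the behaviour; no file is opened.
s = "D:/Test/sub/report.xlsx"

# Iterating over a str yields one character at a time.
chars = [f for f in s]
first = chars[0]  # this is what pd.read_excel(f) received on the first pass

# Wrapping the string in a list makes the loop yield the whole path instead.
paths = [s]
items = [f for f in paths]

print(first)
print(items)
```

The first loop variable is the single character at the start of the path, which is exactly the "file name" reported in the FileNotFoundError.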
You can also simplify the initial for loop into a list comprehension:
paths = [os.path.join(root, file) for root, _, files in os.walk('D:/Test') for file in files if file.endswith('.xlsx')]
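As a possible alternative, the same collection step can be written with pathlib. This is only a sketch: the function name find_xlsx is made up here, and skipping names that start with ~$ (the temporary lock files Excel creates while a workbook is open) is an extra assumption that the answer above does not make.

```python
from pathlib import Path


def find_xlsx(root):
    """Return the sorted paths of all .xlsx files under root, subfolders included.

    find_xlsx is a hypothetical helper name, not part of pandas or of the answer.
    """
    found = []
    for path in Path(root).rglob("*.xlsx"):
        # Assumption: skip Excel's temporary lock files such as "~$book.xlsx".
        if path.is_file() and not path.name.startswith("~$"):
            found.append(str(path))
    return sorted(found)
```

The result could then be fed to the reading loop, for example paths = find_xlsx("D:/Test").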
Answer 1 (score: 1)
import os
import pandas as pd

listof_files = os.listdir()
current_file_name = os.path.basename(__file__)
# flag to make sure append is happening properly
count = 0
mainFrame = 0
for file in listof_files:
    # To ignore the python script file for pd.read_excel
    if (file != current_file_name) and file.endswith(".xlsx"):
        tempdf = pd.read_excel(str(file))
        if count == 0:
            mainFrame = tempdf.copy()
        else:
            mainFrame = pd.concat([mainFrame, tempdf])
        count += 1
mainFrame.to_excel('final.xlsx', index=False)
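You can also do it this way: put the script in the folder that contains all the xlsx files and run it from there. It picks up every xlsx file in that folder, concatenates them, and writes a single Excel file at the end.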
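Two caveats may be worth keeping in mind with this approach. os.listdir() without arguments lists the current working directory, which is the script's folder only when the script is started from there, and a second run would also pick up final.xlsx from the first run. The sketch below shows one way to handle both; the helper name xlsx_in_folder and its exclude parameter are assumptions for illustration, not part of the answer.

```python
import os


def xlsx_in_folder(folder, exclude=("final.xlsx",)):
    """List the .xlsx files directly inside folder (subfolders are not searched).

    xlsx_in_folder is a hypothetical helper. Names listed in exclude are
    skipped so a previous output file is not merged into itself on a rerun.
    """
    paths = []
    for name in sorted(os.listdir(folder)):
        full = os.path.join(folder, name)
        if name.endswith(".xlsx") and name not in exclude and os.path.isfile(full):
            paths.append(full)
    return paths
```

Inside a script, the folder could be taken from os.path.dirname(os.path.abspath(__file__)) so that the result does not depend on where the script is launched from.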