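I have many Excel files in a directory, all with the same header row. Some of these files contain multiple worksheets, and those worksheets share the same header too. I am trying to loop over the Excel files in the directory and, for each one, check whether it has multiple worksheets, so that I can concatenate those sheets together with the rest of the Excel files.
This is what I tried: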
import pandas as pd
import os
import ntpath
import glob
dir_path = os.path.dirname(os.path.realpath(__file__))
os.chdir(dir_path)
for excel_names in glob.glob('*.xlsx'):
    # read them in
    i=0
    df = pd.read_excel(excel_names[i], sheet_name=None, ignore_index=True)
    cdf = pd.concat(df.values())
    cdf.to_excel("c.xlsx", header=False, index=False)
    excels = [pd.ExcelFile(name) for name in excel_names]
    # turn them into dataframes
    frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels]
    # delete the first row for all frames except the first
    # i.e. remove the header row -- assumes it's the first
    frames[1:] = [df[1:] for df in frames[1:]]
    # concatenate them..
    combined = pd.concat(frames)
    # write it out
    combined.to_excel("c.xlsx", header=False, index=False)
    i+=1
But then I get the following error. Any suggestions?
  File "concat excel.py", line 12, in <module>
    df = pd.read_excel(excel_names[i], sheet_name=None, ignore_index=True)
  File "/usr/local/lib/python2.7/site-packages/pandas/util/_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pandas/util/_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/excel.py", line 350, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/excel.py", line 653, in __init__
    self._reader = self._engines[engine](self._io)
  File "/usr/local/lib/python2.7/site-packages/pandas/io/excel.py", line 424, in __init__
    self.book = xlrd.open_workbook(filepath_or_buffer)
  File "/usr/local/lib/python2.7/site-packages/xlrd/__init__.py", line 111, in open_workbook
    with open(filename, "rb") as f:
IOError: [Errno 2] No such file or directory: 'G'
Answer 0 (score: 1)
Your for statement sets excel_names to each file name in turn (so a better variable name would be excel_name):
for excel_names in glob.glob('*.xlsx'):
But inside the loop your code does this:
df = pd.read_excel(excel_names[i], sheet_name=None, ignore_index=True)
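You evidently expect excel_names to be a list from which to extract one element. But it is not a list; it is a string. So excel_names[i] with i = 0 gives you the first character of the first file name, which is the 'G' in the error message: pandas tried to open a file literally named G.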
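Here is a minimal sketch of how the loop could be restructured so that each file name is used as a whole string. It is only one possible approach, not necessarily what the asker ended up with. The helper names load_all_sheets, combine_frames and combine_workbooks, the output name combined.xlsx, and the sample file name Grades.xlsx are my own assumptions for illustration. It also leaves ignore_index out of read_excel and passes it to pd.concat instead, since it is a pd.concat option and, as far as I know, not a documented read_excel parameter.

```python
import glob
import os

import pandas as pd

# Indexing a string yields a single character, which explains the error.
# "Grades.xlsx" is a hypothetical file name used only for illustration.
sample_name = "Grades.xlsx"
first_char = sample_name[0]  # the one-character string "G"


def load_all_sheets(path):
    """Return one DataFrame per worksheet in the workbook at `path`."""
    # sheet_name=None makes read_excel return a dict {sheet name: DataFrame},
    # so single-sheet and multi-sheet workbooks are handled the same way.
    sheets = pd.read_excel(path, sheet_name=None)
    return list(sheets.values())


def combine_frames(frames):
    """Stack DataFrames that share the same header into a single DataFrame."""
    if not frames:
        return pd.DataFrame()
    # The header row was already consumed as column names when reading,
    # so no rows need to be dropped; ignore_index renumbers the rows.
    return pd.concat(frames, ignore_index=True)


def combine_workbooks(pattern="*.xlsx", output="combined.xlsx"):
    """Concatenate every sheet of every workbook matching `pattern`."""
    frames = []
    for excel_name in sorted(glob.glob(pattern)):
        # Skip the output of a previous run, which also matches *.xlsx.
        if os.path.abspath(excel_name) == os.path.abspath(output):
            continue
        frames.extend(load_all_sheets(excel_name))
    combined = combine_frames(frames)
    combined.to_excel(output, index=False)
    return combined
```

Calling combine_workbooks() from the directory that holds the workbooks would write all rows into one file with a single header row. Reading and writing .xlsx files also requires an Excel engine that pandas supports to be installed.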