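I have some code that combines multiple Excel workbooks. What I need to do is add a new column containing the filename associated with each record. I've marked the spot where I think I'm getting stuck.
Here is what I have so far: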
import pandas as pd
import os
os.chdir('path')
# filenames
excel_names = [
"a.xlsx",
"b.xlsx"
]
# read them in
excels = [pd.ExcelFile(name) for name in excel_names]
# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels]
# delete the first row for all frames except the first
# i.e. remove the header row -- assumes it's the first
frames[1:] = [df[1:] for df in frames[1:]]
# concatenate them..
combined = pd.concat(frames)
#i'm getting caught up here
combined = pd.concat([pd.read_excel(fp).assign(New=os.path.basename(fp)) for fp in excel_names])
# write it out
combined.to_excel("a and b combined.xlsx", header=False, index=False)
Answer 0 (score: 1)
Try this:
#[...]
# i.e. remove the header row -- assumes it's the first
frames[1:] = [df[1:] for df in frames[1:]]
####TO ADD
# add a 'filename' column to each frame, holding the name of the workbook it came from
for i in range(0, len(frames)):
    frames[i]['filename'] = excel_names[i]
##########
# concatenate them..
combined = pd.concat(frames)
#[...]
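For comparison, here is a minimal sketch of the one-liner idea from the question (`assign` plus `os.path.basename`), written so it runs without any Excel files on disk. The helper name `tag_and_combine` and the sample data in `demo` are hypothetical, made up for illustration; the `filename` column name follows the loop above.

```python
import os

import pandas as pd


def tag_and_combine(frames_by_path):
    """Add a 'filename' column to each frame, then stack the frames.

    frames_by_path maps a file path to the DataFrame read from it
    (hypothetical helper, not part of pandas).
    """
    tagged = [
        # assign() returns a new frame, so the inputs are left unmodified
        df.assign(filename=os.path.basename(path))
        for path, df in frames_by_path.items()
    ]
    # ignore_index=True renumbers the rows 0..n-1 across all frames
    return pd.concat(tagged, ignore_index=True)


# Stand-in data (assumption): in real use these frames would come from the workbooks
demo = {
    "path/a.xlsx": pd.DataFrame({"id": [1, 2], "value": [10, 20]}),
    "path/b.xlsx": pd.DataFrame({"id": [3], "value": [30]}),
}
combined = tag_and_combine(demo)
print(combined)
```

With real workbooks, the dictionary could presumably be built as `{fp: pd.read_excel(fp) for fp in excel_names}`. Because `read_excel` treats the first row as the header by default, the header-stripping step is not needed in that variant. Note that with `header=None`, as in the original code, the loop above also writes the filename into the first frame's header row.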