I have many Excel files and want to append them into one file using the following code:
import pandas as pd
import glob
import os
import openpyxl
df = []
for f in glob.glob("*.xlsx"):
    data = pd.read_excel(f, 'Sheet1')
    data.index = [os.path.basename(f)] * len(data)
    df.append(data)
df = pd.concat(df)
# the with block saves and closes the file (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
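The Excel files have the following structure:
The output looks like this:
Why does Python change the first column when concatenating the Excel files?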
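A minimal in-memory sketch of what the loop above does, with hypothetical small frames standing in for the Excel sheets (the real sheet contents are not shown here). It prints the original row labels and then shows that after the assignment only the file name is left in the index. Since to_excel writes the index as the first column by default, that column ends up holding file names.

```python
import pandas as pd

# Hypothetical stand-ins for two sheets read with pd.read_excel(f, 'Sheet1').
frames = {
    "a.xlsx": pd.DataFrame({"id": [1, 2], "value": [10, 20]}),
    "b.xlsx": pd.DataFrame({"id": [3], "value": [30]}),
}

parts = []
for name, data in frames.items():
    print(name, list(data.index))  # original row labels, e.g. [0, 1]
    # Same step as in the question: the original index is discarded and
    # replaced by the file name repeated once per row.
    data.index = [name] * len(data)
    parts.append(data)

df = pd.concat(parts)
print(df)
```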
Answer:
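The first column changes because the line data.index = [os.path.basename(f)] * len(data) replaces the index that read_excel produced, and to_excel writes the index as the first column. I think you need a MultiIndex that keeps both the file name and the original index: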
df = []
for f in glob.glob("*.xlsx"):
    data = pd.read_excel(f, 'Sheet1')
    name = os.path.basename(f)
    # create a MultiIndex so the original index is not overwritten
    data.index = pd.MultiIndex.from_product([[name], data.index], names=('files','orig'))
    df.append(data)
# reset_index turns both MultiIndex levels into ordinary columns
df = pd.concat(df).reset_index()
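To see the effect without any files, here is a sketch of the same MultiIndex.from_product step on hypothetical in-memory frames. The "files" level holds the file name, the "orig" level keeps the original row labels, and reset_index moves both into columns.

```python
import pandas as pd

# Hypothetical stand-ins for the sheets read from each file.
frames = {
    "a.xlsx": pd.DataFrame({"value": [10, 20]}),
    "b.xlsx": pd.DataFrame({"value": [30]}),
}

parts = []
for name, data in frames.items():
    # Outer level: file name; inner level: the original index (0..n-1 here).
    data.index = pd.MultiIndex.from_product([[name], data.index], names=("files", "orig"))
    parts.append(data)

# Both index levels become columns; the new index is a plain 0..n-1 range.
df = pd.concat(parts).reset_index()
print(df)
```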
Another solution is to use the keys parameter of concat:
files = glob.glob("*.xlsx")
names = [os.path.basename(f) for f in files]
dfs = [pd.read_excel(f, 'Sheet1') for f in files]
df = pd.concat(dfs, keys=names).rename_axis(('files','orig')).reset_index()
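The keys variant can be checked against the MultiIndex.from_product loop on small in-memory frames. This is a sketch with hypothetical data standing in for the sheets; pd.testing.assert_frame_equal raises an AssertionError if the two results differ.

```python
import pandas as pd

# Hypothetical stand-ins; in the answer these come from pd.read_excel(f, 'Sheet1').
names = ["a.xlsx", "b.xlsx"]
dfs = [pd.DataFrame({"value": [10, 20]}), pd.DataFrame({"value": [30]})]

# keys= adds an outer index level with one label per input frame.
via_keys = pd.concat(dfs, keys=names).rename_axis(["files", "orig"]).reset_index()

# The same frame built with the MultiIndex.from_product loop (on copies,
# so the inputs are left untouched).
parts = []
for name, data in zip(names, dfs):
    framed = data.copy()
    framed.index = pd.MultiIndex.from_product([[name], framed.index], names=["files", "orig"])
    parts.append(framed)
via_loop = pd.concat(parts).reset_index()

pd.testing.assert_frame_equal(via_keys, via_loop)
print(via_keys)
```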
This is the same as:
df = []
names = []
for f in glob.glob("*.xlsx"):
    df.append(pd.read_excel(f, 'Sheet1'))
    names.append(os.path.basename(f))
df = pd.concat(df, keys=names).rename_axis(('files','orig')).reset_index()
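A third spelling, sketched on hypothetical in-memory frames: pd.concat also accepts a dict, and then uses the dict keys as the outer index level, so a {file name: DataFrame} mapping gives the same result as passing keys= separately. The names here are already in sorted order, because some older pandas versions sorted the dict keys.

```python
import pandas as pd

# Hypothetical mapping of file name -> DataFrame; in practice each value
# would come from pd.read_excel(f, 'Sheet1').
sheets = {
    "a.xlsx": pd.DataFrame({"value": [10, 20]}),
    "b.xlsx": pd.DataFrame({"value": [30]}),
}

# The dict keys become the outer index level.
from_dict = pd.concat(sheets).rename_axis(["files", "orig"]).reset_index()

# Equivalent call with explicit keys=.
from_keys = (
    pd.concat(list(sheets.values()), keys=list(sheets.keys()))
    .rename_axis(["files", "orig"])
    .reset_index()
)
print(from_dict)
```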
Finally, write to Excel without the index and without column names:
# the with block saves and closes the file (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False, header=False)
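To check what index=False and header=False actually write without touching the disk, here is a sketch that round-trips a small hypothetical frame through an in-memory buffer. It assumes openpyxl is installed (the question already imports it). The sheet is read back with header=None because no header row was written.

```python
import io

import pandas as pd

# Hypothetical result of the concat step above.
df = pd.DataFrame({"files": ["a.xlsx", "b.xlsx"], "orig": [0, 0], "value": [10, 30]})

buffer = io.BytesIO()
# Assumes the openpyxl engine is available; the with block closes the writer.
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False, header=False)

# No index column and no header row were written, so read with header=None.
back = pd.read_excel(io.BytesIO(buffer.getvalue()), sheet_name="Sheet1", header=None)
print(back)
```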