Merging Excel files overwrites the first column in Python using pandas

Date: 2017-09-28 12:15:52

Tags: python excel pandas

I have many Excel files and want to append them into one file with the following code:

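import pandas as pd
import glob
import os

df = []
for f in glob.glob("*.xlsx"):
    data = pd.read_excel(f, sheet_name='Sheet1')
    # replace the index with the file name, repeated once per row
    data.index = [os.path.basename(f)] * len(data)
    df.append(data)

df = pd.concat(df)
# writer.save() was removed in pandas 2.0; the context manager saves the file on exit
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')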
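To see where the first column goes, here is a minimal sketch that uses small in-memory DataFrames instead of Excel files (the data and the names a.xlsx and b.xlsx are made up for illustration). The assignment to data.index discards the default 0..n-1 index and puts the file name in its place; to_excel writes the index as the first column of the sheet, so that column is what appears to change.

```python
import pandas as pd

# Hypothetical stand-ins for the sheets returned by read_excel
a = pd.DataFrame({"col1": [1, 2], "col2": ["x", "y"]})
b = pd.DataFrame({"col1": [3], "col2": ["z"]})

frames = []
for name, data in [("a.xlsx", a), ("b.xlsx", b)]:
    # Same assignment as in the question: the original index is lost here
    data.index = [name] * len(data)
    frames.append(data)

df = pd.concat(frames)
print(df)
```

The printed frame has only col1 and col2 as columns, and its index holds a.xlsx, a.xlsx, b.xlsx instead of 0, 1, 0.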

The Excel files have the following structure:

[image: structure of the input Excel files]

The output looks like this:

[image: resulting output.xlsx]

Why does Python change the first column when concatenating the Excel files?

1 Answer

Answer 0 (score: 0)

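Your loop overwrites the original index with the file name, and to_excel writes the index as the first column, so that column changes. I think you need a MultiIndex that keeps both the file name and the original index: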

import glob
import os
import pandas as pd

df = []
for f in glob.glob("*.xlsx"):
    data = pd.read_excel(f, sheet_name='Sheet1')
    name = os.path.basename(f)
    # create a MultiIndex (file name, original index) so the original index is not overwritten
    data.index = pd.MultiIndex.from_product([[name], data.index], names=('files', 'orig'))
    df.append(data)

# reset_index turns both MultiIndex levels into regular columns
df = pd.concat(df).reset_index()
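A minimal sketch of the MultiIndex approach on in-memory frames, so it runs without any Excel files (the dictionary of sheets and its contents are hypothetical). Each frame gets a two-level index (file name, original row label), and reset_index then moves both levels into ordinary columns named files and orig.

```python
import pandas as pd

# Hypothetical stand-ins for the sheets read by read_excel, keyed by file name
sheets = {
    "a.xlsx": pd.DataFrame({"col1": [1, 2], "col2": ["x", "y"]}),
    "b.xlsx": pd.DataFrame({"col1": [3], "col2": ["z"]}),
}

frames = []
for name, data in sheets.items():
    # Outer level: file name; inner level: the original 0..n-1 index
    data.index = pd.MultiIndex.from_product(
        [[name], data.index], names=("files", "orig")
    )
    frames.append(data)

# Both index levels become regular columns
df = pd.concat(frames).reset_index()
print(df)
```

Because the original index survives as the orig column, nothing from the source sheets is lost when the result is written with index=False.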

Another solution is to use the keys parameter of concat:

files = glob.glob("*.xlsx")
names = [os.path.basename(f) for f in files]
dfs = [pd.read_excel(f, 'Sheet1') for f in files]

df = pd.concat(dfs, keys=names).rename_axis(('files','orig')).reset_index()
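The same idea as a self-contained sketch with made-up frames in place of read_excel output. Passing keys= makes concat add the outer index level itself, one key per frame, so the loop no longer has to build the MultiIndex by hand; rename_axis names the two levels before reset_index turns them into columns. A list is used for the level names here, which rename_axis also accepts.

```python
import pandas as pd

# Hypothetical stand-ins for the file names and the frames read from them
names = ["a.xlsx", "b.xlsx"]
dfs = [
    pd.DataFrame({"col1": [1, 2], "col2": ["x", "y"]}),
    pd.DataFrame({"col1": [3], "col2": ["z"]}),
]

# keys= adds an outer index level holding each frame's file name
df = (
    pd.concat(dfs, keys=names)
    .rename_axis(["files", "orig"])
    .reset_index()
)
print(df)
```

The result has the same files, orig, col1, col2 layout as the MultiIndex loop; the keys must be in the same order as the frames.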

This is the same as the following explicit loop:

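df = []
names = []
# the pattern needs the * wildcard; ".xlsx" alone only matches a file literally named ".xlsx"
for f in glob.glob("*.xlsx"):
    df.append(pd.read_excel(f, sheet_name='Sheet1'))
    names.append(os.path.basename(f))

df = pd.concat(df, keys=names).rename_axis(('files', 'orig')).reset_index()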

Finally, write to Excel without the index and without the column names:

# writer.save() was removed in pandas 2.0; the context manager saves the file on exit
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False, header=False)