How do I merge multiple spreadsheets from an Excel workbook into a pandas DataFrame?

Time: 2018-08-31 16:10:32

Tags: python excel pandas dataframe glob

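I have multiple folders and subfolders that contain Excel workbooks with multiple tabs. How can I combine all of the information into one pandas DataFrame?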

Here is my code so far:

    from pathlib import Path
    import os
    import pandas as pd
    import glob

    p = Path(r'C:\Users\user1\Downloads\key_folder')

    globbed_files = p.glob('**/**/*.xlsx')

    df = []

    for file in globbed_files:
        frame = pd.read_excel(file, sheet_name = None, ignore_index=True)
        frame['File Path'] = os.path.basename(file)
        df.append(frame)

    # df = pd.concat([d.values() for d in df], axis = 0, ignore_index=True)
    df = pd.concat(df, axis=0, ignore_index = True)

This produces the following error:

    cannot concatenate object of type "<class 'collections.OrderedDict'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

When I run this, I see that each Excel spreadsheet tab is a separate column. The cells contain the data and headers as text, forming one very long string.

Any help is appreciated. Thank you!

1 answer:

Answer 0: (score: 0)

Here is the final code:

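    from pathlib import Path

    import pandas as pd
    import xlrd  # note: xlrd 2.0+ reads only .xls; .xlsx needs xlrd < 2.0

    p = Path('path here')

    # Recursively find the .xlsx workbooks under the root folder.
    globbed_files = p.glob('**/**/*.xlsx')

    list_dfs = []

    for file in globbed_files:
        # Open the workbook lazily, only to list its sheet names.
        xls = xlrd.open_workbook(file, on_demand=True)
        for sheet_name in xls.sheet_names():
            # Read one sheet and record which tab it came from.
            df = pd.read_excel(file, sheet_name=sheet_name)
            df['Sheet Name'] = sheet_name
            list_dfs.append(df)

    # Stack all sheets row-wise into a single DataFrame.
    dfs = pd.concat(list_dfs, axis=0)

    dfs.to_excel('merged spreadsheet.xlsx')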
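
The error in the question appears to come from `sheet_name=None`, which makes `pd.read_excel` return a dict mapping each sheet name to its own DataFrame rather than a single DataFrame. As a possible alternative to listing the sheets with xlrd, the sketch below flattens that dict directly. The helper names `combine_sheets` and `merge_workbooks` and the demo data are hypothetical, and it assumes an engine that can read `.xlsx` files (such as openpyxl) is installed.

```python
from pathlib import Path

import pandas as pd


def combine_sheets(sheets, file_name):
    """Stack the sheets of one workbook, tagging each row with its origin.

    `sheets` maps sheet name -> DataFrame, which is the shape that
    pd.read_excel(..., sheet_name=None) returns.
    """
    tagged = [
        frame.assign(**{"Sheet Name": sheet_name, "File Path": file_name})
        for sheet_name, frame in sheets.items()
    ]
    return pd.concat(tagged, axis=0, ignore_index=True)


def merge_workbooks(root):
    """Hypothetical helper: merge every sheet of every .xlsx file under `root`."""
    per_file = [
        combine_sheets(pd.read_excel(path, sheet_name=None), path.name)
        for path in Path(root).rglob("*.xlsx")
    ]
    if not per_file:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    return pd.concat(per_file, axis=0, ignore_index=True)


# In-memory stand-in for what read_excel(sheet_name=None) would return.
demo_sheets = {
    "Jan": pd.DataFrame({"amount": [1, 2]}),
    "Feb": pd.DataFrame({"amount": [3]}),
}
demo = combine_sheets(demo_sheets, "book1.xlsx")
print(demo)
```

With real files, `merge_workbooks(r'C:\Users\user1\Downloads\key_folder')` would apply the same steps to the folder from the question. Passing `ignore_index=True` to `pd.concat` (not to `pd.read_excel`) gives the merged frame one continuous row index.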