Question

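I have an Excel workbook with four worksheets, each with a different name. I want to read a worksheet into a pandas DataFrame only if it is listed in the variable sheet_names. For example, the sheet names for the whole workbook could be ['banana','orange','apple','grape']. Each worksheet has 5 columns that I want to read into Python.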

import pandas as pd

sheet_names = ['grape','orange'] #sheet_names is what I control... it can contain any number of sheets between 1 and 4.

xlsx = pd.ExcelFile('C:\\Users\\Ken\\Desktop\\Df.xlsx')

df = []

for x in sheet_names:
    df.append(xlsx.parse(sheet_name=x, index_col=0, usecols='B:F'))

But the code returns a list with len = 2.

The desired output is a single DataFrame with 10 columns. Any help?

Answer 1

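Each call to xlsx.parse() returns a DataFrame, which you then append to the df list. So in your code, df is a list of DataFrames. If you want to combine the selected sheets, you can use pd.concat():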

# sheet_name and usecols are the current names of the older sheetname and parse_cols keywords
df = pd.concat([xlsx.parse(sheet_name=x, index_col=0, usecols='B:F') for x in sheet_names],
               axis=1,
               ignore_index=True)
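
To see the difference between the list and the concatenated result without needing an Excel file, here is a minimal sketch. The two small DataFrames are made-up stand-ins for what xlsx.parse() would return for each sheet; their column names and values are assumptions for illustration only, not data from the workbook.

```python
import pandas as pd

# Hypothetical stand-ins for two parsed sheets (not data from the real workbook).
grape = pd.DataFrame({"price": [1.0, 2.0], "qty": [10, 20]}, index=["a", "b"])
orange = pd.DataFrame({"price": [3.0, 4.0], "qty": [30, 40]}, index=["a", "b"])

# What the loop in the question builds: a plain Python list of DataFrames.
frames = [grape, orange]

# pd.concat with axis=1 places the frames side by side, aligned on the row index.
combined = pd.concat(frames, axis=1)

print(type(frames).__name__, len(frames))
print(type(combined).__name__, combined.shape)
```

The list has one entry per sheet, while the concatenated object is a single DataFrame whose column count is the sum of the columns of its parts.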

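PS: You may want to keep the original labels. In that case, change ignore_index=True to ignore_index=False. Note that with axis=1, ignore_index renumbers the column labels (0, 1, 2, ...), not the row index.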
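
The effect of ignore_index is easiest to see on a small example. pd.concat applies it along the concatenation axis, so with axis=1 it is the column labels that get renumbered. The sketch below compares both settings and also shows one possible alternative: passing a dict so that its keys become an outer column level recording which sheet each column came from. The frames and their contents are hypothetical.

```python
import pandas as pd

# Hypothetical parsed sheets, keyed by sheet name.
sheets = {
    "grape": pd.DataFrame({"price": [1.0, 2.0]}, index=["a", "b"]),
    "orange": pd.DataFrame({"price": [3.0, 4.0]}, index=["a", "b"]),
}
frames = list(sheets.values())

# ignore_index=True: column labels are replaced by 0, 1, ...
renumbered = pd.concat(frames, axis=1, ignore_index=True)

# ignore_index=False: original column names are kept (duplicates are possible).
kept = pd.concat(frames, axis=1, ignore_index=False)

# Passing the dict itself: the sheet names become the outer level of the columns.
labeled = pd.concat(sheets, axis=1)

print(list(renumbered.columns))
print(list(kept.columns))
print(list(labeled.columns))
```

When several sheets share column names, the dict form avoids ambiguous duplicate labels; a single sheet's columns can then be selected with labeled["grape"].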
