Question

I have an Excel file with two sheets, and I am trying to read them into dataframes as shown in the code below. However, I get the error

KeyError: "['months_to_maturity' 'asset_id' 'orig_iss_dt' 'maturity_dt' 'pay_freq_cd'\n 'coupon' 'closing_price'] not in index"

on the line

return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' , 'pay_freq_cd', 'coupon', 'closing_price']]

in the SecondExcelFileReader() function. However, both sheets have these headers:

asset_id    orig_iss_dt maturity_dt  pay_freq_cd    coupon  closing_price   months_to_maturity

I return df this way because that is the order I want the columns in.

import pandas as pd

def ExcelFileReader():
    xls = pd.ExcelFile('D:/USDataRECENTLY.xls')
    df = xls.parse(xls.sheet_names[0])
    return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' , 'pay_freq_cd', 'coupon', 'closing_price']]


def SecondExcelFileReader():
    xls = pd.ExcelFile('D:/USDataRECENTLY.xls')
    df = xls.parse(xls.sheet_names[1])
    return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' , 'pay_freq_cd', 'coupon', 'closing_price']]

def mergingdataframes():
    df1 = ExcelFileReader()
    df2 = SecondExcelFileReader()
    return pd.concat([df1, df2])

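Edit: This Excel file was exported from Sybase Oracle SQL Developer, so the first sheet already came with headers. I just copied and pasted the same headers into the second sheet. Also, I only have the problem with the second sheet.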

[Screenshots of Sheet 1 and Sheet 2]

Answer 1

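import pandas as pd

def ExcelFileReader():
    xls = pd.ExcelFile('D:/USDataRECENTLY.xls')
    # Position of the first sheet in the workbook (evaluates to 0)
    sheet_num = xls.sheet_names.index(xls.sheet_names[0])
    # Current pandas spells this keyword sheet_name; older releases used sheetname
    df = pd.read_excel('D:/USDataRECENTLY.xls', sheet_name=sheet_num)
    return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' ,'pay_freq_cd', 'coupon', 'closing_price']]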

Alternatively, in this case you can pass sheet_name=xls.sheet_names[0] instead of sheet_name=0 (the keyword was spelled sheetname in older pandas releases).

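It looks like your problem is that your second sheet is named "Sheet1". According to the ExcelFile.parse documentation, "Sheet1" refers to the first sheet, but in your file it is the second one. http://pandas.pydata.org/pandas-docs/stable/generated/pandas.ExcelFile.parse.html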
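One way to confirm which sheet is actually being parsed is to compare each parsed frame's columns against the expected list before indexing. The sketch below uses small in-memory frames in place of the real workbook. EXPECTED mirrors the column list from the question, while missing_columns, normalize_headers, and the three sample frames are hypothetical names introduced only for illustration. It also strips whitespace from header names, on the assumption (not confirmed by the question) that copy-pasted headers might carry stray spaces.

```python
import pandas as pd

EXPECTED = ['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt',
            'pay_freq_cd', 'coupon', 'closing_price']


def missing_columns(df, expected=EXPECTED):
    """Return the expected column names that are absent from df."""
    present = set(df.columns)
    return [name for name in expected if name not in present]


def normalize_headers(df):
    """Return a copy of df whose column labels are whitespace-stripped strings."""
    out = df.copy()
    out.columns = [str(name).strip() for name in out.columns]
    return out


# Hypothetical stand-ins for parsed sheets (not the real workbook).
good_sheet = pd.DataFrame(columns=EXPECTED)
# A frame with default integer column labels, as if no header row was picked up.
headerless_sheet = pd.DataFrame([[1, 2, 3]])
# Headers that look right but carry a trailing space.
padded_sheet = pd.DataFrame(columns=[name + ' ' for name in EXPECTED])

print(missing_columns(good_sheet))
print(missing_columns(headerless_sheet))
print(missing_columns(padded_sheet))
print(missing_columns(normalize_headers(padded_sheet)))
```

With the real file, the same check could be run on the result of xls.parse(name) for each name in xls.sheet_names, which shows directly which sheet lacks the expected headers.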

A better implementation would be:

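def mergingdataframes():
    # sheet_name=[0, 1] returns a dict of DataFrames keyed by sheet position
    # (older pandas releases spelled this keyword sheetname)
    mergedf = pd.concat(pd.read_excel('D:/USDataRECENTLY.xls', sheet_name=[0, 1]))
    # pd.concat on a dict adds the dict keys as an outer index level; drop it
    mergedf.index = mergedf.index.droplevel(0)
    return mergedf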
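To see why the droplevel call is needed: when a list of sheets is requested, read_excel returns a dict of frames, and pd.concat on a dict uses the dict keys as an outer index level. The sketch below reproduces this with a hand-built dict standing in for the workbook (the frames and their values are made up for illustration). It also shows ignore_index=True as an alternative, which is not part of the answer above and produces a fresh 0..n-1 index instead of keeping each sheet's row labels.

```python
import pandas as pd

# Hypothetical stand-in for the dict that read_excel returns when
# given a list of sheets: one DataFrame per requested sheet.
sheets = {
    0: pd.DataFrame({'asset_id': ['A1', 'A2'], 'coupon': [2.5, 3.0]}),
    1: pd.DataFrame({'asset_id': ['B1'], 'coupon': [4.0]}),
}

# Concatenating a dict adds the dict keys as an outer index level.
merged = pd.concat(sheets)
print(merged.index.nlevels)

# Option 1 (as in the answer): drop the outer level, keeping each
# sheet's own row labels, so labels can repeat across sheets.
dropped = merged.copy()
dropped.index = dropped.index.droplevel(0)
print(list(dropped.index))

# Option 2 (an alternative): discard the original row labels entirely.
renumbered = pd.concat(list(sheets.values()), ignore_index=True)
print(list(renumbered.index))
```

Which option fits depends on whether the per-sheet row labels matter later; if rows are only ever accessed by position or by column values, the renumbered index avoids duplicate labels.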

KeyError when reading Excel data into a dataframe
