Question

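I'm new to Python and even newer to pandas, but relatively well versed in R. I'm using Anaconda with Python 3.5 and pandas 0.18.1. I'm trying to read an Excel file in as a dataframe. The file is admittedly pretty... ugly. There is a lot of empty space, missing headers, etc. (I'm not sure whether this is the source of any of the issues.)

I create the file object, then find the appropriate sheet, then try to read that sheet as a dataframe: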

xl = pd.ExcelFile(allFiles[i])
sName = [s for s in xl.sheet_names if 'security exposure' in s.lower()]
df = xl.parse(sName)
df

Results:

{'Security exposure - 21 day lag':          Percent of Total Holdings  \
 0      KMNFC vs. 3 Month LIBOR AUD   
 1                        04-OCT-16   
 2                Australian Dollar   
 3                              NaN   
 4                              NaN   
 5                              NaN   
 6                              NaN   
 7                              NaN   
 8          Long/Short Net Exposure   
 9                            Total   
 10                             NaN   
 11                            Long   
 12                             NaN   
 13                             NaN   
 14                             NaN   
 15                             NaN   
 16                             NaN   
 17                             NaN

(This goes on for 20-30 more rows and 5-6 more columns.)

I'm using Anaconda and Spyder, which has a 'Variable Explorer'. It shows the variable df as a dict of DataFrame type.

However, I can't use iloc:

df.iloc[:,1]

Traceback (most recent call last):

  File "<ipython-input-77-d7b3e16ccc56>", line 1, in <module>
    df.iloc[:,1]

AttributeError: 'dict' object has no attribute 'iloc'

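Any thoughts? What am I missing?

EDIT:

To be clear, what I really want to do is reference the first column of df. In R this would be df[,1]. From looking around, this does not seem to be a very popular way of doing things, or the "correct" way. I understand why indexing by column name, or key, is better, but in this situation I really just need to index the dataframe by column number. Any working method of doing that would be greatly appreciated.

EDIT (2):

Per a suggestion, I tried 'read_excel', with the same results: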

df = pd.ExcelFile(allFiles[i]).parse(sName)
df.loc[1]
Traceback (most recent call last):

  File "<ipython-input-90-fc40aa59bd20>", line 2, in <module>
    df.loc[1]

AttributeError: 'dict' object has no attribute 'loc'


df = pd.read_excel(allFiles[i], sheetname = sName)
df.loc[1]
Traceback (most recent call last):

  File "<ipython-input-91-72b8405c6c42>", line 2, in <module>
    df.loc[1]

AttributeError: 'dict' object has no attribute 'loc'

Answer 1

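The problem was with this line:

sName = [s for s in xl.sheet_names if 'security exposure' in s.lower()]

which returned a single-element list. Passing a list of sheet names to parse() returns a dict of DataFrames keyed by sheet name, which is why df had no iloc or loc attribute. I changed the line to the following:

sName = [s for s in xl.sheet_names if 'security exposure' in s.lower()][0]

which returns a string, and the code then performs as expected.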
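
If it helps, here is a slightly more defensive sketch of the same sheet-name lookup. The helper name pick_sheet_name and the sample sheet names are hypothetical, made up only for illustration; the point is that the result is a plain string, and that zero or several matches raise an error instead of silently returning a list.

```python
def pick_sheet_name(sheet_names, keyword):
    """Return the single sheet name that contains `keyword` (case-insensitive)."""
    matches = [s for s in sheet_names if keyword.lower() in s.lower()]
    if len(matches) != 1:
        # Zero or several matches: fail loudly rather than pass a list along.
        raise ValueError(
            "expected exactly one sheet matching {!r}, found {}".format(keyword, matches)
        )
    return matches[0]  # a plain str, not a one-element list


# Hypothetical sheet names standing in for xl.sheet_names
names = ["Summary", "Security exposure - 21 day lag", "Notes"]
sName = pick_sheet_name(names, "security exposure")
print(type(sName).__name__, sName)
```

The returned string could then be passed to xl.parse(sName) in place of the list.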

Thanks to ayhan for pointing this out.
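
For the part of the question about indexing by column number, a minimal sketch follows. It assumes a dict of DataFrames shaped like the one shown in the question; the values in it are invented for illustration and no Excel file is read. Note that positional indexing in pandas is zero-based, so the first column is iloc[:, 0], while iloc[:, 1] would be the second column.

```python
import pandas as pd

# Stand-in for what parse() returns when given a list of sheet names:
# a dict mapping each sheet name to its DataFrame. The data is made up.
frames = {
    "Security exposure - 21 day lag": pd.DataFrame(
        {
            "Percent of Total Holdings": ["Total", None, "Long"],
            "Unnamed: 1": [1.0, None, 0.6],
        }
    )
}

# Pull the DataFrame out of the dict by its sheet-name key.
df = frames["Security exposure - 21 day lag"]

# Positional indexing is zero-based: column 0 is the first column,
# which corresponds to df[, 1] in R.
first_col = df.iloc[:, 0]
print(first_col.name)
```

If the dict form has already been produced, looking up the sheet name as a key in this way is an alternative to re-reading the file.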

Pandas ExcelFile.parse() reads file in as dict instead of dataframe
