Question

I have taken a DataFrame (with an initial index of 0 ... 9999) and partitioned it by year:

requests_df = {year : df[df['req_year'] == year] for year in df['req_year'].unique()}

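As expected, each sub-frame retains its original index labels. Then, when trying to index into one of the partitioned frames (df_yr = requests_df[2015]), I get this unexpected behaviour: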

for idx in df_year.index:
        qty = frame[idx]['qty_tickets']

which results in:

KeyError                                  Traceback (most recent call last)
/home/user/ve/ml/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 8666

Suspecting the iterator, I tried a simpler case:

df_yr[df_yr.index[0]]

KeyError

8666 is definitely the index value of the first row:

Int64Index([8666, 8667, 8668, 8669, 8670, 8671, 8672, 8673, 8674, 8675,
            ...
            9830, 9831, 9832, 9833, 9834, 9835, 9836, 9837, 9838, 9839],
           dtype='int64', length=1174)

Indexing with loc,

outframe.loc[8666]

which I thought relied on the df.index values as well, works fine. Wat.

df.ix also works, but that is not too surprising, since it has fallback behaviour built in.

I have indexed using values from df.index dozens of times without a problem. What gives?

Answer 1

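In general, df[index] performs column-label-based indexing. The exceptions, as you noted, are:

df[slice], which slices the rows
df[boolean_mask], which selects the rows where the mask is True

Apart from these two exceptions, there is no reliable way to disambiguate df[row_label] from df[col_label], so pandas uses the latter interpretation, because it is more consistent with a DataFrame being "dict-like". Your experiment with df_yr[df_yr.index[0]] raised an error because you used a row index label where a column index label is expected.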
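
To make these rules concrete, here is a minimal sketch on a small made-up frame. The data and the names df, first_two and big_orders are invented for illustration and are not the asker's actual data:

```python
import pandas as pd

# Hypothetical stand-in for one of the per-year frames; the values are made up.
df = pd.DataFrame(
    {"req_year": [2015, 2015, 2015], "qty_tickets": [2, 5, 1]},
    index=[8666, 8667, 8668],
)

# df[label] looks the label up among the columns.
col = df["qty_tickets"]

# A row label is not a column label, so this lookup raises KeyError.
try:
    df[df.index[0]]
    row_label_lookup_failed = False
except KeyError:
    row_label_lookup_failed = True

# Exception 1: a slice selects rows (integer slices are positional here).
first_two = df[0:2]

# Exception 2: a boolean mask selects the rows where the mask is True.
big_orders = df[df["qty_tickets"] > 1]
```

Here df[df.index[0]] asks for a column named 8666, which does not exist, so the same KeyError as in the question appears.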

Instead, use multi-axis label-based indexing, whose syntax is

df.loc[row_indexer, col_indexer]

where col_indexer is optional. df.loc[df.index[0]] should work fine. In the broken part of your code, use

frame.loc[idx, 'qty_tickets']

(as was also noted by jezrael in the comments).
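
As a rough sketch of the corrected loop, assuming a small made-up frame in place of the asker's df_yr (the values and the quantities list are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for requests_df[2015]; the values are made up.
df_yr = pd.DataFrame(
    {"req_year": [2015, 2015, 2015], "qty_tickets": [2, 5, 1]},
    index=[8666, 8667, 8668],
)

# Row label only: returns the whole row as a Series.
first_row = df_yr.loc[df_yr.index[0]]

# Row label plus column label: returns a single value.
quantities = []
for idx in df_yr.index:
    qty = df_yr.loc[idx, "qty_tickets"]
    quantities.append(int(qty))

# If the loop only collects one column, selecting that column directly
# gives the same values without iterating over the index.
same_quantities = df_yr["qty_tickets"].tolist()
```

Whether the explicit loop is needed depends on what is done with each qty. If the goal is just to read a column, the last line is usually simpler.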

Pandas indexing behaving unexpectedly: df[df.index[0]] => KeyError
