pandas KeyError: index label not found when using floats

Date: 2017-03-09 13:34:44

Tags: python pandas

I have run into the following problem:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))
print(np.linspace(0, 1, 401))

We can see that 0.47 is in there:

[ 0.      0.0025  0.005   0.0075  0.01    0.0125  0.015   0.0175  0.02
  0.0225  0.025   0.0275  0.03    0.0325  0.035   0.0375  0.04    0.0425
  0.045   0.0475  0.05    0.0525  0.055   0.0575  0.06    0.0625  0.065
  0.0675  0.07    0.0725  0.075   0.0775  0.08    0.0825  0.085   0.0875
  0.09    0.0925  0.095   0.0975  0.1     0.1025  0.105   0.1075  0.11
  0.1125  0.115   0.1175  0.12    0.1225  0.125   0.1275  0.13    0.1325
  0.135   0.1375  0.14    0.1425  0.145   0.1475  0.15    0.1525  0.155
  0.1575  0.16    0.1625  0.165   0.1675  0.17    0.1725  0.175   0.1775
  0.18    0.1825  0.185   0.1875  0.19    0.1925  0.195   0.1975  0.2
  0.2025  0.205   0.2075  0.21    0.2125  0.215   0.2175  0.22    0.2225
  0.225   0.2275  0.23    0.2325  0.235   0.2375  0.24    0.2425  0.245
  0.2475  0.25    0.2525  0.255   0.2575  0.26    0.2625  0.265   0.2675
  0.27    0.2725  0.275   0.2775  0.28    0.2825  0.285   0.2875  0.29
  0.2925  0.295   0.2975  0.3     0.3025  0.305   0.3075  0.31    0.3125
  0.315   0.3175  0.32    0.3225  0.325   0.3275  0.33    0.3325  0.335
  0.3375  0.34    0.3425  0.345   0.3475  0.35    0.3525  0.355   0.3575
  0.36    0.3625  0.365   0.3675  0.37    0.3725  0.375   0.3775  0.38
  0.3825  0.385   0.3875  0.39    0.3925  0.395   0.3975  0.4     0.4025
  0.405   0.4075  0.41    0.4125  0.415   0.4175  0.42    0.4225  0.425
  0.4275  0.43    0.4325  0.435   0.4375  0.44    0.4425  0.445   0.4475
  0.45    0.4525  0.455   0.4575  0.46    0.4625  0.465   0.4675  0.47
  0.4725  0.475   0.4775  0.48    0.4825  0.485   0.4875  0.49    0.4925
  0.495   0.4975  0.5     0.5025  0.505   0.5075  0.51    0.5125  0.515
  0.5175  0.52    0.5225  0.525   0.5275  0.53    0.5325  0.535   0.5375
  0.54    0.5425  0.545   0.5475  0.55    0.5525  0.555   0.5575  0.56
  0.5625  0.565   0.5675  0.57    0.5725  0.575   0.5775  0.58    0.5825
  0.585   0.5875  0.59    0.5925  0.595   0.5975  0.6     0.6025  0.605
  0.6075  0.61    0.6125  0.615   0.6175  0.62    0.6225  0.625   0.6275
  0.63    0.6325  0.635   0.6375  0.64    0.6425  0.645   0.6475  0.65
  0.6525  0.655   0.6575  0.66    0.6625  0.665   0.6675  0.67    0.6725
  0.675   0.6775  0.68    0.6825  0.685   0.6875  0.69    0.6925  0.695
  0.6975  0.7     0.7025  0.705   0.7075  0.71    0.7125  0.715   0.7175
  0.72    0.7225  0.725   0.7275  0.73    0.7325  0.735   0.7375  0.74
  0.7425  0.745   0.7475  0.75    0.7525  0.755   0.7575  0.76    0.7625
  0.765   0.7675  0.77    0.7725  0.775   0.7775  0.78    0.7825  0.785
  0.7875  0.79    0.7925  0.795   0.7975  0.8     0.8025  0.805   0.8075
  0.81    0.8125  0.815   0.8175  0.82    0.8225  0.825   0.8275  0.83
  0.8325  0.835   0.8375  0.84    0.8425  0.845   0.8475  0.85    0.8525
  0.855   0.8575  0.86    0.8625  0.865   0.8675  0.87    0.8725  0.875
  0.8775  0.88    0.8825  0.885   0.8875  0.89    0.8925  0.895   0.8975
  0.9     0.9025  0.905   0.9075  0.91    0.9125  0.915   0.9175  0.92
  0.9225  0.925   0.9275  0.93    0.9325  0.935   0.9375  0.94    0.9425
  0.945   0.9475  0.95    0.9525  0.955   0.9575  0.96    0.9625  0.965
  0.9675  0.97    0.9725  0.975   0.9775  0.98    0.9825  0.985   0.9875
  0.99    0.9925  0.995   0.9975  1.    ]

Now, for example, when I try df[0.47] I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()

pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()

KeyError: 0.47

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-117-76c97f917184> in <module>()
----> 1 df[0.47]

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3541 
   3542             if not isnull(item):
-> 3543                 loc = self.items.get_loc(item)
   3544             else:
   3545                 indexer = np.arange(len(self.items))[isnull(self.items)]

/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()

pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()

KeyError: 0.47

I don't understand why this happens.

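2 answers:

Answer 0 (score: 4)

The problem here is floating-point imprecision: the label stored in the index is not exactly equal to the literal 0.47. You can use the method get_slice_bound to return the ordinal position of that row: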

In [237]:
df.iloc[df.index.get_slice_bound(0.47, side='left', kind='loc')]

Out[237]:
0    0.854001
Name: 0.47, dtype: float64

We can see the true value of that index label:

In [238]:
df.index[df.index.get_slice_bound(0.47, side='left', kind='loc')]
Out[238]:
0.47000000000000003
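
Two separate things are worth checking here. The traceback in the question goes through `_getitem_column`, so `df[0.47]` is looked up among the column labels (this frame has a single column, `0`), whereas a row lookup by label is written `df.loc[...]`. And even with `.loc`, the stored label need not be bit-for-bit equal to the literal `0.47`. The following is a minimal sketch, assuming the same frame as in the question, that selects the row by comparing labels within a tolerance using `np.isclose`; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))

# df[key] searches the column labels; this frame has a single column named 0.
is_column = 0.47 in df.columns

# Compare the row labels to 0.47 within floating-point tolerance
# instead of requiring exact equality.
mask = np.isclose(df.index.values, 0.47)
matches = df.loc[mask]

# repr() shows the full stored label rather than a rounded display.
print(is_column)
print(repr(float(matches.index[0])))
print(matches)
```

The default tolerances of `np.isclose` are far smaller than the grid spacing of 0.0025, so at most one label can match.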

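Although pandas does support Float64Index, using one makes exact label lookups problematic; you are better off sticking with the default Int64Index.

get_slice_bound is an undocumented method, but its docstring gives you enough information: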

Signature: df.index.get_slice_bound(label, side, kind)
Docstring:
Calculate slice bound that corresponds to given label.

Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.

Parameters
----------
label : object
side : {'left', 'right'}
kind : {'ix', 'loc', 'getitem'}

You can also use get_loc and pass method='nearest' to achieve the same result:

In [240]:
df.iloc[df.index.get_loc(0.47, method='nearest')]

Out[240]:
0    0.854001
Name: 0.47, dtype: float64
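
As far as I know, the `method` argument of `get_loc` was deprecated in pandas 1.4 and removed in 2.0, so the call above may fail on a recent installation. Below is a hedged sketch of an equivalent lookup using `Index.get_indexer`, which accepts `method='nearest'` and an optional `tolerance`. The tolerance value `1e-9` is an assumption chosen for this grid, not something taken from the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))

# get_indexer returns one position per requested label, or -1 when no
# label lies within the tolerance. It needs a monotonic index, which
# np.linspace provides.
pos = df.index.get_indexer([0.47], method="nearest", tolerance=1e-9)[0]
if pos == -1:
    raise KeyError(0.47)

row = df.iloc[pos]
print(pos, repr(float(df.index[pos])))
print(row)
```

Passing a tolerance keeps the lookup from silently returning a distant neighbour when the requested value is not on the grid at all.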

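Answer 1 (score: 2)

The printed representation may be the same, but the underlying values can differ slightly, and then their hashes differ too.

Two different values can both be displayed as 0.47, which is misleading.

=> You cannot reliably index elements by float keys.

Instead, you could use decimals as keys, or round the values.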
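
One way to apply the rounding suggestion is to round the labels once when the index is built, so that each label equals the float produced by the corresponding decimal literal. This is a sketch under the assumption that four decimal places are enough to tell the grid points apart (the step here is 0.0025); the frame mirrors the one in the question, and the integer-index variant at the end is an illustrative alternative rather than part of the answer.

```python
import numpy as np
import pandas as pd

# Round the labels to 4 decimal places before using them as the index.
labels = np.round(np.linspace(0, 1, 401), 4)
df = pd.DataFrame(np.random.rand(401), index=labels)

# .loc performs a row lookup by label; after rounding, the literal 0.47
# matches the stored label exactly.
row = df.loc[0.47]
print(row)

# Alternative: keep an integer index and convert the value to a position.
df_int = pd.DataFrame(np.random.rand(401))  # default integer index 0..400
pos = int(round(0.47 * 400))                # 400 steps span [0, 1]
print(df_int.loc[pos])
```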