I have the following problem:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))
print(np.linspace(0, 1, 401))
We can see that 0.47 is in there:
[ 0. 0.0025 0.005 0.0075 0.01 0.0125 0.015 0.0175 0.02
0.0225 0.025 0.0275 0.03 0.0325 0.035 0.0375 0.04 0.0425
0.045 0.0475 0.05 0.0525 0.055 0.0575 0.06 0.0625 0.065
0.0675 0.07 0.0725 0.075 0.0775 0.08 0.0825 0.085 0.0875
0.09 0.0925 0.095 0.0975 0.1 0.1025 0.105 0.1075 0.11
0.1125 0.115 0.1175 0.12 0.1225 0.125 0.1275 0.13 0.1325
0.135 0.1375 0.14 0.1425 0.145 0.1475 0.15 0.1525 0.155
0.1575 0.16 0.1625 0.165 0.1675 0.17 0.1725 0.175 0.1775
0.18 0.1825 0.185 0.1875 0.19 0.1925 0.195 0.1975 0.2
0.2025 0.205 0.2075 0.21 0.2125 0.215 0.2175 0.22 0.2225
0.225 0.2275 0.23 0.2325 0.235 0.2375 0.24 0.2425 0.245
0.2475 0.25 0.2525 0.255 0.2575 0.26 0.2625 0.265 0.2675
0.27 0.2725 0.275 0.2775 0.28 0.2825 0.285 0.2875 0.29
0.2925 0.295 0.2975 0.3 0.3025 0.305 0.3075 0.31 0.3125
0.315 0.3175 0.32 0.3225 0.325 0.3275 0.33 0.3325 0.335
0.3375 0.34 0.3425 0.345 0.3475 0.35 0.3525 0.355 0.3575
0.36 0.3625 0.365 0.3675 0.37 0.3725 0.375 0.3775 0.38
0.3825 0.385 0.3875 0.39 0.3925 0.395 0.3975 0.4 0.4025
0.405 0.4075 0.41 0.4125 0.415 0.4175 0.42 0.4225 0.425
0.4275 0.43 0.4325 0.435 0.4375 0.44 0.4425 0.445 0.4475
0.45 0.4525 0.455 0.4575 0.46 0.4625 0.465 0.4675 0.47
0.4725 0.475 0.4775 0.48 0.4825 0.485 0.4875 0.49 0.4925
0.495 0.4975 0.5 0.5025 0.505 0.5075 0.51 0.5125 0.515
0.5175 0.52 0.5225 0.525 0.5275 0.53 0.5325 0.535 0.5375
0.54 0.5425 0.545 0.5475 0.55 0.5525 0.555 0.5575 0.56
0.5625 0.565 0.5675 0.57 0.5725 0.575 0.5775 0.58 0.5825
0.585 0.5875 0.59 0.5925 0.595 0.5975 0.6 0.6025 0.605
0.6075 0.61 0.6125 0.615 0.6175 0.62 0.6225 0.625 0.6275
0.63 0.6325 0.635 0.6375 0.64 0.6425 0.645 0.6475 0.65
0.6525 0.655 0.6575 0.66 0.6625 0.665 0.6675 0.67 0.6725
0.675 0.6775 0.68 0.6825 0.685 0.6875 0.69 0.6925 0.695
0.6975 0.7 0.7025 0.705 0.7075 0.71 0.7125 0.715 0.7175
0.72 0.7225 0.725 0.7275 0.73 0.7325 0.735 0.7375 0.74
0.7425 0.745 0.7475 0.75 0.7525 0.755 0.7575 0.76 0.7625
0.765 0.7675 0.77 0.7725 0.775 0.7775 0.78 0.7825 0.785
0.7875 0.79 0.7925 0.795 0.7975 0.8 0.8025 0.805 0.8075
0.81 0.8125 0.815 0.8175 0.82 0.8225 0.825 0.8275 0.83
0.8325 0.835 0.8375 0.84 0.8425 0.845 0.8475 0.85 0.8525
0.855 0.8575 0.86 0.8625 0.865 0.8675 0.87 0.8725 0.875
0.8775 0.88 0.8825 0.885 0.8875 0.89 0.8925 0.895 0.8975
0.9 0.9025 0.905 0.9075 0.91 0.9125 0.915 0.9175 0.92
0.9225 0.925 0.9275 0.93 0.9325 0.935 0.9375 0.94 0.9425
0.945 0.9475 0.95 0.9525 0.955 0.9575 0.96 0.9625 0.965
0.9675 0.97 0.9725 0.975 0.9775 0.98 0.9825 0.985 0.9875
0.99 0.9925 0.995 0.9975 1. ]
Now, for example, I try df[0.47] and get the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
2133 try:
-> 2134 return self._engine.get_loc(key)
2135 except KeyError:
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()
pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()
KeyError: 0.47
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-117-76c97f917184> in <module>()
----> 1 df[0.47]
/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
2057 return self._getitem_multilevel(key)
2058 else:
-> 2059 return self._getitem_column(key)
2060
2061 def _getitem_column(self, key):
/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
2064 # get column
2065 if self.columns.is_unique:
-> 2066 return self._get_item_cache(key)
2067
2068 # duplicate columns & possible reduce dimensionality
/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
1384 res = cache.get(item)
1385 if res is None:
-> 1386 values = self._data.get(item)
1387 res = self._box_item_values(item, values)
1388 cache[item] = res
/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
3541
3542 if not isnull(item):
-> 3543 loc = self.items.get_loc(item)
3544 else:
3545 indexer = np.arange(len(self.items))[isnull(self.items)]
/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
2134 return self._engine.get_loc(key)
2135 except KeyError:
-> 2136 return self._engine.get_loc(self._maybe_cast_indexer(key))
2137
2138 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()
pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()
KeyError: 0.47
I don't understand why this happens.
Answer 0 (score: 4)
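The problem here is floating-point imprecision: the label stored in the index is not exactly 0.47. (Also note that df[0.47] looks for a column labelled 0.47, as the _getitem_column frames in the traceback show; a row lookup by label would be df.loc[0.47], which runs into the same imprecision.) You can use the get_slice_bound method to return the ordinal position of that row: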
In [237]:
df.iloc[df.index.get_slice_bound(0.47, side='left', kind='loc')]
Out[237]:
0 0.854001
Name: 0.47, dtype: float64
We can see the true value of that index label:
In [238]:
df.index[df.index.get_slice_bound(0.47, side='left', kind='loc')]
Out[238]:
0.47000000000000003
Although pandas does support Float64Index, using one causes problems for exact label lookups, so you are better off sticking with the default Int64Index.
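One way to follow that advice, as a rough sketch: keep the default integer index and store the grid values in an ordinary column, then match with a tolerance instead of exact equality. The column names "x" and "value" are made up for this illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Same grid as in the question, but stored in a column ("x", a hypothetical
# name) while the index stays the default integer one.
x = np.linspace(0, 1, 401)
df = pd.DataFrame({"x": x, "value": np.random.rand(401)})

# Select rows whose x is within floating-point tolerance of 0.47.
# np.isclose uses a small relative/absolute tolerance, far below the
# 0.0025 grid spacing, so at most one row can match.
row = df[np.isclose(df["x"].to_numpy(), 0.47)]
print(row)

# Position-based access avoids float comparison entirely:
# 0.47 is the 188th step of size 0.0025.
print(df.iloc[188])
```

The integer position is exact, so lookups by position never depend on how a float happens to be rounded.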
get_slice_bound is an undocumented method, but its docstring gives you enough information:
Signature: df.index.get_slice_bound(label, side, kind)
Docstring:
Calculate slice bound that corresponds to given label.

Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.

Parameters
----------
label : object
side : {'left', 'right'}
kind : {'ix', 'loc', 'getitem'}
You can also use get_loc and pass method='nearest' to achieve the same thing:
In [240]:
df.iloc[df.index.get_loc(0.47, method='nearest')]
Out[240]:
0 0.854001
Name: 0.47, dtype: float64
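As far as I know, newer pandas releases (2.0 and later) no longer accept the method argument of get_loc, nor the kind argument of get_slice_bound. A sketch of the same nearest-label lookup using Index.get_indexer instead, which takes method and tolerance; the tolerance of 1e-9 is an arbitrary choice for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))

# get_indexer returns integer positions; -1 means that no label lies
# within the tolerance. 1e-9 is an assumption, chosen to be far smaller
# than the 0.0025 grid step but far larger than the rounding error.
pos = df.index.get_indexer([0.47], method="nearest", tolerance=1e-9)[0]
if pos == -1:
    raise KeyError(0.47)

row = df.iloc[pos]
print(pos, row.name)
```

Passing a tolerance keeps the lookup from silently returning a neighbouring row when the requested label is not on the grid at all.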
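Answer 1 (score: 2)
The representation may look the same, but the values can differ slightly, and then the hash differs as well.
The values may differ while both are still displayed as 0.47, which is misleading.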
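A small standard-library illustration of that point, using 0.1 + 0.2 versus 0.3 as a stand-in for the 0.47 case (the dict here is only an analogy for a hash-based index):

```python
# Two floats that are mathematically equal but differ in the last bit.
a = 0.1 + 0.2
b = 0.3

print(a == b)              # False
print(hash(a) == hash(b))  # False

# Rounded to 8 significant digits (NumPy's default print precision),
# both are displayed identically.
print(f"{a:.8g}", f"{b:.8g}")  # 0.3 0.3

# A hash-based lookup therefore cannot find the entry stored under a
# when asked for b.
table = {a: "value"}
print(b in table)  # False
```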
=> You cannot reliably index elements by float keys.
Instead, use Decimal values as keys, or rounded values.
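Both suggestions can be sketched with plain dictionaries; the 0.0025 step mirrors the grid from the question, and the names by_decimal and by_rounded are made up for this example:

```python
from decimal import Decimal

# Option 1: Decimal keys built from strings are exact, so 188 * 0.0025
# is precisely 0.47 and the lookup succeeds.
step = Decimal("0.0025")
by_decimal = {i * step: i for i in range(401)}
print(by_decimal[Decimal("0.47")])  # 188

# Option 2: round float keys to a fixed number of digits, and round the
# lookup key the same way.
by_rounded = {round(i * 0.0025, 4): i for i in range(401)}
print(by_rounded[round(0.47, 4)])  # 188
```

With rounding, the number of digits has to be fine enough to keep neighbouring keys distinct; four decimals is enough for a 0.0025 grid.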