(pandas version 0.16.0, numpy version 1.9.2)
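I am trying to bin the values in a column and find the rows in the original data that correspond to the maximum value of each bin.
I found a way to do this that works on some float data, but not on int data: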
>>> import numpy as np
>>> from pandas import *
>>> df1 = DataFrame({"id": range(3),"a": np.random.random(3)})
>>> df2 = DataFrame({"id": range(3),"a": [0,1,5]})
>>> bins = [0,1,2]
>>> grouped1 = df1.a.groupby(cut(df1.a,bins))
>>> grouped2 = df2.a.groupby(cut(df2.a,bins))
>>> idx1 = grouped1.transform(max) == df1.a
>>> df1[idx1]
          a  id
0  0.997843   0
>>> idx2 = grouped2.transform(max) == df2.a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/pandas/core/groupby.py", line 2418, in transform
    return self._transform_fast(cyfunc)
  File "/usr/lib/python2.7/site-packages/pandas/core/groupby.py", line 2459, in _transform_fast
    return self._set_result_index_ordered(Series(values))
  File "/usr/lib/python2.7/site-packages/pandas/core/groupby.py", line 493, in _set_result_index_ordered
    index = Index(np.concatenate([ indices[v] for v in self.grouper.result_index ]))
KeyError: '(1, 2]'
Note that both groupbys produce a NaN row with these bins:
>>> grouped1.max()
a
(0, 1]    0.859684
(1, 2]         NaN
Name: a, dtype: float64
>>> grouped2.max()
a
(0, 1]      1
(1, 2]    NaN
Name: a, dtype: float64
I cannot work out what the problem is. A KeyError whose key is a bin label does not make much sense to me.
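For reference, here is a minimal sketch of one way to get the per-bin maximum rows without calling `transform`. It is a possible workaround, not a diagnosis of the KeyError. It assumes the empty `(1, 2]` bin is what triggers the error, which the traceback suggests but does not confirm. It also assumes a newer pandas than the 0.16.0 used above, since the `observed` argument to `groupby` is not available in that version. The names `labels`, `in_bin`, `binned`, `idx_of_max` and `result` are introduced only for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({"id": range(3), "a": [0, 1, 5]})
bins = [0, 1, 2]

# Label each row with its bin. Values outside the bins become NaN
# (here 0 and 5, because the intervals are (0, 1] and (1, 2]).
labels = pd.cut(df2["a"], bins)

# Keep only the rows that actually fell into a bin.
in_bin = labels.notna()
binned = df2[in_bin]

# Group by the bins that occur in the data (observed=True skips empty
# bins such as (1, 2]) and take the index label of the maximum in each.
idx_of_max = binned.groupby(labels[in_bin], observed=True)["a"].idxmax()

# Select the corresponding rows from the original frame.
result = df2.loc[idx_of_max.tolist()]
print(result)
```

Unlike the `transform(max) == df2.a` comparison, `idxmax` returns only the first row per bin when several rows tie for the maximum.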