Strange results with groupby, transform and NaNs

Date: 2015-03-31 20:48:37

Tags: python numpy pandas

EDIT (May 19, 2015): I have just confirmed that this has been fixed in 0.16.1, so it should no longer be an issue in the latest version.

These should all give the same result, right?

df.groupby(level=0).transform('mean')
df.groupby(level=0)['x'].transform(np.nanmean)
df.groupby(level=0)['x'].transform('mean')

The first two work, but the third raises an error. Is this a bug?

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, np.nan, 3, 4]}, index=[1, 1, 2, 2])

df
Out[686]: 
    x
1   1
1 NaN
2   3
2   4
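For reference, every call below should produce each group's NaN-skipping mean broadcast back to every row. Here is a minimal sketch that computes this without `transform`: it aggregates per group, then reindexes on the original (duplicated) index labels. The names `group_means` and `expected` are illustrative only. I have not checked whether this route also avoids the error on 0.16.0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, np.nan, 3, 4]}, index=[1, 1, 2, 2])

# Per-group mean; mean() skips NaN by default (skipna=True),
# so group 1 averages only the single value 1.
group_means = df['x'].groupby(level=0).mean()

# Broadcast each group's mean back to every row carrying that label.
# reindex() accepts duplicate target labels because group_means' own index is unique.
expected = group_means.reindex(df.index)

print(expected.tolist())  # [1.0, 1.0, 3.5, 3.5]
```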

df.groupby(level=0).transform('mean')
Out[687]: 
     x
1  1.0
1  1.0
2  3.5
2  3.5

df.groupby(level=0)['x'].transform(np.nanmean)
Out[688]: 
1    1.0
1    1.0
2    3.5
2    3.5
Name: x, dtype: float64

That all works fine, but this does not:

df.groupby(level=0)['x'].transform('mean')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-691-24761ee742fd> in <module>()
----> 1 df.groupby(level=0)['x'].transform('mean')

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\groupby.pyc in transform(self, func, *args, **kwargs)
   2411         # if string function
   2412         if isinstance(func, compat.string_types):
-> 2413             return self._transform_fast(lambda : getattr(self, func)(*args, **kwargs))
   2414 
   2415         # do we have a cython function

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\groupby.pyc in _transform_fast(self, func)
   2457         values = np.repeat(values, com._ensure_platform_int(counts))
   2458 
-> 2459         return self._set_result_index_ordered(Series(values))
   2460 
   2461     def filter(self, func, dropna=True, *args, **kwargs):

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\groupby.pyc in _set_result_index_ordered(self, result)
    495             result = result.sort_index()
    496 
--> 497         result.index = self.obj.index
    498         return result
    499 

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\generic.pyc in __setattr__(self, name, value)
   1978         try:
   1979             object.__getattribute__(self, name)
-> 1980             return object.__setattr__(self, name, value)
   1981         except AttributeError:
   1982             pass

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\lib.pyd in pandas.lib.AxisProperty.__set__ (pandas\lib.c:38795)()

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.pyc in _set_axis(self, axis, labels, fastpath)
    266         object.__setattr__(self, '_index', labels)
    267         if not fastpath:
--> 268             self._data.set_axis(axis, labels)
    269 
    270     def _set_subtyp(self, is_all_dates):

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\internals.pyc in set_axis(self, axis, new_labels)
   2209         if new_len != old_len:
   2210             raise ValueError('Length mismatch: Expected axis has %d elements, '
-> 2211                              'new values have %d elements' % (old_len, new_len))
   2212 
   2213         self.axes[axis] = new_labels

ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements
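The traceback shows `_transform_fast` expanding the per-group results with `np.repeat(values, counts)` and then failing to assign the original 4-row index to a 3-element Series. One plausible reading, which I have not verified against the 0.16.0 source, is that `counts` held each group's number of non-NaN values (1 and 2) instead of its size (2 and 2). The sketch below uses public pandas calls to reproduce that arithmetic. It relies on the rows already being sorted by group, which is true for this `df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, np.nan, 3, 4]}, index=[1, 1, 2, 2])
g = df.groupby(level=0)['x']

means = g.mean().to_numpy()      # per-group means: 1.0 and 3.5 (NaN skipped)
non_null = g.count().to_numpy()  # count() ignores NaN: 1 and 2
sizes = g.size().to_numpy()      # size() counts every row: 2 and 2

# Repeating by non-NaN counts gives 3 values for a 4-row index,
# the same lengths reported in the ValueError above.
by_count = np.repeat(means, non_null)

# Repeating by group sizes gives one value per original row
# (valid here only because rows are already ordered by group).
by_size = np.repeat(means, sizes)

print(len(by_count), len(df))  # 3 4
print(by_size.tolist())        # [1.0, 1.0, 3.5, 3.5]
```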

1 Answer:

Answer 0 (score: 0)

I have confirmed that this is indeed fixed in version 0.16.1. See the comments above by @DSM and @AndyHayden.
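As a quick check, this sketch assumes a pandas version that includes the fix (0.16.1 or later) and confirms that the three calls from the question now agree. It wraps `np.nanmean` in a lambda so the NumPy function itself runs on each group. That wrapping is my choice, not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, np.nan, 3, 4]}, index=[1, 1, 2, 2])
g = df.groupby(level=0)

frame_result = g.transform('mean')['x']                        # DataFrame transform, column x
callable_result = g['x'].transform(lambda s: np.nanmean(s))    # NumPy callable per group
string_result = g['x'].transform('mean')                       # the call that failed on 0.16.0

# All three should hold the NaN-skipping group mean on every row.
for result in (frame_result, callable_result, string_result):
    print(result.tolist())  # [1.0, 1.0, 3.5, 3.5]
```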