Question

I just upgraded from Pandas 0.11 to 0.13.0rc1. The upgrade caused an error related to Series.fillna().

>>> df
                   sales  net_pft
STK_ID RPT_Date                  
600809 20060331   5.8951   1.1241
       20060630   8.3031   1.5464
       20060930  11.9084   2.2990
       20061231      NaN   2.6060
       20070331   5.9129   1.3334

[5 rows x 2 columns]
>>> type(df['sales'])
<class 'pandas.core.series.Series'>
>>> df['sales'] = df['sales'].fillna(df['net_pft'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python27\lib\site-packages\pandas\core\generic.py", line 1912, in fillna
    obj.fillna(v, inplace=True)
AttributeError: 'numpy.float64' object has no attribute 'fillna'
>>>

Why does df['sales'] end up as a 'numpy.float64' object when fillna() is used? How do I correctly "fill the NaN values of one column with the values of another column"?

Answer 1

There was a recent discussion about this, and it has been fixed in pandas master: https://github.com/pydata/pandas/issues/5703 (the fix landed after 0.13rc1 was released, so it will be in the final 0.13).

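Note: the behavior has changed! As @behzad.nouri pointed out, using a Series as the input to fillna was unsupported behavior in pandas <= 0.12. It did work, but apparently by position rather than by label, which is wrong. That does not matter as long as both Series (in your case df['sales'] and df['net_pft']) have the same index.
In pandas 0.13 it will be supported, but aligned on the Series' index. See the comment here: https://github.com/pydata/pandas/issues/5703#issuecomment-30663525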
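A minimal sketch of what "aligned on the index" means, assuming pandas 0.13 or later (the behavior of current releases). The labels "a"/"b"/"c" and the reordering are hypothetical, chosen so that the label and the position of the fill value disagree:

```python
import numpy as np
import pandas as pd

# Same labels, different order: position 1 of net_pft is "a", not "b".
sales = pd.Series([5.8951, np.nan, 5.9129], index=["a", "b", "c"])
net_pft = pd.Series([1.3334, 1.1241, 2.6060], index=["c", "a", "b"])

# fillna with a Series matches on index labels, so the NaN at "b" is
# filled with net_pft["b"] (2.6060), not with net_pft.iloc[1] (1.1241).
filled = sales.fillna(net_pft)
print(filled)
```

A position-based fill (the old, unsupported behavior) would have put 1.1241 at "b"; with identical, identically ordered indexes both approaches give the same answer.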

Answer 2

What you seem to be trying to do is really this:

    idx = df['sales'].isnull()
    # assign through .loc so the write goes to df itself, not to a possible copy
    df.loc[idx, 'sales'] = df.loc[idx, 'net_pft']
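A self-contained version of the mask approach on the question's data (values copied from the printout), checked against the index-aligned `fillna` available from pandas 0.13 onward. Rebuilding the frame with `MultiIndex.from_product` is an assumption about how the original frame was indexed:

```python
import numpy as np
import pandas as pd

# Rebuild the frame shown in the question.
index = pd.MultiIndex.from_product(
    [[600809], [20060331, 20060630, 20060930, 20061231, 20070331]],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "sales": [5.8951, 8.3031, 11.9084, np.nan, 5.9129],
        "net_pft": [1.1241, 1.5464, 2.2990, 2.6060, 1.3334],
    },
    index=index,
)

# Index-aligned alternative, computed before df is modified.
expected = df["sales"].fillna(df["net_pft"])

# Boolean mask of the rows where 'sales' is missing, then a single .loc write.
idx = df["sales"].isnull()
df.loc[idx, "sales"] = df.loc[idx, "net_pft"]

print(df)
print(df["sales"].equals(expected))
```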

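Because the value argument you pass to fillna is a Series, the code goes into the branch below, which calls fillna once for every index item of the supplied Series. If self is a DataFrame this works: each key names a column, so result[k] is a Series and that column is filled with the matching value. Here, however, self is a Series, so each key is a row label, result[k] is a scalar numpy.float64, and calling fillna on it raises the AttributeError you see.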

According to the documentation of fillna for a DataFrame, the value argument can be

alternately a dict of values specifying which value to use for each column (columns not in the dict will not be filled)
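A short sketch of that dict form of DataFrame.fillna; the column names and fill values are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, np.nan], "b": [np.nan, 2.0], "c": [np.nan, np.nan]}
)

# Keys name columns: 'a' gets 0.0, 'b' gets -1.0.
# 'c' is not in the dict, so its NaNs are left untouched.
filled = df.fillna({"a": 0.0, "b": -1.0})
print(filled)
```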

As the source code below shows, if value is a Series it is treated like that dict: the Series index supplies the keys naming which columns to fillna.

    else:   # value is not None
        if method is not None:
            raise ValueError('cannot specify both a fill method and value')

        if len(self._get_axis(axis)) == 0:
            return self
        if isinstance(value, (dict, com.ABCSeries)):
            if axis == 1:
                raise NotImplementedError('Currently only can fill '
                                          'with dict/Series column '
                                          'by column')

            result = self if inplace else self.copy()
            for k, v in compat.iteritems(value):
                if k not in result:
                    continue
                obj = result[k]
                obj.fillna(v, inplace=True)
            return result
        else:
            new_data = self._data.fillna(value, inplace=inplace,
                                         downcast=downcast)
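To make the failure concrete, here is a hypothetical re-creation of the dict/Series branch quoted above (`fill_like_013rc1` is an illustrative name, not a pandas function). It is a simplification: it assigns `result[k] = ...` instead of calling `fillna(..., inplace=True)`, an assumption made so it also behaves predictably under newer pandas copy semantics:

```python
import numpy as np
import pandas as pd


def fill_like_013rc1(obj, value):
    """Hypothetical sketch of the 0.13rc1 dict/Series branch of fillna.

    For each (key, fill) pair in `value`, look up obj[key] and call
    fillna on it. That only makes sense when obj[key] is a column.
    """
    result = obj.copy()
    for k, v in value.items():
        if k not in result:  # DataFrame: checks columns; Series: checks index
            continue
        col = result[k]
        # If result is a Series, col is a scalar (numpy.float64),
        # which has no fillna method, so this raises AttributeError.
        result[k] = col.fillna(v)
    return result


frame = pd.DataFrame({"sales": [5.8951, np.nan], "net_pft": [1.1241, 2.6060]})

# DataFrame: the keys name columns, so the 'sales' column is filled.
out = fill_like_013rc1(frame, pd.Series({"sales": 0.0}))
print(out)

# Series: the keys are row labels, so result[k] is a scalar and the call fails.
try:
    fill_like_013rc1(frame["sales"], frame["net_pft"])
except AttributeError as exc:
    print(exc)
```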

How to fix a Pandas issue related to Series.fillna()?
