KeyError when using s.loc and s.first_valid_index()

Posted: 2013-09-24 04:49:51

Tags: python pandas loc

My data is similar to this post: pandas: Filling missing values within a group

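That is, I have data from a number of observation sessions, and each session has one focal individual. The focal individual's ID is recorded only once, but I want to fill in the focal ID on every row of that session. So the data looks like this: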

     Focal    Session
0    NaN      1
1    50101    1
2    NaN      1
3    NaN      2
4    50408    2
5    NaN      2

Following the post linked above, I am using this code:

g = data.groupby('Session')

g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
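This is a self-contained sketch of the snippet above. It assumes the six-row table is the whole of `data`; the real data has 152 sessions. Every session in this sample has one Focal value, so the transform runs without error:

```python
import numpy as np
import pandas as pd

# Sample data from the table above (assumed to be the whole DataFrame)
data = pd.DataFrame({
    "Focal": [np.nan, 50101, np.nan, np.nan, 50408, np.nan],
    "Session": [1, 1, 1, 2, 2, 2],
})

g = data.groupby("Session")
# For each session, look up the first non-NaN Focal value by its index label;
# transform broadcasts that scalar back to every row of the session
filled = g["Focal"].transform(lambda s: s.loc[s.first_valid_index()])
print(filled.tolist())  # [50101.0, 50101.0, 50101.0, 50408.0, 50408.0, 50408.0]
```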

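But this raises a KeyError (specifically, KeyError: None). According to the .loc documentation, a KeyError can be raised when the data is not found. So I checked: I have 152 sessions but only 150 non-null data points in the Focal column. Before I resort to manually searching my data for the sessions that are missing a Focal ID, I have two questions: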

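  1. I'm very much a beginner. Is this a reasonable explanation for why I'm getting a KeyError?

  2. If so, is there a way to find out which sessions are missing Focal ID data, so I don't have to look through the data manually?

  3. Here is the output: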

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    <ipython-input-330-0e4f27aa7e14> in <module>()
    ----> 1 data['Focal'] = g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
          2 g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
    
    //anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in transform(self, func,     *args, **kwargs)
       1540         for name, group in self:
       1541             object.__setattr__(group, 'name', name)
    -> 1542             res = wrapper(group)
       1543             # result[group.index] = res
       1544             indexer = self.obj.index.get_indexer(group.index)
    
    //anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in <lambda>(x)
       1536             wrapper = lambda x: getattr(x, func)(*args, **kwargs)
       1537         else:
    -> 1538             wrapper = lambda x: func(x, *args, **kwargs)
       1539 
       1540         for name, group in self:
    
    <ipython-input-330-0e4f27aa7e14> in <lambda>(s)
    ----> 1 data['Focal'] = g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
          2 g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
    
    //anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
        669             return self._getitem_tuple(key)
        670         else:
    --> 671             return self._getitem_axis(key, axis=0)
        672 
        673     def _getitem_axis(self, key, axis=0):
    
    //anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
        756             return self._getitem_iterable(key, axis=axis)
        757         else:
    --> 758             return self._get_label(key, axis=axis)
        759 
        760 class _iLocIndexer(_LocationIndexer):
    
    //anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_label(self, label, axis)
         58             return self.obj._xs(label, axis=axis, copy=False)
         59         except Exception:
    ---> 60             return self.obj._xs(label, axis=axis, copy=True)
         61 
         62     def _get_loc(self, key, axis=0):
    
    //anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in _xs(self, key, axis, level, copy)
        570 
        571     def _xs(self, key, axis=0, level=None, copy=True):
    --> 572         return self.__getitem__(key)
        573 
        574     def _ixs(self, i, axis=0):
    
    //anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
        611     def __getitem__(self, key):
        612         try:
    --> 613             return self.index.get_value(self, key)
        614         except InvalidIndexError:
        615             pass
    
    //anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
        761         """
        762         try:
    --> 763             return self._engine.get_value(series, key)
        764         except KeyError, e1:
        765             if len(self) > 0 and self.inferred_type == 'integer':
    
    //anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2565)()
    
    //anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2380)()
    
    //anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3166)()
    
    KeyError: None
    
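Here is one possible approach to question 2, not taken from either answer below. It counts the non-null Focal values per session with `count()`, which skips NaN. Sessions with a count of 0 are the ones for which `first_valid_index()` returns None. Session 3 is a hypothetical all-NaN session added to the sample data for illustration:

```python
import numpy as np
import pandas as pd

# Sample data from the question plus a hypothetical session 3 with no Focal value
data = pd.DataFrame({
    "Focal": [np.nan, 50101, np.nan, np.nan, 50408, np.nan, np.nan, np.nan],
    "Session": [1, 1, 1, 2, 2, 2, 3, 3],
})

# count() ignores NaN, so a session with no Focal ID gets a count of 0
counts = data.groupby("Session")["Focal"].count()
missing = counts[counts == 0].index.tolist()
print(missing)  # [3]

# The original transform fails because of exactly these sessions
raised = False
try:
    data.groupby("Session")["Focal"].transform(lambda s: s.loc[s.first_valid_index()])
except KeyError as err:
    raised = True
    print("KeyError:", err)
```

On the real data, `missing` should list the sessions behind the 150-versus-152 gap.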

2 Answers:

Answer 0 (score: 1)

The problem is that first_valid_index returns None when there are no valid values (some groups in your DataFrame are all NaN):

In [1]: s = pd.Series([np.nan])

In [2]: s.first_valid_index() # None

Now loc raises an error, because there is no index label None:

In [3]: s.loc[s.first_valid_index()]
KeyError: None

What do you want the code to do in this particular case? ...
If you want the result to be NaN, you can backfill and then take the first element:

g['Focal'].transform(lambda s: s.bfill().iloc[0])
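A minimal check of the backfill approach follows. It assumes the question's sample data plus a hypothetical all-NaN session 3. For comparison, it also shows the built-in `"first"` groupby aggregation, an alternative not mentioned in the answer, which also skips NaN:

```python
import numpy as np
import pandas as pd

# Sample data plus a hypothetical all-NaN session 3
data = pd.DataFrame({
    "Focal": [np.nan, 50101, np.nan, np.nan, 50408, np.nan, np.nan, np.nan],
    "Session": [1, 1, 1, 2, 2, 2, 3, 3],
})
g = data.groupby("Session")

# bfill() pulls the first non-NaN value back to row 0 of each group;
# an all-NaN group stays all-NaN, so iloc[0] is NaN instead of raising a KeyError
filled = g["Focal"].transform(lambda s: s.bfill().iloc[0])

# Alternative: "first" skips NaN by default and yields NaN for an all-NaN group
filled_first = g["Focal"].transform("first")

print(filled.tolist())
print(filled_first.tolist())
```

Both variants keep the column numeric (float64), and the all-NaN session simply stays NaN.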

Answer 1 (score: 0)

If you want to handle the case where some groups contain only NaN, you can do the following:

  1. g = data.groupby('Session')
  2. g['Focal'].transform(lambda s: 'No values to aggregate' if pd.isnull(s).all() else s.loc[s.first_valid_index()])
  3. data['Focal'] = g['Focal'].transform(lambda s: 'No values to aggregate' if pd.isnull(s).all() else s.loc[s.first_valid_index()])
  4. This way, when the program finds that a group is all NaN, it fills in 'No values to aggregate' (or whatever you like) instead of stopping with an error.
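The steps above can be sketched as a runnable script. It assumes the question's sample data plus a hypothetical all-NaN session 3. The helper name `first_focal` and the constant `PLACEHOLDER` are illustrative, not part of the answer:

```python
import numpy as np
import pandas as pd

# Sample data plus a hypothetical all-NaN session 3
data = pd.DataFrame({
    "Focal": [np.nan, 50101, np.nan, np.nan, 50408, np.nan, np.nan, np.nan],
    "Session": [1, 1, 1, 2, 2, 2, 3, 3],
})
PLACEHOLDER = "No values to aggregate"  # hypothetical marker; any value works

def first_focal(s):
    # Guard: first_valid_index() returns None when every value is NaN,
    # and s.loc[None] would raise KeyError: None
    if s.isnull().all():
        return PLACEHOLDER
    return s.loc[s.first_valid_index()]

data["Focal"] = data.groupby("Session")["Focal"].transform(first_focal)
print(data["Focal"].tolist())
```

Mixing a string marker with numeric IDs turns the column into object dtype. If you still need numeric operations on Focal, the NaN result from the backfill approach may be the more convenient choice.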

    Hope this helps :)

    费德里科