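My data is similar to that in this post: pandas: Filling missing values within a group
That is, I have data from a number of observation sessions, and each session has one focal individual. The focal individual is recorded only once per session, but I want to fill in the focal ID on every row of that session. The data looks like this: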
   Focal  Session
0    NaN        1
1  50101        1
2    NaN        1
3    NaN        2
4  50408        2
5    NaN        2
Following the post linked above, I am using this code:
g = data.groupby('Session')
g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
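But this raises a KeyError (specifically, KeyError: None). According to the .loc documentation, a KeyError can be raised when the requested data is not found. I have checked, and although I have 152 sessions, there are only 150 non-null data points in the Focal column. Before I go searching by hand for the sessions that lack a Focal ID, I have two questions:
I am very much a beginner. Is this a reasonable explanation for why I am getting a KeyError?
If so, is there a way to find out which sessions are missing Focal ID data, so that I do not have to look through the data manually?
The full output: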
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-330-0e4f27aa7e14> in <module>()
----> 1 data['Focal'] = g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
2 g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
//anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in transform(self, func, *args, **kwargs)
1540 for name, group in self:
1541 object.__setattr__(group, 'name', name)
-> 1542 res = wrapper(group)
1543 # result[group.index] = res
1544 indexer = self.obj.index.get_indexer(group.index)
//anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in <lambda>(x)
1536 wrapper = lambda x: getattr(x, func)(*args, **kwargs)
1537 else:
-> 1538 wrapper = lambda x: func(x, *args, **kwargs)
1539
1540 for name, group in self:
<ipython-input-330-0e4f27aa7e14> in <lambda>(s)
----> 1 data['Focal'] = g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
2 g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
//anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
669 return self._getitem_tuple(key)
670 else:
--> 671 return self._getitem_axis(key, axis=0)
672
673 def _getitem_axis(self, key, axis=0):
//anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
756 return self._getitem_iterable(key, axis=axis)
757 else:
--> 758 return self._get_label(key, axis=axis)
759
760 class _iLocIndexer(_LocationIndexer):
//anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_label(self, label, axis)
58 return self.obj._xs(label, axis=axis, copy=False)
59 except Exception:
---> 60 return self.obj._xs(label, axis=axis, copy=True)
61
62 def _get_loc(self, key, axis=0):
//anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in _xs(self, key, axis, level, copy)
570
571 def _xs(self, key, axis=0, level=None, copy=True):
--> 572 return self.__getitem__(key)
573
574 def _ixs(self, i, axis=0):
//anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
611 def __getitem__(self, key):
612 try:
--> 613 return self.index.get_value(self, key)
614 except InvalidIndexError:
615 pass
//anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
761 """
762 try:
--> 763 return self._engine.get_value(series, key)
764 except KeyError, e1:
765 if len(self) > 0 and self.inferred_type == 'integer':
//anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2565)()
//anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2380)()
//anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3166)()
KeyError: None
Answer 0 (score: 1)
The problem is that first_valid_index returns None when there are no valid values (some groups in your DataFrame are entirely NaN):
In [1]: s = pd.Series([np.nan])
In [2]: s.first_valid_index() # None
Now loc raises an error, because there is no index label None:
In [3]: s.loc[s.first_valid_index()]
KeyError: None
What do you want your code to do in this particular case? ...
If you want the result to be NaN, you can backfill and then take the first element:
g['Focal'].transform(lambda s: s.bfill().iloc[0])
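The second part of the question (which sessions have no Focal ID at all) can be checked without reading through the data by hand. The following is a minimal sketch, assuming a DataFrame named data with the Focal and Session columns from the question; session 3 is made up here to stand in for a session with no recorded focal individual.

```python
import numpy as np
import pandas as pd

# Toy data modelled on the question. Session 3 is invented for illustration
# and has no Focal ID at all.
data = pd.DataFrame({
    "Focal": [np.nan, 50101, np.nan, np.nan, 50408, np.nan, np.nan, np.nan],
    "Session": [1, 1, 1, 2, 2, 2, 3, 3],
})

# count() ignores NaN, so a count of 0 marks a session with no Focal ID.
valid_counts = data.groupby("Session")["Focal"].count()
missing_sessions = valid_counts[valid_counts == 0].index.tolist()
print(missing_sessions)

# Back-fill within each session and take the first element.
# Sessions that are entirely NaN simply stay NaN instead of raising KeyError.
filled = data.groupby("Session")["Focal"].transform(lambda s: s.bfill().iloc[0])
print(filled)
```

On the real data, missing_sessions should list the sessions that account for the gap between 152 sessions and 150 non-null Focal values, provided no session has more than one Focal entry.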
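Answer 1 (score: 0)
If you want to handle the case where some groups contain only NaN, you can do the following:
This way you can fill in a placeholder such as 'no value' (or whatever you want) when the program finds a group that is entirely NaN, instead of stopping execution with an error.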
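The code for this answer is not shown here, so the following is only a sketch of the idea it describes, not the answerer's actual snippet. The helper first_valid_or_default, the Focal_filled column name, and the toy session 3 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def first_valid_or_default(s, default="no value"):
    """Return the first non-NaN entry of s, or `default` if s is all NaN.

    Hypothetical helper: guards against first_valid_index() returning None.
    """
    idx = s.first_valid_index()
    return default if idx is None else s.loc[idx]

# Toy data modelled on the question; session 3 (invented) is entirely NaN.
data = pd.DataFrame({
    "Focal": [np.nan, 50101, np.nan, np.nan, 50408, np.nan, np.nan, np.nan],
    "Session": [1, 1, 1, 2, 2, 2, 3, 3],
})

# One value per session: the focal ID, or the placeholder for all-NaN sessions.
first_ids = {
    session: first_valid_or_default(group)
    for session, group in data.groupby("Session")["Focal"]
}

# Broadcast the per-session value back onto every row of that session.
data["Focal_filled"] = data["Session"].map(first_ids)
print(data)
```

Note that mixing a string placeholder with numeric IDs makes the new column an object column; passing default=np.nan keeps it numeric.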
Hope this helps :)
Federico