Question

In IPython, I do a groupby on a regular DataFrame:

grouped
Out[356]: <pandas.core.groupby.DataFrameGroupBy object at 0x7f0e78578750>

But filter seems to receive Series rather than DataFrames:

     ...: def print_obj(x):
     ...:     print type(x)
     ...:     return True
     ...:



e=grouped.filter(print_obj)
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-349-a93d384d3560> in <module>()
----> 1 e=grouped.filter(print_obj)

/home/user/anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in filter(self, func, dropna, *args, **kwargs)
   2092                 res = path(group)
   2093
-> 2094             if res:
   2095                 indexers.append(self.obj.index.get_indexer(group.index))
   2096

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

However, when I use apply, I only get DataFrames:

grouped.apply(print_obj)
<class 'pandas.core.frame.DataFrame'>
...

The filter docstring says I should get DataFrames. Why is that? How can I work around this? (I simply want to drop some groups from the grouped df.)

P.S. pandas == 0.12.0

Answer 1

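Internally, apply and filter try different ways of looping over the data: a "slow path" that is sure to work with any function, and "fast path" features that only work with some functions. These paths may operate on whole chunks of the data (as DataFrames) or on one row at a time (as Series).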

The details are subtle (have a look at pandas/core/groupby.py if you like), but the upshot is that print_obj exposes some of these internals, and it is not what you really want to do.
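As a rough illustration of the ValueError in the traceback: the line `if res:` needs a single truth value, and an array with several elements does not have one. The sketch below assumes (this is a guess at the cause, not something confirmed from the pandas source) that `res` ended up holding one result per Series; the array contents here are made up.

```python
import numpy as np

# Hypothetical stand-in for `res`: one True per Series the function was called on.
res = np.array([True, True, True])

try:
    if res:  # same shape of check as line 2094 in the traceback
        pass
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Collapsing the array to one bool makes the check well defined.
collapsed = bool(res.all())
```

This is why a function passed to filter should return exactly one boolean per group.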

Which groups do you want to drop, and what criterion are you trying to apply?
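Since the criterion is not given, here is only a minimal sketch of dropping groups with filter, assuming a made-up frame with columns `key` and `value` and a made-up rule (keep groups whose `value` sum exceeds 3). It is written for a recent pandas, not 0.12.0, and the helper name `keep_group` is hypothetical.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b", "c"],
    "value": [1, 2, 3, 4, 5, 6],
})

def keep_group(group):
    # Return exactly one bool per group; True keeps the whole group.
    return bool(group["value"].sum() > 3)

# Rows of groups "b" and "c" survive; group "a" (sum 3) is dropped.
kept = df.groupby("key").filter(keep_group)

# Equivalent approach without filter: broadcast the group sum to every row
# with transform, then index the original frame with the boolean mask.
mask = df.groupby("key")["value"].transform("sum") > 3
kept_via_mask = df[mask]
```

Both variants return rows of the original frame, so the result keeps the original index and columns.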

groupby.filter operates on Series instead of DataFrames? (pandas)
