API reference missing for DataFrameGroupBy objects?

Posted: 2014-02-13 19:23:37

Tags: python pandas

Looking through the API documentation, I can't find the class methods of DataFrameGroupBy. Am I looking in the wrong place?

According to the guide, these objects have the following methods:

In [22]: gb = df.groupby('gender')
In [23]: gb.<TAB>
gb.agg        gb.boxplot    gb.cummin     gb.describe   gb.filter     gb.get_group  gb.height     gb.last       gb.median     gb.ngroups    gb.plot       gb.rank       gb.std        gb.transform
gb.aggregate  gb.count      gb.cumprod    gb.dtype      gb.first      gb.groups     gb.hist       gb.max        gb.min        gb.nth        gb.prod       gb.resample   gb.sum        gb.var
gb.apply      gb.cummax     gb.cumsum     gb.fillna     gb.gender     gb.head       gb.indices    gb.mean       gb.name       gb.ohlc       gb.quantile   gb.size       gb.tail       gb.weight

Where can I find an explanation of what they do?

1 answer:

Answer 0 (score: 5)

The easiest way to find out what a method does is to read its docstring:

In [24]: gb.filter?  # help(gb.filter) in python interpreter
Type:       instancemethod
String Form:<bound method DataFrameGroupBy.filter of <pandas.core.groupby.DataFrameGroupBy object at 0x1046ad290>>
File:       /Users/andy/pandas/pandas/core/groupby.py
Definition: g.filter(self, func, dropna=True, *args, **kwargs)
Docstring:
Return a copy of a DataFrame excluding elements from groups that
do not satisfy the boolean criterion specified by func.

Parameters
----------
f : function
    Function to apply to each subframe. Should return True or False.
dropna : Drop groups that do not pass the filter. True by default;
    if False, groups that evaluate False are filled with NaNs.

Notes
-----
Each subframe is endowed the attribute 'name' in case you need to know
which group you are working on.

Example
--------
>>> grouped = df.groupby(lambda x: mapping[x])
>>> grouped.filter(lambda x: x['A'].sum() + x['B'].sum() > 0)
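To see the `filter` semantics described in that docstring in action, here is a small sketch with made-up data. The `key`, `A` and `B` columns are invented for illustration and mirror the docstring's example criterion. It was written against a recent pandas rather than the 0.13-era version shown above, so the exact printed layout may differ.

```python
import pandas as pd

# Made-up data: the "key" column defines the groups.
df = pd.DataFrame({
    "key": ["x", "x", "y", "y"],
    "A": [1, 2, -5, 1],
    "B": [0, 1, 1, -2],
})
grouped = df.groupby("key")

# Criterion from the docstring example: A.sum() + B.sum() > 0 per group.
# Group "x" sums to 4 and is kept; group "y" sums to -5 and is dropped.
kept = grouped.filter(lambda g: g["A"].sum() + g["B"].sum() > 0)
print(kept)

# With dropna=False the failing group's rows stay in place but are set to NaN.
masked = grouped.filter(lambda g: g["A"].sum() + g["B"].sum() > 0, dropna=False)
print(masked)
```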

However, because of a bug, the "fall through" methods don't show a useful docstring; you only see the generic wrapper around the DataFrame method they call.

For example, gb.cummin(*args, **kwargs) is equivalent to gb.apply(lambda x: x.cummin(*args, **kwargs)):

In [31]: gb.cummin?
Type:       function
String Form:<function wrapper at 0x1046a9410>
File:       /Users/andy/pandas/pandas/core/groupby.py
Definition: g.cummin(*args, **kwargs)
Docstring:  <no docstring>

In [32]: df.cummin?
Type:       instancemethod
String Form:
<bound method DataFrame.min of    a  b
0  1  2

[1 rows x 2 columns]>
File:       /Users/andy/pandas/pandas/core/generic.py
Definition: df.cummin(self, axis=None, dtype=None, out=None, skipna=True, **kwargs)
Docstring:
Return cumulative min over requested axis.

Parameters
----------
axis : {index (0), columns (1)}
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA

Returns
-------
min : Series
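Judging from the `<function wrapper at ...>` string form above, these fall-through methods are plain wrapper functions that forward to the DataFrame method; if nothing copies the target's docstring onto the wrapper, `?` and `help()` have nothing to show. The sketch below reproduces that effect with hypothetical stand-in classes (`Frame`, `GroupBy`, `make_fallthrough` and `make_documented_fallthrough` are invented for illustration and are not pandas internals), and shows `functools.wraps` as one standard way to carry the target's docstring over. It is not necessarily how pandas itself fixed the issue.

```python
import functools


class Frame:
    """Hypothetical stand-in for DataFrame, used only for this sketch."""

    def __init__(self, values):
        self.values = list(values)

    def cummin(self):
        """Return cumulative min over the values."""
        out, current = [], None
        for v in self.values:
            current = v if current is None else min(current, v)
            out.append(current)
        return out


def make_fallthrough(name):
    # Naive dispatcher: forwards the call to Frame.<name> on every group,
    # but the returned function has no docstring of its own.
    def wrapper(self, *args, **kwargs):
        return [getattr(part, name)(*args, **kwargs) for part in self.parts]
    return wrapper


def make_documented_fallthrough(name):
    # Same dispatcher, but functools.wraps copies __doc__, __name__, etc.
    # from the wrapped Frame method, so help() shows something useful.
    target = getattr(Frame, name)

    @functools.wraps(target)
    def wrapper(self, *args, **kwargs):
        return [getattr(part, name)(*args, **kwargs) for part in self.parts]
    return wrapper


class GroupBy:
    """Hypothetical group container holding one Frame per group."""

    def __init__(self, parts):
        self.parts = parts

    cummin = make_fallthrough("cummin")


class DocumentedGroupBy(GroupBy):
    cummin = make_documented_fallthrough("cummin")


print(GroupBy.cummin.__name__, GroupBy.cummin.__doc__)
print(DocumentedGroupBy.cummin.__name__, DocumentedGroupBy.cummin.__doc__)
```

The naive version reports its name as `wrapper` with no docstring, matching what IPython showed for `gb.cummin?`; the wrapped version reports `cummin` and the Frame method's docstring.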

Here is an example that illustrates this particular method and demonstrates the equivalence:

In [41]: df = pd.DataFrame([[2, 4], [1, 5], [2, 2], [1, 3]], columns=['a', 'b'])

In [42]: df
Out[42]:
   a  b
0  2  4
1  1  5
2  2  2
3  1  3

In [43]: gb = df.groupby('a')

In [44]: gb.cummin()
Out[44]:
   a  b
0  2  4
1  1  5
2  2  2
3  1  3

In [45]: gb.apply(lambda x: x.cummin())
Out[45]:
   a  b
0  2  4
1  1  5
2  2  2
3  1  3
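The same equivalence can be checked as a script against a current pandas, with a caveat: in recent versions `gb.cummin()` no longer returns the grouping column `a`, and `apply` may attach group keys differently, so the two outputs may not line up exactly as printed above. This sketch therefore compares column `b` only and uses `transform` for the per-group version, since `transform` keeps the original row order and index.

```python
import pandas as pd

df = pd.DataFrame([[2, 4], [1, 5], [2, 2], [1, 3]], columns=["a", "b"])
gb = df.groupby("a")

# Built-in groupby method: cumulative minimum of "b" within each group of "a".
builtin = gb["b"].cummin()

# The same computation spelled out as "call Series.cummin on each group".
manual = gb["b"].transform(lambda s: s.cummin())

# Raises AssertionError if values, index, dtype or name differ.
pd.testing.assert_series_equal(builtin, manual)
print(builtin.tolist())  # [4, 5, 2, 3]
```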

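Note: I think there's a lot of low-hanging fruit here (making these groupby functions more efficient, as well as adding docstrings), and we'll quite possibly see this in 0.14...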
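To check which of the tab-completion names have docstrings in whatever pandas version you have installed, one option is to walk `dir(gb)` and print the first docstring line of each public method, flagging any that have none. This is a sketch written for a recent pandas: the attribute list will differ from the 0.13 listing in the question, and some attributes may emit deprecation warnings or raise when accessed, so warnings are silenced and failing attributes are skipped.

```python
import inspect
import warnings

import pandas as pd

df = pd.DataFrame([[2, 4], [1, 5], [2, 2], [1, 3]], columns=["a", "b"])
gb = df.groupby("a")

summary = {}
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # some attributes are deprecated in newer pandas
    for name in dir(gb):
        if name.startswith("_"):
            continue  # skip private and dunder attributes
        try:
            attr = getattr(gb, name)
        except Exception:
            continue  # skip attributes that cannot be accessed on this object
        if not callable(attr):
            continue  # skip non-callables such as ngroups or groups
        doc = inspect.getdoc(attr)
        summary[name] = doc.splitlines()[0] if doc else "<no docstring>"

for name, first_line in sorted(summary.items()):
    print(f"{name:14} {first_line}")
```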