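Looking at the API documentation, I can't find the methods of the DataFrameGroupBy
class. I'm wondering whether I'm looking in the wrong place.
According to the guide, these objects have the following methods: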
In [22]: gb = df.groupby('gender')
In [23]: gb.<TAB>
gb.agg gb.boxplot gb.cummin gb.describe gb.filter gb.get_group gb.height gb.last gb.median gb.ngroups gb.plot gb.rank gb.std gb.transform
gb.aggregate gb.count gb.cumprod gb.dtype gb.first gb.groups gb.hist gb.max gb.min gb.nth gb.prod gb.resample gb.sum gb.var
gb.apply gb.cummax gb.cumsum gb.fillna gb.gender gb.head gb.indices gb.mean gb.name gb.ohlc gb.quantile gb.size gb.tail gb.weight
Where can I find an explanation of what they do?
Answer 0 (score: 5)
The easiest way to find out what a function does is to consult its docstring:
In [24]: gb.filter? # help(gb.filter) in python interpreter
Type: instancemethod
String Form:<bound method DataFrameGroupBy.filter of <pandas.core.groupby.DataFrameGroupBy object at 0x1046ad290>>
File: /Users/andy/pandas/pandas/core/groupby.py
Definition: g.filter(self, func, dropna=True, *args, **kwargs)
Docstring:
Return a copy of a DataFrame excluding elements from groups that
do not satisfy the boolean criterion specified by func.
Parameters
----------
f : function
Function to apply to each subframe. Should return True or False.
dropna : Drop groups that do not pass the filter. True by default;
if False, groups that evaluate False are filled with NaNs.
Notes
-----
Each subframe is endowed the attribute 'name' in case you need to know
which group you are working on.
Example
--------
>>> grouped = df.groupby(lambda x: mapping[x])
>>> grouped.filter(lambda x: x['A'].sum() + x['B'].sum() > 0)
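Outside IPython, the same docstring can be read with the standard library. The snippet below is a minimal sketch: the column names come from the tab-completion listing in the question, but the values in df are made up for illustration.

```python
import inspect

import pandas as pd

# Hypothetical data; only the column names are taken from the question.
df = pd.DataFrame({
    "gender": ["f", "m", "f", "m"],
    "height": [160, 180, 165, 175],
    "weight": [55, 80, 60, 70],
})
gb = df.groupby("gender")

# inspect.getdoc returns the cleaned-up docstring as a string (or None).
doc = inspect.getdoc(gb.filter)
print(doc)

# help(gb.filter) prints the same text together with the signature.
```

Calling help(gb.filter) in a plain Python interpreter is the counterpart of gb.filter? in IPython.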
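However, because of a bug, the "fall through" methods don't show a useful docstring; they only show the wrapper around the DataFrame method they call.
For example, gb.cummin(*args, **kwargs) is equivalent to gb.apply(lambda x: x.cummin(*args, **kwargs)).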
In [31]: gb.cummin?
Type: function
String Form:<function wrapper at 0x1046a9410>
File: /Users/andy/pandas/pandas/core/groupby.py
Definition: g.cummin(*args, **kwargs)
Docstring: <no docstring>
In [32]: df.cummin?
Type: instancemethod
String Form:
<bound method DataFrame.min of a b
0 1 2
[1 rows x 2 columns]>
File: /Users/andy/pandas/pandas/core/generic.py
Definition: df.cummin(self, axis=None, dtype=None, out=None, skipna=True, **kwargs)
Docstring:
Return cumulative min over requested axis.
Parameters
----------
axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA
Returns
-------
min : Series
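The missing docstring on gb.cummin is what you would expect from a plain wrapper function in Python. The sketch below is not pandas' actual code; target, make_plain_wrapper and make_wrapped are hypothetical names used only to show how a wrapper loses the docstring unless it copies it over, for instance with functools.wraps.

```python
import functools


def target(x):
    """Return x unchanged."""
    return x


def make_plain_wrapper(func):
    # The wrapper forwards the call but has no docstring of its own.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def make_wrapped(func):
    # functools.wraps copies __doc__ and __name__ from func to the wrapper.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


plain = make_plain_wrapper(target)
kept = make_wrapped(target)

print(plain.__name__, plain.__doc__)  # wrapper None
print(kept.__name__, kept.__doc__)    # target Return x unchanged.
```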
Here is an example that explains this particular method and demonstrates the equivalence:
In [41]: df = pd.DataFrame([[2, 4], [1, 5], [2, 2], [1, 3]], columns=['a', 'b'])
In [42]: df
Out[42]:
a b
0 2 4
1 1 5
2 2 2
3 1 3
In [43]: gb = df.groupby('a')
In [44]: gb.cummin()
Out[44]:
a b
0 2 4
1 1 5
2 2 2
3 1 3
In [45]: gb.apply(lambda x: x.cummin())
Out[45]:
a b
0 2 4
1 1 5
2 2 2
3 1 3
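The equivalence can also be checked programmatically. This is a sketch under two assumptions that go beyond the session above: it selects column b explicitly and passes group_keys=False, because more recent pandas versions treat the grouping column and the result index of apply differently from the output shown here.

```python
import pandas as pd

df = pd.DataFrame([[2, 4], [1, 5], [2, 2], [1, 3]], columns=["a", "b"])

# group_keys=False keeps the original row index on the apply result.
gb = df.groupby("a", group_keys=False)

# Cumulative minimum of b within each group of a, computed two ways.
direct = gb["b"].cummin().sort_index()
via_apply = gb["b"].apply(lambda s: s.cummin()).sort_index()

print(direct.tolist())
print(via_apply.tolist())
print(direct.tolist() == via_apply.tolist())
```

Within each group the values of b already decrease (4 then 2 for a == 2, 5 then 3 for a == 1), so the cumulative minimum matches the original column for this data.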
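Note: I think there is a lot of low-hanging fruit here (making these groupby functions more efficient, as well as adding docstrings), and we may well see some of it in 0.14...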