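In the groupby documentation, I only see examples of grouping by a function applied to the axis-0 index, or by column labels. I don't see any example of grouping by labels derived from applying a function to a column. I assume this would be done with apply. Is the example below the best way to do it?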
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': np.random.choice(['a', 'b', 'c', 'd', 'e'], 20),
                   'num1': np.random.randint(low=30, high=100, size=20),
                   'num2': np.random.randint(low=-3, high=9, size=20)})
df.head()
  name  num1  num2
0    d    34     7
1    b    49     6
2    a    51    -1
3    d    79     8
4    e    72     5
def num1_greater_than_60(number_num1):
    if number_num1 >= 60:
        return 'greater'
    else:
        return 'less'

df.groupby(df['num1'].apply(num1_greater_than_60))
Answer 0 (score: 4)
From the DataFrame.groupby() docs:
by : mapping, function, str, or iterable
    Used to determine the groups for the groupby.
    If ``by`` is a function, it's called on each value of the object's
    index. If a dict or Series is passed, the Series or dict VALUES
    will be used to determine the groups (the Series' values are first
    aligned; see ``.align()`` method). If an ndarray is passed, the
    values are used as-is determine the groups. A str or list of strs
    may be passed to group by the columns in ``self``
So we can move num1 into the index and pass the function itself as the grouping key:
In [35]: df.set_index('num1').groupby(num1_greater_than_60)[['name']].count()
Out[35]:
         name
greater    15
less        5
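As a rough illustration of what the docstring describes, here is a small sketch on hand-written data (the values are made up so the result is reproducible; they are not the random data from the question). A callable passed as `by` is called on each index label, while a Series of labels has its values used as the group keys, so both routes should give the same counts.

```python
import pandas as pd

# Hypothetical fixed data, chosen only for illustration
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e', 'a'],
    'num1': [34, 49, 60, 79, 72, 95],
})

def num1_greater_than_60(number_num1):
    return 'greater' if number_num1 >= 60 else 'less'

# Route 1: num1 becomes the index, and groupby calls the function on each index label
by_index = df.set_index('num1').groupby(num1_greater_than_60)['name'].count()

# Route 2: build a Series of labels first; groupby uses its values as the keys
by_series = df.groupby(df['num1'].apply(num1_greater_than_60))['name'].count()

print(by_index.to_dict())
print(by_series.to_dict())
```

The set_index route avoids building an intermediate Series, but it only works when the function needs a single column, since that column has to become the index.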
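Answer 1 (score: 2)
You don't need apply here; a vectorized comparison can serve as the grouping key. Note that gt(60) is strictly greater than, so use ge(60) if you want to match the >= 60 test in the question.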
df.groupby(df.num1.gt(60))
df.num1.gt(60)
Out[774]:
0 True
1 True
2 True
3 True
4 False
5 True
6 True
7 True
8 False
9 True
10 False
11 True
12 True
13 True
14 False
15 True
16 False
17 False
18 True
19 False
Name: num1, dtype: bool
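A minimal sketch of grouping on a boolean key and then aggregating, again on made-up data rather than the question's random frame. It uses ge(60) so that the boundary value 60 lands in the same group as it would with the question's function.

```python
import pandas as pd

# Hypothetical fixed data, chosen only for illustration
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'num1': [34, 60, 79, 95],
    'num2': [7, 6, -1, 8],
})

# ge(60) mirrors `>= 60`; gt(60) would put the row with num1 == 60 in the False group
key = df['num1'].ge(60)

# The boolean Series is used directly as the grouping key
sums = df.groupby(key)['num2'].sum()
print(sums.to_dict())
```

The group labels here are True and False rather than 'greater' and 'less', which is fine for computation but less readable in a report.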
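Answer 2 (score: 1)
In general, I would do this by creating a derived column and then grouping by it. I find that easier to follow, and you can always drop the column afterwards or select only the columns you need at the end.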
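import numpy as np
import pandas as pd

df = pd.DataFrame({'name': np.random.choice(['a', 'b', 'c', 'd', 'e'], 20),
                   'num1': np.random.randint(low=30, high=100, size=20),
                   'num2': np.random.randint(low=-3, high=9, size=20)})

# Derived label column: True -> 'greater', False -> 'less'
# (gt is strictly greater than; use ge(60) to match the question's >= 60)
df['num1_greater_than_60'] = df['num1'].gt(60).replace(
    to_replace=[True, False],
    value=['greater', 'less'])

# dosomething() is a placeholder for whatever aggregation you need, e.g. count() or mean()
df.groupby('num1_greater_than_60').dosomething()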
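One possible variant of the derived-column approach, sketched on made-up data. It swaps in np.where to build the string labels in one step (an alternative to the replace call, not what the answer itself uses), and mean() stands in for the placeholder aggregation.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data, chosen only for illustration
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'num1': [34, 60, 79, 95],
    'num2': [7, 6, -1, 8],
})

# np.where picks 'greater' where the condition holds and 'less' elsewhere
df['num1_greater_than_60'] = np.where(df['num1'] >= 60, 'greater', 'less')

# mean() stands in for the placeholder aggregation
result = df.groupby('num1_greater_than_60')['num2'].mean()

# Drop the helper column once it is no longer needed
df = df.drop(columns='num1_greater_than_60')
print(result.to_dict())
```

Keeping the label as a real column makes it easy to inspect before grouping, at the cost of temporarily widening the frame.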