I have a df whose groupby result looks like this:
+----------------+----------------+-------------+
| Team | Method | Count |
+----------------+----------------+-------------+
| Team 1 | Manual | 14 |
| Team 2 | Automated | 5 |
| Team 2 | Hybrid | 1 |
| Team 2 | Manual | 25 |
| Team 4 | Automated | 1 |
| Team 4 | Hybrid | 13 |
+----------------+----------------+-------------+
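I want to create a count or grouping that shows only the teams that use the Manual method exclusively. Any idea how to do this?
For this dataset the answer is Team 1, because it is the only team that uses nothing but the Manual method.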
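Because the question starts from an already aggregated table, one way to get the answer without apply is a set difference: collect every team that has at least one non-Manual row, then remove those teams from the full list. A minimal sketch, assuming the table is a DataFrame with the Team, Method and Count columns shown above (counts is a hypothetical name):

```python
import pandas as pd

# Rebuild the aggregated Team/Method/Count table from the question
counts = pd.DataFrame({
    "Team": ["Team 1", "Team 2", "Team 2", "Team 2", "Team 4", "Team 4"],
    "Method": ["Manual", "Automated", "Hybrid", "Manual", "Automated", "Hybrid"],
    "Count": [14, 5, 1, 25, 1, 13],
})

# Teams that have at least one row with a method other than Manual
has_other_method = set(counts.loc[counts["Method"] != "Manual", "Team"])

# Every remaining team used Manual exclusively
manual_only = sorted(set(counts["Team"]) - has_other_method)
print(manual_only)  # ['Team 1']
```

The Count column plays no role here: a team qualifies based only on which methods appear for it.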
Answer (score: 1)
You can groupby with apply and all to check whether all of a team's values are Manual, and then subset the rows with loc, isin and values:
print(df)
Team Method
0 Team 1 Manual
1 Team 1 Manual
2 Team 1 Manual
3 Team 1 Manual
4 Team 1 Manual
5 Team 2 Automated
6 Team 2 Automated
7 Team 2 Automated
8 Team 2 Automated
9 Team 2 Automated
10 Team 2 Hybrid
11 Team 2 Manual
12 Team 2 Manual
13 Team 3 Manual
14 Team 2 Manual
15 Team 2 Manual
16 Team 4 Automated
17 Team 4 Hybrid
g = df.groupby("Team")['Method'].apply(lambda x: (x == 'Manual').all())
print(g)
Team
Team 1 True
Team 2 False
Team 3 True
Team 4 False
Name: Method, dtype: bool
print(g[g.values].index)
Index(['Team 1', 'Team 3'], dtype='object', name='Team')
print(df.loc[df['Team'].isin(g[g.values].index)])
Team Method
0 Team 1 Manual
1 Team 1 Manual
2 Team 1 Manual
3 Team 1 Manual
4 Team 1 Manual
13 Team 3 Manual
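As an alternative to building an index of teams and then calling isin, the same rows can be selected in one pass with groupby plus transform, which returns a result aligned with the original rows. A sketch, assuming a pandas version where SeriesGroupBy.transform accepts the string "all" (is_manual and team_all_manual are illustrative names):

```python
import pandas as pd

# Same sample data as the answer's df
df = pd.DataFrame({
    "Team": ["Team 1"] * 5 + ["Team 2"] * 8 + ["Team 3"] + ["Team 2"] * 2 + ["Team 4"] * 2,
    "Method": ["Manual"] * 5 + ["Automated"] * 5 + ["Hybrid"] + ["Manual"] * 5 + ["Automated", "Hybrid"],
})

# Boolean mask: True where the row's method is Manual
is_manual = df["Method"].eq("Manual")

# transform("all") computes all() per team and broadcasts it back to every row of that team
team_all_manual = is_manual.groupby(df["Team"]).transform("all")

result = df[team_all_manual]
print(result)
```

This keeps rows 0 to 4 (Team 1) and row 13 (Team 3), the same selection as the loc/isin version.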
To better understand apply, you can use a custom function f with print to see how each group is compared with the string Manual:
def f(x):
    # x is the Method sub-Series of one team; print its element-wise comparison
    print(x == 'Manual')
df.groupby("Team")['Method'].apply(f)
0 True
1 True
2 True
3 True
4 True
Name: Team 1, dtype: bool
5 False
6 False
7 False
8 False
9 False
10 False
11 True
12 True
14 True
15 True
Name: Team 2, dtype: bool
13 True
Name: Team 3, dtype: bool
16 False
17 False
Name: Team 4, dtype: bool
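Instead of printing from inside apply, you can also iterate over the groupby object directly: each iteration yields the group key and the sub-Series that apply would pass to its function. A minimal sketch reusing the same sample data (comparisons is just an illustrative name):

```python
import pandas as pd

# Same sample data as the answer's df
df = pd.DataFrame({
    "Team": ["Team 1"] * 5 + ["Team 2"] * 8 + ["Team 3"] + ["Team 2"] * 2 + ["Team 4"] * 2,
    "Method": ["Manual"] * 5 + ["Automated"] * 5 + ["Hybrid"] + ["Manual"] * 5 + ["Automated", "Hybrid"],
})

# Iterating a SeriesGroupBy yields (group key, sub-Series) pairs
comparisons = {}
for team, methods in df.groupby("Team")["Method"]:
    comparisons[team] = (methods == "Manual").tolist()
    print(team, comparisons[team])
```

Team 2 shows a mix of False and True values, which is exactly why it must be excluded.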
But we need to check whether all values are the string Manual, which means checking that every comparison result is True with all:
def f(x):
    # True only if every method in this team's group is Manual
    print((x == 'Manual').all())
df.groupby("Team")['Method'].apply(f)
True
False
True
False
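all() on a boolean Series returns True only when every element is True. GroupBy objects also expose all(), so the lambda is not strictly needed: compute the comparison once over the whole column and reduce it per team. A sketch of that variant, assuming the same sample data (only_manual is an illustrative name):

```python
import pandas as pd

# Same sample data as the answer's df
df = pd.DataFrame({
    "Team": ["Team 1"] * 5 + ["Team 2"] * 8 + ["Team 3"] + ["Team 2"] * 2 + ["Team 4"] * 2,
    "Method": ["Manual"] * 5 + ["Automated"] * 5 + ["Hybrid"] + ["Manual"] * 5 + ["Automated", "Hybrid"],
})

# all() is True only if every element is True
print(pd.Series([True, True]).all())   # True
print(pd.Series([True, False]).all())  # False

# Element-wise comparison once, then a per-team all() reduction
only_manual = df["Method"].eq("Manual").groupby(df["Team"]).all()
print(only_manual)
```

The result matches g from the apply version: True for Team 1 and Team 3, False for Team 2 and Team 4.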
Edit: I added an example of groupby with two columns:
print(df)
Col1 Method Col2
0 Team 1 Manual Team
1 Team 1 Manual Team
2 Team 1 Manual Team
3 Team 1 Manual Team
4 Team 1 Manual Team
5 Team 2 Automated Team
6 Team 2 Automated Team
7 Team 2 Automated Team
8 Team 2 Automated Team
9 Team 2 Automated Team
10 Team 2 Hybrid Team
11 Team 2 Manual Team
12 Team 2 Manual Team
13 Team 3 Manual Team
14 Team 2 Manual Team
15 Team 2 Manual Team
16 Team 4 Automated Team
17 Team 4 Hybrid Team
g = df.groupby(["Col1", "Col2"])['Method'].apply(lambda x: (x == 'Manual').all())
print(g)
Col1 Col2
Team 1 Team True
Team 2 Team False
Team 3 Team True
Team 4 Team False
Name: Method, dtype: bool
g = g.reset_index()
print(g)
Col1 Col2 Method
0 Team 1 Team True
1 Team 2 Team False
2 Team 3 Team True
3 Team 4 Team False
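After reset_index the boolean column keeps the name Method, which is easy to confuse with the original Method column. A small sketch, assuming you prefer a descriptive column name (all_manual is a hypothetical choice), using the name parameter of Series.reset_index:

```python
import pandas as pd

# Same two-column sample data as in the edit
df = pd.DataFrame({
    "Col1": ["Team 1"] * 5 + ["Team 2"] * 8 + ["Team 3"] + ["Team 2"] * 2 + ["Team 4"] * 2,
    "Method": ["Manual"] * 5 + ["Automated"] * 5 + ["Hybrid"] + ["Manual"] * 5 + ["Automated", "Hybrid"],
    "Col2": ["Team"] * 18,
})

g = df.groupby(["Col1", "Col2"])["Method"].apply(lambda x: (x == "Manual").all())

# name= sets the label of the value column created from the Series
flags = g.reset_index(name="all_manual")
print(flags)

# Col1 values whose (Col1, Col2) group contains only Manual rows
teams = flags.loc[flags["all_manual"], "Col1"].tolist()
print(teams)  # ['Team 1', 'Team 3']
```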
g1 = g.loc[g['Method'], 'Col1']
print(g1)
0 Team 1
2 Team 3
Name: Col1, dtype: object
print(df.loc[df['Col1'].isin(g1.values)])
Col1 Method Col2
0 Team 1 Manual Team
1 Team 1 Manual Team
2 Team 1 Manual Team
3 Team 1 Manual Team
4 Team 1 Manual Team
13 Team 3 Manual Team
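The isin step above filters on Col1 only, so if the same Col1 value could appear with different Col2 values, it would also keep rows from a (Col1, Col2) group that is not all Manual. A sketch that filters on the full key pair instead, assuming groupby with a list of Series and transform("all") behave as in current pandas (is_manual and mask are illustrative names):

```python
import pandas as pd

# Same two-column sample data as in the edit
df = pd.DataFrame({
    "Col1": ["Team 1"] * 5 + ["Team 2"] * 8 + ["Team 3"] + ["Team 2"] * 2 + ["Team 4"] * 2,
    "Method": ["Manual"] * 5 + ["Automated"] * 5 + ["Hybrid"] + ["Manual"] * 5 + ["Automated", "Hybrid"],
    "Col2": ["Team"] * 18,
})

is_manual = df["Method"].eq("Manual")

# Group the mask by both key columns and broadcast each group's all() back to its rows
mask = is_manual.groupby([df["Col1"], df["Col2"]]).transform("all")
print(df[mask])
```

With this sample data the selection is the same (rows 0 to 4 and 13), because Col2 is constant.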