How to group or exclude based on value counts?

Date: 2016-02-01 15:52:05

Tags: python pandas count

I have a df whose groupby result looks like this:

   +----------------+----------------+-------------+
   | Team           | Method         |  Count      |
   +----------------+----------------+-------------+
   | Team 1         | Manual         |          14 |
   | Team 2         | Automated      |           5 |
   | Team 2         | Hybrid         |           1 |
   | Team 2         | Manual         |          25 |
   | Team 4         | Automated      |           1 |
   | Team 4         | Hybrid         |          13 |
   +----------------+----------------+-------------+

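I want to create a count or grouping that shows only the teams that use the Manual method exclusively. Any idea how to do this?

For this dataset, the answer is Team 1, because it is the only team whose only method is Manual.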
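For reference, a minimal sketch that works directly on the aggregated table above, using exclusion instead of grouping: find every team that has at least one non-Manual row and drop those teams. The DataFrame is rebuilt by hand from the table, and the `Count > 0` condition is an assumption (a row with a count of 0 is treated as "method not used").

```python
import pandas as pd

# The aggregated table from the question, rebuilt by hand.
counts = pd.DataFrame({
    "Team": ["Team 1", "Team 2", "Team 2", "Team 2", "Team 4", "Team 4"],
    "Method": ["Manual", "Automated", "Hybrid", "Manual", "Automated", "Hybrid"],
    "Count": [14, 5, 1, 25, 1, 13],
})

# Teams that used at least one non-Manual method (assumption: Count > 0 means "used").
non_manual_teams = counts.loc[
    (counts["Method"] != "Manual") & (counts["Count"] > 0), "Team"
]

# Exclude those teams; every remaining row belongs to a Manual-only team.
manual_only = counts[~counts["Team"].isin(non_manual_teams)]
print(manual_only)
```

Here only the Team 1 row is left, matching the expected answer.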

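1 answer:

Answer 0 (score: 1)

You can use groupby with apply and all to check whether every value in a group is Manual, and then select the matching rows with loc and isin: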

print(df)
      Team     Method
0   Team 1     Manual
1   Team 1     Manual
2   Team 1     Manual
3   Team 1     Manual
4   Team 1     Manual
5   Team 2  Automated
6   Team 2  Automated
7   Team 2  Automated
8   Team 2  Automated
9   Team 2  Automated
10  Team 2     Hybrid
11  Team 2     Manual
12  Team 2     Manual
13  Team 3     Manual
14  Team 2     Manual
15  Team 2     Manual
16  Team 4  Automated
17  Team 4     Hybrid
g = df.groupby("Team")['Method'].apply(lambda x: (x == 'Manual').all())
print(g)
Team
Team 1     True
Team 2    False
Team 3     True
Team 4    False
Name: Method, dtype: bool

print(g[g.values].index)
Index(['Team 1', 'Team 3'], dtype='object', name='Team')

print(df.loc[df['Team'].isin(g[g.values].index)])
      Team  Method
0   Team 1  Manual
1   Team 1  Manual
2   Team 1  Manual
3   Team 1  Manual
4   Team 1  Manual
13  Team 3  Manual
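A self-contained sketch of the same idea, assuming the df is the one printed above (rebuilt here by hand, since its construction is not shown). Instead of a Python lambda it compares the whole column once, then uses the built-in groupby `all`; `transform("all")` broadcasts the per-team result back to every row, so the isin step is not needed.

```python
import pandas as pd

# Rebuilt from the printed df above (assumption: same rows, same order).
df = pd.DataFrame({
    "Team": ["Team 1"] * 5 + ["Team 2"] * 8 + ["Team 3"] + ["Team 2"] * 2 + ["Team 4"] * 2,
    "Method": ["Manual"] * 5 + ["Automated"] * 5
    + ["Hybrid", "Manual", "Manual", "Manual", "Manual", "Manual", "Automated", "Hybrid"],
})

# Compare the whole column once instead of inside a lambda per group.
is_manual = df["Method"].eq("Manual")

# One bool per team: True when every row of that team is Manual.
per_team = is_manual.groupby(df["Team"]).all()

# The same per-team result broadcast back to each row, usable as a row mask.
row_mask = is_manual.groupby(df["Team"]).transform("all")

print(per_team[per_team].index.tolist())  # ['Team 1', 'Team 3']
print(df[row_mask])
```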

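To see better what apply does, you can pass a custom function f that prints the result of comparing each group's values with the string Manual:

def f(x):
    print(x == 'Manual')

# f is only called for its printing side effect, so the result of apply is not printed
df.groupby("Team")['Method'].apply(f)
0    True
1    True
2    True
3    True
4    True
Name: Team 1, dtype: bool
5     False
6     False
7     False
8     False
9     False
10    False
11     True
12     True
14     True
15     True
Name: Team 2, dtype: bool
13    True
Name: Team 3, dtype: bool
16    False
17    False
Name: Team 4, dtype: bool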

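But we need to check whether every value is the string Manual, that is, whether every comparison result in the group is True, which is what all does:

def f(x):
    print((x == 'Manual').all())

df.groupby("Team")['Method'].apply(f)
True
False
True
False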
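Since the question is phrased in terms of value counts, the same test can also be written with counts: every value in a team is Manual exactly when the team's Manual count equals its total row count. A sketch using `pd.crosstab`, again assuming the df printed at the top of this answer (rebuilt by hand):

```python
import pandas as pd

# Rebuilt from the printed df above (assumption: same rows, same order).
df = pd.DataFrame({
    "Team": ["Team 1"] * 5 + ["Team 2"] * 8 + ["Team 3"] + ["Team 2"] * 2 + ["Team 4"] * 2,
    "Method": ["Manual"] * 5 + ["Automated"] * 5
    + ["Hybrid", "Manual", "Manual", "Manual", "Manual", "Manual", "Automated", "Hybrid"],
})

# Rows per team and method: one row per team, one column per method.
counts = pd.crosstab(df["Team"], df["Method"])

# Manual-only teams: the Manual column holds all of the team's rows.
manual_only = counts.index[counts["Manual"] == counts.sum(axis=1)]

print(counts)
print(manual_only.tolist())
```

This also answers related questions directly from the same table, e.g. teams with more than some number of Manual rows.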

Edit: I added an example of groupby with two columns:

print(df)
      Col1     Method  Col2
0   Team 1     Manual  Team
1   Team 1     Manual  Team
2   Team 1     Manual  Team
3   Team 1     Manual  Team
4   Team 1     Manual  Team
5   Team 2  Automated  Team
6   Team 2  Automated  Team
7   Team 2  Automated  Team
8   Team 2  Automated  Team
9   Team 2  Automated  Team
10  Team 2     Hybrid  Team
11  Team 2     Manual  Team
12  Team 2     Manual  Team
13  Team 3     Manual  Team
14  Team 2     Manual  Team
15  Team 2     Manual  Team
16  Team 4  Automated  Team
17  Team 4     Hybrid  Team
g = df.groupby(["Col1", "Col2"])['Method'].apply(lambda x: (x == 'Manual').all())
print(g)
Col1    Col2
Team 1  Team     True
Team 2  Team    False
Team 3  Team     True
Team 4  Team    False
Name: Method, dtype: bool

g = g.reset_index()
print(g)
     Col1  Col2 Method
0  Team 1  Team   True
1  Team 2  Team  False
2  Team 3  Team   True
3  Team 4  Team  False

g1 = g.loc[g['Method'], 'Col1']
print(g1)
0    Team 1
2    Team 3
Name: Col1, dtype: object

print(df.loc[df['Col1'].isin(g1.values)])
      Col1  Method  Col2
0   Team 1  Manual  Team
1   Team 1  Manual  Team
2   Team 1  Manual  Team
3   Team 1  Manual  Team
4   Team 1  Manual  Team
13  Team 3  Manual  Team
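The final isin step above filters on Col1 only, so if the same Col1 value appeared with several Col2 values, rows from a (Col1, Col2) pair that is not Manual-only could also be kept. A sketch that keeps both keys, using `transform("all")` with a list of grouping Series; the df is assumed to be the two-column one printed in the edit (rebuilt by hand):

```python
import pandas as pd

# Rebuilt from the printed two-column df above (assumption: same rows, same order).
df = pd.DataFrame({
    "Col1": ["Team 1"] * 5 + ["Team 2"] * 8 + ["Team 3"] + ["Team 2"] * 2 + ["Team 4"] * 2,
    "Method": ["Manual"] * 5 + ["Automated"] * 5
    + ["Hybrid", "Manual", "Manual", "Manual", "Manual", "Manual", "Automated", "Hybrid"],
    "Col2": ["Team"] * 18,
})

# True for a row when every row of its (Col1, Col2) group is Manual.
mask = df["Method"].eq("Manual").groupby([df["Col1"], df["Col2"]]).transform("all")

print(df[mask])
```

This needs no reset_index or isin, and the result is aligned with the original rows.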