Calculating the percentage of each unique value, grouped by the unique values of another column

Date: 2019-08-22 15:20:51

Tags: python pandas pandas-groupby

I have the following DataFrame:

import pandas as pd

bin_class = [0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1]
teams = ['A','B','B','A','A','B','B','A','A','B','B','A','A','B','B','A','B','B']
d = {'Team':teams,'Classification':bin_class}
df = pd.DataFrame(d)

   Team  Classification
0     A               0
1     B               1
2     B               1
3     A               1
4     A               0
5     B               0
6     B               0
7     A               0
8     A               1
9     B               1
10    B               0
11    A               0
12    A               0
13    B               0
14    B               0
15    A               0
16    B               0
17    B               1

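I need to work out the percentage of each bin_class value for each team. That is, of all the rows for team A, what percentage are 0 and what percentage are 1? I have tried several approaches, but they failed and were overly complicated. Is there a simple way to do this?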

4 Answers:

Answer 0: (score: 6)

Use crosstab with normalize='index', so that each row (team) sums to 1:

pd.crosstab(df.Team,df.Classification,normalize='index')
Out[498]: 
Classification     0     1
Team                      
A               0.75  0.25
B               0.60  0.40
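
As a hedged aside, roughly the same table can be built without crosstab by normalizing value counts within each team and then pivoting. This is only a sketch that assumes the df from the question; the names shares and shares_wide are made up for illustration.

```python
import pandas as pd

bin_class = [0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1]
teams = ['A','B','B','A','A','B','B','A','A','B','B','A','A','B','B','A','B','B']
df = pd.DataFrame({'Team': teams, 'Classification': bin_class})

# Fraction of each Classification value within each Team (long format)
shares = df.groupby('Team')['Classification'].value_counts(normalize=True)

# Move the Classification values into columns; fill_value=0 covers
# a value that never occurs for some team
shares_wide = shares.unstack(fill_value=0)

# Multiply by 100 to show percentages instead of fractions
print(shares_wide * 100)
```

Unlike the mean-based shortcut, this keeps working when Classification has more than two distinct values.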

Answer 1: (score: 2)

The percentage of 1s is simply the mean of Classification, because the only values here are 0 and 1:

>>> df.groupby('Team').mean()
      Classification
Team                
A               0.25
B               0.40

Note that this will not work if the Classification column contains values other than 0 and 1.

Answer 2: (score: 1)
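
If the column does hold other values, one possible workaround is to compare it against the label of interest first and average the resulting booleans. The sketch below uses made-up data (df_multi) and a made-up target label; neither comes from the question.

```python
import pandas as pd

# Hypothetical data: Classification takes three labels, so a plain mean()
# is no longer a proportion
df_multi = pd.DataFrame({
    'Team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Classification': [0, 2, 2, 1, 2, 0, 0, 0],
})

target = 2  # illustrative label of interest

# eq() yields True/False per row; the mean of booleans is the fraction of True
share_of_target = (
    df_multi['Classification'].eq(target)
    .groupby(df_multi['Team'])
    .mean()
)
print(share_of_target * 100)
```

Repeating this for every label amounts to the normalized count per group, so for many labels a crosstab is probably the simpler choice.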

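You can group by the Team and Classification columns to count the rows in each combination, and then divide each count by its team total:

# size() counts the rows in each (Team, Classification) group
group_count = df.groupby(['Team', 'Classification']).size()

Output:
Team  Classification
A     0                 6
      1                 2
B     0                 6
      1                 4
dtype: int64


# group_keys=False keeps the original (Team, Classification) index
group_percentage = group_count.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / float(x.sum()))

Output:
Team  Classification
A     0                 75.0
      1                 25.0
B     0                 60.0
      1                 40.0
dtype: float64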
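
A variant of the same idea that avoids apply is sketched below: transform('sum') broadcasts each team's total back onto that team's rows, so the division lines up element by element. It assumes the df from the question; the names counts and team_totals are illustrative.

```python
import pandas as pd

bin_class = [0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1]
teams = ['A','B','B','A','A','B','B','A','A','B','B','A','A','B','B','A','B','B']
df = pd.DataFrame({'Team': teams, 'Classification': bin_class})

# Rows per (Team, Classification) combination
counts = df.groupby(['Team', 'Classification']).size()

# Total rows per Team, repeated for every row of that Team
team_totals = counts.groupby(level='Team').transform('sum')

# Element-wise division gives the percentage within each team
percentages = 100 * counts / team_totals
print(percentages)
```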

Answer 3: (score: 0)

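# Number of 1s per team (the sum of a 0/1 column) and total rows per team
ones=df.groupby(['Team']).sum()
long=df.groupby(['Team']).count()
# Share of 1s, and its complement (the share of 0s), as percentages
percentages_ones=(ones/long)*100
percentages_zeros=((long-ones)/long)*100
# Give the two result columns descriptive names
percentages_ones.rename(columns=lambda x: x.replace('Classification', 'Percentage of ones'), inplace=True)
percentages_zeros.rename(columns=lambda x: x.replace('Classification', 'Percentages of zeros'), inplace=True)
# Put the two columns side by side
percentages=pd.concat([percentages_zeros,percentages_ones],axis=1)
percentages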
  

Output:

     Percentages of zeros   Percentage of ones
Team        
A    75.0                   25.0
B    60.0                   40.0