I have the following DataFrame:
import pandas as pd

bin_class = [0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1]
teams = ['A','B','B','A','A','B','B','A','A','B','B','A','A','B','B','A','B','B']
d = {'Team': teams, 'Classification': bin_class}
df = pd.DataFrame(d)
   Team  Classification
0     A               0
1     B               1
2     B               1
3     A               1
4     A               0
5     B               0
6     B               0
7     A               0
8     A               1
9     B               1
10    B               0
11    A               0
12    A               0
13    B               0
14    B               0
15    A               0
16    B               0
17    B               1
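I need to work out the percentage of each bin_class value per team. That is, among all the rows for team A, what percentage are 0 and what percentage are 1? I tried several approaches that failed and were overly complicated. Is there a simple way to do this?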
Answer 0 (score: 6)
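Use crosstab. With normalize='index', each row (team) is divided by its own total, so the values in a row sum to 1: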
pd.crosstab(df.Team,df.Classification,normalize='index')
Out[498]:
Classification     0     1
Team
A               0.75  0.25
B               0.60  0.40
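The crosstab result above holds fractions rather than percentages. A minimal sketch of one way to scale it, assuming the same df as in the question (the names fractions and percentages are only illustrative):

```python
import pandas as pd

bin_class = [0,1,1,1,0,0,0,0,1,1,0,0,0,0,0,0,0,1]
teams = ['A','B','B','A','A','B','B','A','A','B','B','A','A','B','B','A','B','B']
df = pd.DataFrame({'Team': teams, 'Classification': bin_class})

# normalize='index' divides each row by its row total,
# so every team's fractions sum to 1.
fractions = pd.crosstab(df['Team'], df['Classification'], normalize='index')

# Scale to percentages; rounding hides floating-point noise.
percentages = (fractions * 100).round(1)
print(percentages)
```

Rounding to one decimal is a presentation choice here, not something the original answer specifies.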
Answer 1 (score: 2)
The fraction of 1s is simply the mean of Classification, because the column contains only 0s and 1s (multiply by 100 to get a percentage):
>>> df.groupby('Team').mean()
      Classification
Team
A               0.25
B               0.40
Note that this will not work if the Classification column contains values other than 0 and 1.
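When the labels are not 0 and 1, one option that does not rely on the mean is value_counts with normalize=True inside a groupby. The sketch below uses a small hypothetical DataFrame (df2, with made-up string labels) purely for illustration:

```python
import pandas as pd

# Hypothetical data: string labels instead of 0/1.
df2 = pd.DataFrame({
    'Team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'Classification': ['win', 'loss', 'loss', 'loss',
                       'win', 'win', 'loss', 'draw', 'loss'],
})

# Share of each label within each team, as a percentage.
# unstack moves the labels into columns; fill_value=0 covers
# labels that never occur for a given team.
shares = (
    df2.groupby('Team')['Classification']
       .value_counts(normalize=True)
       .unstack(fill_value=0)
       .mul(100)
)
print(shares)
```

This counts occurrences of each label directly, so it works for any number of distinct values.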
Answer 2 (score: 1)
You can group by the Team and Classification columns and then compute the percentages:
# Number of rows for each (Team, Classification) pair
group_count = df.groupby(['Team', 'Classification']).size()
Output:
Team  Classification
A     0                 6
      1                 2
B     0                 6
      1                 4
# Divide each count by its team's total; group_keys=False keeps the original index
group_percentage = group_count.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())
Output:
Team  Classification
A     0                 75.0
      1                 25.0
B     0                 60.0
      1                 40.0
Answer 3 (score: 0)
# Number of ones and total number of rows per team
ones = df.groupby('Team').sum()
total = df.groupby('Team').count()
# Convert the counts to percentages
percentages_ones = (ones / total) * 100
percentages_zeros = ((total - ones) / total) * 100
percentages_ones = percentages_ones.rename(columns={'Classification': 'Percentage of ones'})
percentages_zeros = percentages_zeros.rename(columns={'Classification': 'Percentage of zeros'})
# Put both percentage columns side by side
percentages = pd.concat([percentages_zeros, percentages_ones], axis=1)
percentages
Output:
      Percentage of zeros  Percentage of ones
Team
A                    75.0                25.0
B                    60.0                40.0