Calculating the percentage of each category in pandas

Time: 2018-09-14 03:47:07

Tags: python pandas series

I have a dataframe train, and I have filtered a number of rows out of it to form the promoted dataframe:

print(train.department.value_counts(),'\n')
promoted=train[train.is_promoted==1]
print(promoted.department.value_counts())

The output of the code above is:

Sales & Marketing    16840
Operations           11348
Technology            7138
Procurement           7138
Analytics             5352
Finance               2536
HR                    2418
Legal                 1039
R&D                    999
Name: department, dtype: int64

Sales & Marketing    1213
Operations           1023
Technology            768
Procurement           688
Analytics             512
Finance               206
HR                    136
R&D                    69
Legal                  53
Name: department, dtype: int64

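For each value of the department column, I want to show what percentage of its rows in the train dataframe appear in promoted. That is, instead of the counts 1213, 1023, 768, 688, and so on, I should get a percentage, for example 1213/16840 * 100 = 7.2. Note that I do not want normalized values.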

3 Answers:

Answer 0 (score: 1)

Try:

promoted.department.value_counts()/train.department.value_counts()*100

It should give you the desired output:

Sales & Marketing     7.2031
Operations            9.0148
Technology           10.7593
.....                 ...
Name: department, dtype: float64
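
As a minimal sketch of why the division above works, the snippet below rebuilds the first three counts from the question as two Series (the names train_counts and promoted_counts are illustrative, not from the original post). pandas aligns the two Series on their index labels before dividing, so the differing row order of the two value_counts() results does not matter.

```python
import pandas as pd

# Counts copied from the question's output (first three departments only).
train_counts = pd.Series(
    {"Sales & Marketing": 16840, "Operations": 11348, "Technology": 7138}
)
# Deliberately listed in a different order to show label alignment.
promoted_counts = pd.Series(
    {"Technology": 768, "Sales & Marketing": 1213, "Operations": 1023}
)

# Division matches rows by index label, not by position.
pct = (promoted_counts / train_counts * 100).round(2)
print(pct.sort_values(ascending=False))
```

Because the two indexes are not in the same order, the result comes back sorted by label, which is why the sketch sorts by value before printing.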

Answer 1 (score: 0)

import pandas as pd
df = pd.read_csv("/home/spaceman/my_work/Most-Recent-Cohorts-Scorecard-Elements.csv")
df = df[['STABBR']]  # each value appears in the dataframe multiple times
# after that I got:
CA    717
TX    454
NY    454
FL    417
PA    382
OH    320
IL    280
MI    189
NC    189
.........
.........

# returns the relative frequency by dividing all values by the sum of values
print(df['STABBR'].value_counts(normalize=True))
CA    0.099930
TX    0.063275
NY    0.063275
FL    0.058118
PA    0.053240
OH    0.044599
IL    0.039024
MI    0.026341
NC    0.026341
..............
..............
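
To make the relationship between the two outputs above concrete, here is a small sketch on made-up state codes (hypothetical data, not taken from the scorecard file). It checks that normalize=True gives the same result as dividing each count by the total of all counts.

```python
import pandas as pd

# Hypothetical state codes; not taken from the scorecard file.
states = pd.Series(["CA", "CA", "CA", "TX", "TX", "NY"], name="STABBR")

counts = states.value_counts()                  # absolute counts per state
relative = states.value_counts(normalize=True)  # fractions that sum to 1

# normalize=True is equivalent to dividing each count by the total count.
manual = counts / counts.sum()
print(relative)
```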

Answer 2 (score: 0)

Found a better answer at: https://stackoverflow.com/a/50558594/4106458

It suggests using the normalize=True named parameter of the value_counts() method.

For your case, the code would be:

promoted.department.value_counts(normalize=True) * 100
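
Note that normalize=True divides by the total number of rows in promoted, so it reports each department's share of all promotions rather than the promoted/train ratio the question asks for. The sketch below contrasts the two on a tiny made-up frame (hypothetical data and department names). The groupby line is one possible way to get the per-department rate, and it assumes is_promoted holds only 0 and 1.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's `train` frame.
train = pd.DataFrame({
    "department": ["Sales", "Sales", "Sales", "Sales", "HR", "HR"],
    "is_promoted": [1, 0, 0, 0, 1, 0],
})
promoted = train[train.is_promoted == 1]

# Share of all promoted rows that each department accounts for (sums to 100).
share_of_promoted = promoted.department.value_counts(normalize=True) * 100

# Promotion rate within each department: the mean of a 0/1 column is the
# fraction of ones, i.e. promoted rows divided by all rows in that department.
rate_within_dept = train.groupby("department")["is_promoted"].mean() * 100

print(share_of_promoted)
print(rate_within_dept)
```

In this toy frame each department accounts for half of the promotions, yet the rate within Sales is 25% and within HR is 50%, which is the distinction the question draws when it asks for non-normalized values.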