I have a dataframe train, and I have filtered a number of rows out of the train dataframe to form the promoted dataframe:
print(train.department.value_counts(),'\n')
promoted=train[train.is_promoted==1]
print(promoted.department.value_counts())
The output of the above code is:
Sales & Marketing 16840
Operations 11348
Technology 7138
Procurement 7138
Analytics 5352
Finance 2536
HR 2418
Legal 1039
R&D 999
Name: department, dtype: int64
Sales & Marketing 1213
Operations 1023
Technology 768
Procurement 688
Analytics 512
Finance 206
HR 136
R&D 69
Legal 53
Name: department, dtype: int64
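I want to show, for each department, what percentage of its rows in the train dataframe appear in promoted. That is, instead of the counts 1213, 1023, 768, 688, etc., I should get a percentage, e.g. 1213/16840 * 100 = 7.2, and so on. Note that I do not want normalized values.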
Answer 0: (score: 1)
Try:
promoted.department.value_counts()/train.department.value_counts()*100
It should give you the desired output:
Sales & Marketing 7.2030
Operations 9.0148
Technology 10.7593
..... ...
Name: department, dtype: float64
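The division above relies on pandas aligning the two Series by department label rather than by position. A minimal sketch of that behaviour, using a small made-up frame (the toy data and the names rate and rate_groupby are assumptions for illustration, not part of the question):

```python
import pandas as pd

# Hypothetical toy data standing in for the question's `train` frame.
train = pd.DataFrame({
    "department": ["Sales", "Sales", "Sales", "Sales", "HR", "HR", "Legal", "Legal"],
    "is_promoted": [1, 0, 0, 0, 1, 1, 0, 0],
})
promoted = train[train.is_promoted == 1]

# Division aligns the two Series on the department label, not on position.
# A department with no promoted rows is missing from the numerator and
# comes out as NaN, so those entries are filled with 0 here.
rate = (promoted.department.value_counts()
        / train.department.value_counts() * 100).fillna(0)

# Equivalent when is_promoted is strictly 0/1: the mean of the flag per department.
rate_groupby = train.groupby("department")["is_promoted"].mean() * 100

print(rate.sort_values(ascending=False))
```

Because the two indexes are not identical, the result is ordered by the aligned index rather than by count, so sorting it explicitly, as in the last line, may be needed to get a ranking.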
Answer 1: (score: 0)
import pandas as pd
df = pd.read_csv("/home/spaceman/my_work/Most-Recent-Cohorts-Scorecard-Elements.csv")
df = df[['STABBR']]  # each value appears in the dataframe multiple times
# after that I got
CA 717
TX 454
NY 454
FL 417
PA 382
OH 320
IL 280
MI 189
NC 189
.........
.........
print(df['STABBR'].value_counts(normalize=True))  # returns the relative frequency by dividing all values by the sum of values
CA 0.099930
TX 0.063275
NY 0.063275
FL 0.058118
PA 0.053240
OH 0.044599
IL 0.039024
MI 0.026341
NC 0.026341
..............
..............
Answer 2: (score: 0)
Found a better answer at: https://stackoverflow.com/a/50558594/4106458
It suggests using the normalize=True named argument of the value_counts() method.
For your case, the code would be:
promoted.department.value_counts(normalize=True) * 100
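Since the question asks explicitly for non-normalized values, it may help to see how the two quantities differ. The sketch below uses a small made-up frame (the data and the names share_of_promotions and rate_within_department are assumptions for illustration): normalize=True divides by the total number of promoted rows, while the question's 1213/16840 example divides by each department's own row count in train.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's `train` frame.
train = pd.DataFrame({
    "department": ["Sales"] * 6 + ["HR"] * 2,
    "is_promoted": [1, 1, 0, 0, 0, 0, 1, 0],
})
promoted = train[train.is_promoted == 1]

# Share of all promotions that each department accounts for (sums to 100).
share_of_promotions = promoted.department.value_counts(normalize=True) * 100

# Promotion rate within each department (does not sum to 100 in general).
rate_within_department = (promoted.department.value_counts()
                          / train.department.value_counts() * 100)

print(share_of_promotions)
print(rate_within_department)
```

Here Sales accounts for two of the three promotions, but only two of its six rows are promoted, so the two Series give different numbers for the same department.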