import pandas as pd
df = pd.DataFrame({'Credit Scores': [695, 704, 718], 'Delinquent': [True, False, True]})
df.head()
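Beginner using pandas DataFrames
I have created a DataFrame containing borrowers' credit scores at the start of the loan and whether the loan became delinquent. I would like to group the scores by rounding them to the nearest 10 (i.e. "Credit Score": 700, 710, 720, etc.) and then find the percentage of delinquent loans within each group. A sample output might look like this.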
   Credit Score  Delinquency Rate
0           700               .43
1           710               .45
2           720               .41
I'm not sure how to go about this and would appreciate some guidance. Thanks.
I've run into another problem:
# round credit scores to the nearest 10 so they can be grouped
df['Credit Score'] = round(df['Credit Score'], -1)
# group by credit score and sum the bool values divided by the size of each group
to_rate = df.groupby(round(df['Credit Score'], -1))['Delinquency Rate']
df['Delinquency Rate'] = to_rate.transform(sum) / to_rate.transform('size')
df.sort_values('Credit Score')
When I sort and display the values, I notice that the credit scores keep repeating. It seems I am not grouping them correctly...
    Credit Score  Delinquency Rate
54           450               1.0
17           470               0.0
28           470               0.0
10           480               0.5
59           480               0.5
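How can I fix this? Also, is it possible to display the rate as something other than a decimal?
I removed this line of code to avoid the normalization, but now the values are no longer rounded.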
df['Credit Score'] = round(df['Credit Score'], -1)
New output after removing the line above:
df.sort_values('Credit Score')
    Credit Score  Delinquency Rate
54           447               1.0
28           471               0.0
17           474               0.0
21           475               0.5
10           476               0.5
..           ...               ...
16           839               0.0
28           839               0.0
45           839               0.0
65           839               0.0
62           839               0.0
Answer 0 (score: 2)
IIUC
# new sample df
df = pd.DataFrame({'Credit Scores':[654 ,738, 863, 649, 650],
'Delinquent': [True, False, True, True, False]})
# use round with -1 to round to the nearest 10
df['Credit Scores'] = round(df['Credit Scores'], -1)
# group by credit score and get the mean
s = df.groupby('Credit Scores')['Delinquent'].mean()
s.reset_index().plot(kind='scatter', x='Credit Scores', y='Delinquent')
               Delinquent
Credit Scores
650              0.666667
740              0.000000
860              1.000000
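On the follow-up about repeated scores and non-decimal display, here is a minimal sketch that assumes the same sample data as above. `transform` returns one value per original row, which would explain why the rounded scores show up several times after sorting, whereas an aggregation such as `mean` returns a single row per bucket. The output column names (`Credit Score`, `Delinquency Rate`, `Delinquency Rate (%)`) and the percentage formatting are illustrative choices, not something the original code prescribes.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({'Credit Scores': [654, 738, 863, 649, 650],
                   'Delinquent': [True, False, True, True, False]})

# Grouping key: scores rounded to the nearest 10 (the original column is left untouched).
bucket = df['Credit Scores'].round(-1)

# transform keeps one value per original row, so each bucket can appear many times.
per_row = df.groupby(bucket)['Delinquent'].transform('mean')

# An aggregation collapses every bucket to a single row.
rates = (df.groupby(bucket)['Delinquent']
           .mean()
           .rename('Delinquency Rate')
           .rename_axis('Credit Score')
           .reset_index())

# Optional display column: the rate formatted as a percentage string.
rates['Delinquency Rate (%)'] = rates['Delinquency Rate'].map('{:.1%}'.format)

print(rates)
```

If the per-row values are still wanted on the original frame, `per_row` can be assigned back as a column, and `drop_duplicates` on the rounded score would then give one row per bucket.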