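I'm not sure I can ask this question clearly, but I'll give it a try!
I have a classification problem in which I have to predict a person's credit score from their income bracket. I used this code: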
Now I have a data table, as usual, like this:
dta.groupby(['income_bracket', 'credit_score']).size()
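Explanation: the data above shows, for example, that in the "very low" income bracket, 2340 people have a credit score of 0.0 and 456 have a credit score of 1.0.
Now, is there a way to do the following: if a person falls in a given income_bracket, predict that their credit_score is the one with MAX(count of that credit score within the bracket)? For example, if someone's income bracket is "high", I would take MAX(54, 657) = 657 and predict a credit_score of 1.0.
Desired output: newdata -> income_group = 'high' ---> credit_score = 1 (because in the high income group the MAX value is 657, which belongs to credit score 1.0).
Please help me achieve this.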
Answer 0: (score: 1)
You need idxmax to get the index of the maximum val in each group, then select those rows by label:
#dta.reset_index(inplace=True)
#dta = dta.reset_index().rename(columns={0: 'val'})
print (dta)
income_bracket credit_score val
0 very low 0.0 2340
1 very low 1.0 456
2 moderate 0.0 1234
3 moderate 1.0 657
4 high 0.0 54
5 high 1.0 657
6 very high 0.0 9
7 very high 1.0 1234
print (dta.groupby(['income_bracket'], sort=False)['val'].idxmax())
income_bracket
very low 0
moderate 2
high 5
very high 7
Name: val, dtype: int64
#select all columns
print (dta.loc[dta.groupby(['income_bracket'], sort=False)['val'].idxmax()])
income_bracket credit_score val
0 very low 0.0 2340
2 moderate 0.0 1234
5 high 1.0 657
7 very high 1.0 1234
#select columns income_bracket and credit_score
print (dta.loc[dta.groupby(['income_bracket'], sort=False)['val'].idxmax(),
['income_bracket','credit_score']])
income_bracket credit_score
0 very low 0.0
2 moderate 0.0
5 high 1.0
7 very high 1.0
#select column credit_score
print (dta.loc[dta.groupby(['income_bracket'], sort=False)['val'].idxmax(), 'credit_score'])
0 0.0
2 0.0
5 1.0
7 1.0
Name: credit_score, dtype: float64
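To get from the selected rows to the prediction asked for in the question, one option is to turn them into a lookup from income bracket to credit score and map new data through it. This is only a minimal sketch: the counts are copied from the table above, while newdata, its income_group column, and the name score_by_bracket are assumptions made for illustration.

```python
import pandas as pd

# Counts table copied from the answer above.
dta = pd.DataFrame({
    'income_bracket': ['very low', 'very low', 'moderate', 'moderate',
                       'high', 'high', 'very high', 'very high'],
    'credit_score': [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    'val': [2340, 456, 1234, 657, 54, 657, 9, 1234],
})

# Row label of the largest count within each income bracket.
best_rows = dta.groupby('income_bracket', sort=False)['val'].idxmax()

# Lookup Series: income_bracket -> credit_score with the largest count.
score_by_bracket = dta.loc[best_rows].set_index('income_bracket')['credit_score']

# Hypothetical new data; the column name follows the question's wording.
newdata = pd.DataFrame({'income_group': ['high', 'very low', 'moderate']})

# Series.map looks each bracket up in the index of score_by_bracket.
newdata['credit_score'] = newdata['income_group'].map(score_by_bracket)
print(newdata)
```

A bracket that never appears in dta would come back as NaN from map, so unseen brackets need a separate fallback.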