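I am trying to compute confidence intervals for the accuracy of several different models, but when I run my code I get NaN values, and the "mean" column just reprints the original values. I don't understand why. I checked the values in the DataFrame and they are floats.
Can someone explain what is wrong with my code?
I don't think the existing questions answer mine. My problem already starts with the mean (it doesn't compute an average, it just copies the values), and I don't want to fill in the NaN values; I want to compute the standard deviation and the confidence interval.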
import pandas as pd
import numpy as np
import math

df = pd.DataFrame({'Classifiers': ['Multinomial NB', 'KNN', 'Random Forest',
                                   'Stochastic Gradient Descent', 'Dummy Baseline', 'Decision Tree'],
                   'Accuracy': [0.8262877442273535, 1.0, 0.5396092362344582,
                                0.9996447602131439, 0.5126110124333926, 1.0]},
                  columns=['Classifiers', 'Accuracy'])
print(df)
print('-'*30)

# Per-classifier summary: mean, number of rows, and sample standard deviation
stats = df.groupby(['Classifiers'])['Accuracy'].agg(['mean', 'count', 'std'])
print(stats)
print('-'*30)

# 95% confidence interval via the normal approximation: mean ± 1.96 * std / sqrt(count)
ci95_hi = []
ci95_lo = []
for i in stats.index:
    m, c, s = stats.loc[i]
    ci95_hi.append(m + 1.96*s/math.sqrt(c))
    ci95_lo.append(m - 1.96*s/math.sqrt(c))
stats['ci95_hi'] = ci95_hi
stats['ci95_lo'] = ci95_lo
print(stats)
Output:
                   Classifiers  Accuracy
0               Multinomial NB  0.826288
1                          KNN  1.000000
2                Random Forest  0.539609
3  Stochastic Gradient Descent  0.999645
4               Dummy Baseline  0.512611
5                Decision Tree  1.000000
------------------------------
                                 mean  count  std
Classifiers
Decision Tree                1.000000      1  NaN
Dummy Baseline               0.512611      1  NaN
KNN                          1.000000      1  NaN
Multinomial NB               0.826288      1  NaN
Random Forest                0.539609      1  NaN
Stochastic Gradient Descent  0.999645      1  NaN
------------------------------
                                 mean  count  std  ci95_hi  ci95_lo
Classifiers
Decision Tree                1.000000      1  NaN      NaN      NaN
Dummy Baseline               0.512611      1  NaN      NaN      NaN
KNN                          1.000000      1  NaN      NaN      NaN
Multinomial NB               0.826288      1  NaN      NaN      NaN
Random Forest                0.539609      1  NaN      NaN      NaN
Stochastic Gradient Descent  0.999645      1  NaN      NaN      NaN
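The output above is consistent with each classifier appearing in exactly one row: the `count` column is 1 for every group, so the group "mean" is just that single value. pandas' `std` defaults to the sample standard deviation (`ddof=1`), which divides by count − 1 = 0 and returns NaN, and that NaN then propagates into both interval bounds. A minimal check, using two rows copied from the data above, that reproduces this behaviour:

```python
import pandas as pd

# Two rows from the question's data: each classifier has exactly one accuracy value
df = pd.DataFrame({'Classifiers': ['KNN', 'Random Forest'],
                   'Accuracy': [1.0, 0.5396092362344582]})

# Same aggregation as in the question: every group has count 1
stats = df.groupby('Classifiers')['Accuracy'].agg(['mean', 'count', 'std'])
print(stats)

# Sample std (ddof=1) of a single value divides by n - 1 = 0, so pandas returns NaN
print(pd.Series([1.0]).std())          # nan
# Population std (ddof=0) of a single value is 0.0, which is no real measure of spread either
print(pd.Series([1.0]).std(ddof=0))    # 0.0
```

Switching to `ddof=0` only replaces NaN with a zero-width interval; a meaningful per-group standard deviation needs several accuracy measurements per classifier (for example, one per cross-validation fold).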
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
Classifiers    6 non-null object
Accuracy       6 non-null float64
dtypes: float64(1), object(1)
memory usage: 176.0+ bytes
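If only one accuracy value per model is available, one common alternative (not what the code above does) is to treat each accuracy as a binomial proportion and use the normal-approximation (Wald) interval acc ± 1.96 · sqrt(acc · (1 − acc) / n), where n is the number of test samples the accuracy was measured on. The question does not state n, so `n_test = 1000` below is a hypothetical placeholder to be replaced with the real test-set size; the bounds are clipped to [0, 1].

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Classifiers': ['Multinomial NB', 'KNN', 'Random Forest',
                                   'Stochastic Gradient Descent', 'Dummy Baseline', 'Decision Tree'],
                   'Accuracy': [0.8262877442273535, 1.0, 0.5396092362344582,
                                0.9996447602131439, 0.5126110124333926, 1.0]})

# HYPOTHETICAL: number of test samples each accuracy was measured on (not given in the question)
n_test = 1000
z = 1.96  # two-sided 95% standard normal quantile

# Wald (normal-approximation) binomial standard error: sqrt(p * (1 - p) / n)
se = np.sqrt(df['Accuracy'] * (1 - df['Accuracy']) / n_test)

# Interval bounds, clipped so they stay inside the valid accuracy range [0, 1]
df['ci95_lo'] = (df['Accuracy'] - z * se).clip(lower=0.0)
df['ci95_hi'] = (df['Accuracy'] + z * se).clip(upper=1.0)
print(df)
```

At an accuracy of exactly 1.0 (KNN, Decision Tree) the Wald interval collapses to zero width, which is a known weakness of this approximation; score-based intervals such as Wilson's behave better near 0 and 1.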