Confidence interval for accuracy

Time: 2019-05-19 06:59:29

Tags: python pandas dataframe confidence-interval

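I am trying to compute confidence intervals for the accuracy of different models, but when I run the code I get NaN values and the "mean" column just repeats the input value. I don't understand why. I checked the values in the column and they are floats.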

Can someone explain what is wrong with my code?

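I don't think the other questions answer mine. My problem already starts with the mean (it is not computed, the value is just copied), and I do not want to fill in the NaN values. I want to compute the standard deviation and the confidence interval.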


import pandas as pd
import numpy as np
import math

df=pd.DataFrame({'Classifiers': ['Multinomial NB','KNN','Random Forest','Stochastic Gradient Descent','Dummy Baseline','Decision Tree'], 
                 'Accuracy': [0.8262877442273535,1.0,0.5396092362344582,0.9996447602131439,0.5126110124333926,1.0] },
                 columns=['Classifiers', 'Accuracy'])
print(df)
print('-'*30)

stats = df.groupby(['Classifiers'])['Accuracy'].agg(['mean', 'count', 'std'])
print(stats)
print('-'*30)

ci95_hi = []
ci95_lo = []

for i in stats.index:
    m, c, s = stats.loc[i]
    ci95_hi.append(m + 1.96*s/math.sqrt(c))
    ci95_lo.append(m - 1.96*s/math.sqrt(c))

stats['ci95_hi'] = ci95_hi
stats['ci95_lo'] = ci95_lo
print(stats)

                   Classifiers  Accuracy
0               Multinomial NB  0.826288
1                          KNN  1.000000
2                Random Forest  0.539609
3  Stochastic Gradient Descent  0.999645
4               Dummy Baseline  0.512611
5                Decision Tree  1.000000
------------------------------
                                 mean  count  std
Classifiers                                      
Decision Tree                1.000000      1  NaN
Dummy Baseline               0.512611      1  NaN
KNN                          1.000000      1  NaN
Multinomial NB               0.826288      1  NaN
Random Forest                0.539609      1  NaN
Stochastic Gradient Descent  0.999645      1  NaN
                                 mean  count  std  ci95_hi  ci95_lo
Classifiers                                                        
Decision Tree                1.000000      1  NaN      NaN      NaN
Dummy Baseline               0.512611      1  NaN      NaN      NaN
KNN                          1.000000      1  NaN      NaN      NaN
Multinomial NB               0.826288      1  NaN      NaN      NaN
Random Forest                0.539609      1  NaN      NaN      NaN
Stochastic Gradient Descent  0.999645      1  NaN      NaN      NaN
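
The output above is consistent with how pandas aggregates: each classifier appears in exactly one row, so every group has `count == 1`, the mean of a single value is that value, and the sample standard deviation (pandas uses `ddof=1` by default) is undefined and reported as NaN. A minimal sketch of that behaviour, using a made-up frame (`toy`, with hypothetical classifiers "A" and "B") rather than the data from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "A" has a single observation, "B" has three.
toy = pd.DataFrame({
    "Classifiers": ["A", "B", "B", "B"],
    "Accuracy": [0.80, 0.70, 0.75, 0.80],
})

# Same aggregation as in the question.
toy_stats = toy.groupby("Classifiers")["Accuracy"].agg(["mean", "count", "std"])
print(toy_stats)

# "A": mean equals the single value and std is NaN (n - 1 == 0).
# "B": std is a finite number because there are several observations.
```

Under this reading, the loop in the question is not the source of the NaN values; it only propagates the NaN that `std` already contains. A spread can only be estimated if each classifier has several accuracy values, for example one per cross-validation fold.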


df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
Classifiers    6 non-null object
Accuracy       6 non-null float64
dtypes: float64(1), object(1)
memory usage: 176.0+ bytes
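
With only one accuracy value per classifier, one possible alternative is to treat accuracy as a binomial proportion and use the normal-approximation (Wald) interval, acc ± 1.96 * sqrt(acc * (1 - acc) / n), where n is the number of test samples. The sketch below assumes every accuracy was measured on a test set of known size; `accuracy_ci` is an illustrative helper and `n_test = 1000` is a placeholder, neither comes from the question.

```python
import numpy as np
import pandas as pd

def accuracy_ci(acc, n, z=1.96):
    """Normal-approximation (Wald) interval for an accuracy measured on n samples."""
    acc = np.asarray(acc, dtype=float)
    half_width = z * np.sqrt(acc * (1.0 - acc) / n)
    # Accuracy is a proportion, so clip the bounds to [0, 1].
    lo = np.clip(acc - half_width, 0.0, 1.0)
    hi = np.clip(acc + half_width, 0.0, 1.0)
    return lo, hi

n_test = 1000  # hypothetical test-set size; replace with the real one

# Three of the accuracies from the question, one row per classifier.
scores = pd.DataFrame({
    "Classifiers": ["Multinomial NB", "Random Forest", "Dummy Baseline"],
    "Accuracy": [0.8262877442273535, 0.5396092362344582, 0.5126110124333926],
})
scores["ci95_lo"], scores["ci95_hi"] = accuracy_ci(scores["Accuracy"], n_test)
print(scores)
```

Note that this interval collapses to zero width when the accuracy is exactly 0 or 1, as it is for KNN and Decision Tree here; a Wilson score interval is a common choice that behaves better in those cases.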

0 answers:

No answers