How do I get lower and upper 95% confidence (or prediction) interval columns for my predictions?
import pandas as pd

df1 = pd.DataFrame({
    'cumsum_days': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
    'prediction': [800, 900, 1200, 700, 600,
                   550, 500, 650, 625, 600,
                   550, 525, 500, 400, 350]})
The desired DataFrame looks like this:
prediction  lower_ci  high_ci
800         some_num  some_num
900         some_num  some_num
1200        some_num  some_num
700         some_num  some_num
These only give me single values, but I am looking for a 95% confidence interval for each of the 15 data points in df1.prediction:
mean = df1.prediction.mean()
std = df1.prediction.std()
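A scalar mean and standard deviation cannot produce per-row bounds by themselves. One common way to get two extra columns is a normal-approximation interval around each prediction, prediction ± z·sigma, where sigma is the standard deviation of the model's forecast errors. The sketch below assumes those errors are roughly normal with constant spread and that you can estimate sigma (for example from held-out residuals). The question does not provide such a value, so the helper `add_interval_columns` and the `sigma=50.0` in the usage line are hypothetical placeholders.

```python
from statistics import NormalDist

import pandas as pd


def add_interval_columns(df, col="prediction", sigma=1.0, confidence=0.95):
    """Return a copy of df with lower_ci/high_ci columns around df[col].

    Hypothetical helper. Assumes forecast errors are approximately normal
    with standard deviation `sigma`, which you must estimate yourself
    (e.g. from your model's residuals).
    """
    # Two-sided critical value, about 1.96 for 95%
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    half_width = z * sigma
    out = df.copy()
    out["lower_ci"] = out[col] - half_width
    out["high_ci"] = out[col] + half_width
    return out


df1 = pd.DataFrame({
    "cumsum_days": list(range(1, 16)),
    "prediction": [800, 900, 1200, 700, 600, 550, 500, 650, 625, 600,
                   550, 525, 500, 400, 350],
})

# sigma=50.0 is a placeholder, not derived from the question's data
result = add_interval_columns(df1, sigma=50.0)
print(result[["prediction", "lower_ci", "high_ci"]].head())
```

With constant sigma the band has the same width on every row; if your model reports a per-row standard error, you could pass that Series as `sigma` instead, since the arithmetic broadcasts element-wise.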
I also tried the following, but it only returns three values rather than two extra arrays of confidence bands/intervals for my predictions:
import numpy as np
import scipy.stats

# Returns (mean, lower, upper): a t-based confidence interval for the mean of `data`
def mean_confidence_interval(data, confidence=0.95):
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)  # sem uses ddof=1
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n-1)
    return m, m-h, m+h
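The function above computes a confidence interval for the *mean* of the 15 values, which is why it returns three scalars. For contrast, the textbook prediction interval for one new value drawn from the same population (assuming the values are i.i.d. normal) uses the half-width t·s·sqrt(1 + 1/n) instead of t·s/sqrt(n). The helper name `interval_half_widths` is hypothetical; the sketch just computes both half-widths side by side.

```python
import numpy as np
import pandas as pd
import scipy.stats


def interval_half_widths(data, confidence=0.95):
    """Hypothetical helper: return (mean, ci_half, pi_half) for `data`.

    ci_half: half-width of the t-based confidence interval for the mean.
    pi_half: half-width of the t-based prediction interval for ONE new
             observation, assuming the values are i.i.d. normal.
    """
    a = np.asarray(data, dtype=float)
    n = len(a)
    m, s = a.mean(), a.std(ddof=1)  # sample standard deviation
    t_crit = scipy.stats.t.ppf((1 + confidence) / 2.0, n - 1)
    ci_half = t_crit * s / np.sqrt(n)
    pi_half = t_crit * s * np.sqrt(1 + 1 / n)
    return m, ci_half, pi_half


prediction = pd.Series([800, 900, 1200, 700, 600, 550, 500, 650, 625, 600,
                        550, 525, 500, 400, 350])
m, ci_half, pi_half = interval_half_widths(prediction)
print(f"mean={m:.1f}  CI for mean: ±{ci_half:.1f}  PI for a new value: ±{pi_half:.1f}")
```

Both half-widths are scalars that do not depend on the row, so broadcasting either into `lower_ci`/`high_ci` would repeat the same two bounds on all 15 rows.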
Answer (score: 0)
How about something like this?
import numpy as np
import pandas as pd

# 39 bin edges -> 38 intervals, so exactly 38 labels (one per interval, named by its left edge)
bins = [0, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, 5, 5.25, 5.5, 5.75, 6, 6.25, 6.5, 6.75, 7, 7.25, 7.5, 7.75, 8, 8.25, 8.5, 8.75, 9, 9.25, 9.5, 9.75, 10, np.inf]
labels = ['0', '1', '1.25', '1.5', '1.75', '2', '2.25', '2.5', '2.75', '3', '3.25', '3.5', '3.75', '4', '4.25', '4.5', '4.75', '5', '5.25', '5.5', '5.75', '6', '6.25', '6.5', '6.75', '7', '7.25', '7.5', '7.75', '8', '8.25', '8.5', '8.75', '9', '9.25', '9.5', '9.75', '10']
# `dataset` is assumed to be an existing DataFrame with a numeric 'Rating' column
dataset['RatingScore'] = pd.cut(dataset['Rating'], bins=bins, labels=labels, right=True)
You can set up the basic structure and then convert the final object into a DataFrame.
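To see what the `pd.cut` call in this answer produces, here is a sketch on a few made-up ratings; the answer never shows `dataset`, so the `Rating` values below are illustrative only. The edges are generated with `np.arange` and match the answer's literal list. With `right=True` each value lands in the interval (left, right] and gets that interval's left-edge label; a rating of exactly 0 would come out as NaN because the first interval is (0, 1].

```python
import numpy as np
import pandas as pd

# Same edges as the answer: 0, then 1 to 10 in 0.25 steps, then +inf
steps = np.arange(1, 10.25, 0.25)
bins = [0] + list(steps) + [np.inf]
labels = ["0"] + [f"{s:g}" for s in steps]  # one label per interval

# Made-up ratings, only to illustrate the binning
dataset = pd.DataFrame({"Rating": [0.5, 1.1, 4.3, 9.9, 12.0]})
dataset["RatingScore"] = pd.cut(dataset["Rating"], bins=bins, labels=labels, right=True)
print(dataset)
```

Here 0.5, 1.1, 4.3, 9.9 and 12.0 are labeled "0", "1", "4.25", "9.75" and "10" respectively.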