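I have been reading about calibration methods for two days, but nowhere have I found an explanation of how they actually work. There are two types of calibration: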
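Platt scaling - a sigmoid is fitted to the classifier's scores to map them to probabilities. (The plot in which the prediction space is divided into bins and, for each bin, the mean predicted value is plotted against the true fraction of positive cases is the reliability diagram used to check the result.)
Isotonic regression - mathematically, it fits a weighted least-squares problem via quadratic programming, subject to the constraint that each fitted value is never smaller than the one before it, so the fit is non-decreasing.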
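To make the two methods concrete, here is a minimal sketch of each as I understand them. It assumes scikit-learn is available and uses made-up synthetic scores; the variable names are only illustrative. The Platt-style step is approximated by a nearly unregularised one-dimensional logistic regression, and it omits the target smoothing from Platt's original method.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical uncalibrated scores s and labels y, with true P(y=1 | s) = sigmoid(2 * s).
scores = rng.normal(size=2000)
labels = (rng.random(2000) < 1.0 / (1.0 + np.exp(-2.0 * scores))).astype(int)

# Platt-style scaling: fit p = sigmoid(a * s + b) on the scores.
# A very large C makes the L2 penalty negligible.
platt = LogisticRegression(C=1e6)
platt.fit(scores.reshape(-1, 1), labels)
platt_probs = platt.predict_proba(scores.reshape(-1, 1))[:, 1]

# Isotonic regression: least-squares fit of a non-decreasing step function.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso_probs = iso.fit_transform(scores, labels)

# Sorting by score shows the monotonicity constraint directly.
order = np.argsort(scores)
iso_sorted = iso_probs[order]
```

The Platt-style fit has only two parameters (slope and intercept), while the isotonic fit can take any non-decreasing shape, which is why it usually needs more data to avoid overfitting.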
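I have written a Python module based on logistic regression. (I know that LogisticRegression
returns well-calibrated predictions by default because it directly optimizes log loss; I built this only to check my understanding.)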
import numpy as np
from sklearn import linear_model
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split


class logistic_Calibration:
    """Compare the log loss of logistic regression with and without sigmoid calibration."""

    def __init__(self, data, response):
        self.data = data
        self.response = response

    def Calibration(self):
        # Hold out 20% of the data to evaluate both models.
        Xtrain, Xtest, ytrain, ytest = train_test_split(
            self.data, self.response, test_size=0.20, random_state=36)
        ytrain = np.ravel(ytrain)  # scikit-learn expects a 1-D target

        # Plain logistic regression, no calibration step.
        logreg = linear_model.LogisticRegression()
        logreg.fit(Xtrain, ytrain)
        PredWO_calibration = logreg.predict_proba(Xtest)
        lossWO_calibration = log_loss(ytest, PredWO_calibration)

        # Same estimator wrapped in sigmoid calibration with 5-fold cross-validation.
        clf_sigmoid = CalibratedClassifierCV(logreg, cv=5, method='sigmoid')
        clf_sigmoid.fit(Xtrain, ytrain)
        PredWITH_calibration = clf_sigmoid.predict_proba(Xtest)
        lossWITH_calibration = log_loss(ytest, PredWITH_calibration)

        # A positive difference means calibration lowered the log loss.
        Loss_difference_WO_minus_W = lossWO_calibration - lossWITH_calibration
        return [lossWO_calibration, lossWITH_calibration, Loss_difference_WO_minus_W]
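For reference, the same comparison can be run end to end together with the binned reliability curve. This is only a sketch under assumptions: the data is synthetic (from make_classification) rather than my real data, and the `results` dictionary and `max_iter=1000` setting are illustrative choices.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the real data set (an assumption).
X, y = make_classification(n_samples=2000, n_features=10, random_state=36)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=36)

# Uncalibrated model and the same estimator with sigmoid calibration.
raw = LogisticRegression(max_iter=1000).fit(X_train, y_train)
calibrated = CalibratedClassifierCV(
    LogisticRegression(max_iter=1000), cv=5, method="sigmoid"
).fit(X_train, y_train)

results = {}
for name, model in [("raw", raw), ("sigmoid", calibrated)]:
    prob_pos = model.predict_proba(X_test)[:, 1]
    # Per bin: true fraction of positives and mean predicted probability.
    frac_pos, mean_pred = calibration_curve(y_test, prob_pos, n_bins=10)
    results[name] = {
        "log_loss": log_loss(y_test, prob_pos),
        "frac_pos": frac_pos,
        "mean_pred": mean_pred,
    }
```

Plotting `mean_pred` against `frac_pos` for each entry gives the reliability diagram; points near the diagonal indicate well-calibrated probabilities.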
But I am still unclear about the following few points.
Please advise.