Adjusting predicted probabilities after resampling

Time: 2019-03-07 07:11:14

Tags: python python-3.x machine-learning logistic-regression

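Suppose my data is highly imbalanced and I want to train a model for binary classification. So I upsample the minority class, downsample the majority class, or do something else along those lines. My question is whether the predicted probabilities obtained after training the model need to be adjusted, given that I resampled the training data.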

To be concrete, suppose I use logistic regression and that y = 1 is the minority class. Here I am upsampling.

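import numpy as np
from sklearn.utils import resample

# X_imb, y_imb: the imbalanced feature matrix and labels (NumPy arrays)
print('Number of class 0 samples before:', X_imb[y_imb == 0].shape[0])
print('Number of class 1 samples before:', X_imb[y_imb == 1].shape[0])

# Upsample the minority class (y = 1) 5x by bootstrapping (sampling with replacement)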
X_upsampled, y_upsampled = resample(X_imb[y_imb == 1],
                                    y_imb[y_imb == 1],
                                    replace=True,
                                    n_samples=X_imb[y_imb == 1].shape[0]*5,
                                    random_state=123)

print('Number of class 1 samples after:', X_upsampled.shape[0])

X_bal = np.vstack((X_imb[y_imb == 0], X_upsampled))
y_bal = np.hstack((y_imb[y_imb == 0], y_upsampled))

original_rate = X_imb[y_imb == 1].shape[0] / X_imb.shape[0]

rate_after_upsampling = X_bal[y_bal == 1].shape[0] / X_bal.shape[0]
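
To see what these two rates imply, a short derivation may help. The symbols are introduced here for illustration and do not appear in the original code: n_0 and n_1 are the class 0 and class 1 counts before resampling, r and r' stand for original_rate and rate_after_upsampling, and k = 5 is the upsampling factor.

```latex
\begin{align*}
r  &= \frac{n_1}{n_0 + n_1},        & \frac{r}{1-r}   &= \frac{n_1}{n_0}, \\
r' &= \frac{k\,n_1}{n_0 + k\,n_1},  & \frac{r'}{1-r'} &= \frac{k\,n_1}{n_0}, \\
\log\!\left(\frac{1-r}{r}\cdot\frac{r'}{1-r'}\right) &= \log k = \log 5.
\end{align*}
```

So upsampling the minority class k-fold multiplies the class odds by k, whatever the counts are. Assuming the resampling changes only the class proportions and not the distribution of the features within each class, the log-odds of a model fitted on the resampled data should be shifted by roughly log k relative to the original population.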

Is the following adjustment necessary?

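# log_reg is assumed to be a LogisticRegression already fitted on (X_bal, y_bal)
adjust = np.log((1 - original_rate) / original_rate
                * rate_after_upsampling / (1 - rate_after_upsampling))
# adjust == log(5)

# Subtract the offset from the log-odds, then apply the sigmoid
logits = X_bal.dot(log_reg.coef_[0]) + log_reg.intercept_[0] - adjust
predicted_y_adjusted = 1 / (1 + np.exp(-logits))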
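
One way to check whether the offset matters is to try it on synthetic data. The sketch below is a hypothetical setup, not the data from the question: the feature matrix, the true coefficients, and the sample size are all assumptions chosen for illustration. It fits a logistic regression on the upsampled data, then scores the original data with and without subtracting adjust from the log-odds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: y = 1 is rare because of the negative intercept.
n = 20000
X_imb = rng.normal(size=(n, 2))
true_logit = X_imb @ np.array([1.0, -0.5]) - 3.5
y_imb = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

# Upsample the minority class 5x with replacement, as in the question.
X_min, y_min = X_imb[y_imb == 1], y_imb[y_imb == 1]
X_up, y_up = resample(X_min, y_min, replace=True,
                      n_samples=5 * X_min.shape[0], random_state=123)
X_bal = np.vstack((X_imb[y_imb == 0], X_up))
y_bal = np.hstack((y_imb[y_imb == 0], y_up))

original_rate = y_imb.mean()
rate_after_upsampling = y_bal.mean()

# Fit on the resampled data.
log_reg = LogisticRegression().fit(X_bal, y_bal)

# Log-odds offset implied by the change in class proportions.
adjust = np.log((1 - original_rate) / original_rate
                * rate_after_upsampling / (1 - rate_after_upsampling))

# Score the ORIGINAL data with and without the offset.
logits = log_reg.decision_function(X_imb)
p_raw = 1 / (1 + np.exp(-logits))
p_adjusted = 1 / (1 + np.exp(-(logits - adjust)))

print("original rate:            ", original_rate)
print("mean raw probability:     ", p_raw.mean())
print("mean adjusted probability:", p_adjusted.mean())
```

If the mean adjusted probability lands near original_rate while the raw mean sits well above it, that suggests the offset is worth applying whenever calibrated probabilities, rather than just a ranking, are the goal. The offset is a monotone shift, so it does not change the ordering of the predictions.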

Thanks.

0 answers:

No answers yet