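Logistic regression model predictions on test data come out negative and greater than one. The valid range for a probability is [0,1].
I scaled the data (train and test) with a standard scaler, then applied PCA on top of that for dimensionality reduction. I used stratified 5-fold CV, fit the training set on a logistic regression model for a binary classification problem, and generated predictions.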
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    pc_train, Y_train, test_size=0.33, random_state=42)
# Model part
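from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# X, y and seed are defined earlier (not shown); X and y are assumed to be NumPy arrays
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
pred_test_full = 0  # running sum of test-set probabilities across folds
cv_score = []
i = 1
for train_index, test_index in kf.split(X, y):
    print('{} of KFold {}'.format(i, kf.n_splits))
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # model
    lr = LogisticRegression(C=.3)
    lr.fit(X_train, y_train)
    # ROC AUC on the validation fold, computed from hard class labels
    score = roc_auc_score(y_test, lr.predict(X_test))
    print('ROC AUC score:', score)
    cv_score.append(score)
    # x_test (lowercase) is not defined in this snippet; presumably the held-out test set
    pred_test = lr.predict_proba(x_test)[:, 1]  # probability of the positive class
    pred_test_full += pred_test
    i += 1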
I expected the pred_test_full array of predictions over the test data to be in the range [0,1], but instead I am getting:
array([4.06222773e-03, 2.07307776e-05, 1.62214101e-03, ...,
5.92852765e-06, 2.46471149e-07, 6.01245496e-05])
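
For anyone reading this output: the values are printed in scientific notation, so `4.06222773e-03` means 0.00406222773. The sketch below is only a quick sanity check, not a diagnosis of the model. It copies the six visible values from the output above (the elided middle entries are left out) and tests whether they fall inside [0,1]. The second half uses made-up per-fold probabilities (`fold_probs` is hypothetical, not from the post) to show that a running sum such as pred_test_full can leave [0,1] unless it is divided by the number of folds.

```python
import numpy as np

# Visible values copied from the printed array (the "..." entries are omitted).
preds = np.array([4.06222773e-03, 2.07307776e-05, 1.62214101e-03,
                  5.92852765e-06, 2.46471149e-07, 6.01245496e-05])

# "e-03" means "times 10**-3", so these are small positive numbers.
in_unit_interval = bool(np.all((preds >= 0.0) & (preds <= 1.0)))
print("all values in [0, 1]:", in_unit_interval)

# Print the same numbers in fixed-point notation for readability.
with np.printoptions(suppress=True, precision=10):
    print(preds)

# Hypothetical illustration: summing probabilities over folds can exceed 1,
# so the accumulated array should be averaged over the folds.
n_splits = 5
fold_probs = np.full((n_splits, 3), 0.9)  # made-up per-fold probabilities
summed = fold_probs.sum(axis=0)           # analogous to pred_test_full
averaged = summed / n_splits              # mean probability per test row
print("summed:", summed, "averaged:", averaged)
```

If the averaged array is what is needed downstream, dividing pred_test_full by kf.n_splits after the loop would be the analogous step in the code above.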